Open emanjavacas opened 7 years ago
This is going to be tricky (getting a balanced dataset is really hard in authorship attribution), but I understand that you want all the data you can get for training the LM.
Prof. Dr. Mike Kestemont, University of Antwerp
On Tue, May 9, 2017 at 4:18 PM, Enrique Manjavacas <notifications@github.com> wrote:
... given the high variance in the document length.
We should think of a less wasteful way of solving this than just cropping documents to a fixed max length.
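One possible alternative to cropping, sketched here only as an illustration: split each long document into consecutive windows of at most `max_len` tokens, so the tail of a long document becomes extra training material instead of being thrown away. The thread does not say which approach was adopted; the function names `crop` and `chunk`, the `min_len` cutoff, and the toy documents are all hypothetical.

```python
from typing import List, Sequence


def crop(tokens: Sequence[str], max_len: int) -> List[str]:
    """Baseline from the issue: keep the first max_len tokens, discard the rest."""
    return list(tokens[:max_len])


def chunk(tokens: Sequence[str], max_len: int, min_len: int = 1) -> List[List[str]]:
    """Split a document into consecutive windows of at most max_len tokens.

    A trailing window shorter than min_len is dropped (hypothetical cutoff).
    """
    if max_len <= 0:
        raise ValueError("max_len must be positive")
    chunks = [list(tokens[i:i + max_len]) for i in range(0, len(tokens), max_len)]
    if chunks and len(chunks[-1]) < min_len:
        chunks.pop()
    return chunks


# Toy corpus with very different document lengths (made-up data).
docs = {"short": ["a"] * 30, "long": ["b"] * 250}
max_len = 100

kept_by_crop = sum(len(crop(d, max_len)) for d in docs.values())
kept_by_chunk = sum(len(c) for d in docs.values() for c in chunk(d, max_len))
print(kept_by_crop, kept_by_chunk)  # 130 280
```

In this toy case cropping keeps 130 of the 280 tokens, while chunking keeps all of them. Each chunk would presumably carry the author label of its source document, which means long documents contribute more training samples; whether that is acceptable for the balance of the dataset is a separate question.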