moses-smt / salm

SALM: Suffix Array and its Applications in Empirical Language Processing by Joy
GNU General Public License v2.0

Sentence 4556 has more than 256 words. Can not handle such long sentence. Please cut it short first! #4

Open ajesujoba opened 3 years ago

ajesujoba commented 3 years ago

I want to create a suffix array index of the source and target sides of my training bitext. But it appears I cannot process sentences with more than 256 words. Is there a way I can increase the maximum number of words per sentence to 512 or 1024?

hieuhoang commented 3 years ago

No idea, I'm afraid. No one has worked on the code for years. If you fix it, please create a pull request.
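
Until the limit itself is raised in the code, one possible workaround is to follow the error message's advice and remove over-long sentences before indexing. Because the source and target sides are parallel, a pair has to be dropped from both files at once to keep the line alignment. The following is only a minimal sketch under stated assumptions: words are counted by splitting on whitespace, the limit of 256 from the error message is treated as inclusive, and the function names `filter_long_pairs` and `filter_bitext` are hypothetical helpers, not part of SALM.

```python
from typing import Iterable, Iterator, Tuple

# Limit taken from the error message; treating it as inclusive is an assumption.
MAX_WORDS = 256


def filter_long_pairs(
    src_lines: Iterable[str],
    tgt_lines: Iterable[str],
    max_words: int = MAX_WORDS,
) -> Iterator[Tuple[str, str]]:
    """Yield only sentence pairs where both sides have at most max_words tokens."""
    for src, tgt in zip(src_lines, tgt_lines):
        # Whitespace tokenization is an assumption about how words are counted.
        if len(src.split()) <= max_words and len(tgt.split()) <= max_words:
            yield src.rstrip("\n"), tgt.rstrip("\n")


def filter_bitext(src_in, tgt_in, src_out, tgt_out, max_words=MAX_WORDS):
    """Write filtered copies of both sides of a bitext; return the number of pairs kept."""
    kept = 0
    with open(src_in, encoding="utf-8") as fs, \
         open(tgt_in, encoding="utf-8") as ft, \
         open(src_out, "w", encoding="utf-8") as out_s, \
         open(tgt_out, "w", encoding="utf-8") as out_t:
        for src, tgt in filter_long_pairs(fs, ft, max_words):
            out_s.write(src + "\n")
            out_t.write(tgt + "\n")
            kept += 1
    return kept
```

Dropping pairs rather than truncating them avoids indexing sentences whose two sides no longer correspond; the filtered files can then be indexed in place of the originals.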