alk0 closed this issue 7 years ago
Uhm... I don't know awk at all, but from my understanding of the code, the problem may lie in relying on @ind_num_asc: the returned ast doesn't get sorted properly, because its indices/keys are not actually numbers, and therefore we get line number 10 right after line number 1, and so on. Unfortunately, I'm afraid I'm not skilled enough to fix it.
ast = {
"0,0,0,0" "\"En el año 1878 tomé mi grado de doctor en medicina de la Universidad de Londres, y procedí a Netley para pasar por el curso prescrito para cirujanos en el ejército. \""
"0,0,0,1" "\"In the year 1878 I took my degree of Doctor of Medicine of the University of London, and proceeded to Netley to go through the course prescribed for surgeons in the army.\""
"0,0,0,2" "null"
"0,0,0,3" "null"
"0,0,0,4" "3"
"0,0,1,0" "\"Habiendo completado mis estudios allí, estaba debidamente unido a los Quinto Northumberland Fusiliers como Cirujano Auxiliar. \""
"0,0,1,1" "\"Having completed my studies there, I was duly attached to the Fifth Northumberland Fusiliers as Assistant Surgeon.\""
"0,0,1,2" "null"
"0,0,1,3" "null"
"0,0,1,4" "3"
"0,0,10,0" "\"Aquí me reuní, y ya había mejorado hasta poder pasear por los barrios, e incluso a disfrutar un poco de la veranda, cuando fui golpeado por la fiebre entérica, esa maldición de nuestras posesiones indias.\""
"0,0,10,1" "\"Here I rallied, and had already improved so far as to be able to walk about the wards, and even to bask a little upon the verandah, when I was struck down by enteric fever, that curse of our Indian possessions.\""
"0,0,10,2" "null"
"0,0,10,3" "null"
"0,0,10,4" "3"
"0,0,2,0" "\"El regimiento estaba estacionado en la India en ese momento, y antes de que pudiera unirme, la segunda guerra afgana había estallado. \""
"0,0,2,1" "\"The regiment was stationed in India at the time, and before I could join it, the second Afghan war had broken out.\""
"0,0,2,2" "null"
"0,0,2,3" "null"
"0,0,2,4" "3"
[...]
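To illustrate the ordering described above, here is a minimal Python sketch (not the project's awk code, and not the fix that was eventually applied). It assumes the keys are comma-separated indices like the ones in the dump. The helper name `parse_key` is made up for this example. Sorting the keys as plain strings puts "0,0,10,0" before "0,0,2,0", which matches the dump; comparing them component by component as integers gives the intended order.

```python
def parse_key(key):
    """Split an index such as "0,0,10,0" into a tuple of integers."""
    return tuple(int(part) for part in key.split(","))


# A few keys taken from the dump above (first field of each sentence).
keys = ["0,0,0,0", "0,0,1,0", "0,0,2,0", "0,0,10,0"]

# Plain string comparison: "10" sorts before "2" because '1' < '2'.
as_strings = sorted(keys)

# Component-wise numeric comparison: 2 sorts before 10, as intended.
as_numbers = sorted(keys, key=parse_key)

print(as_strings)
print(as_numbers)
```

The same idea should carry over to other languages: the composite key has to be split and compared numerically field by field, rather than relying on a sort mode that expects each whole key to be a single number.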
Thanks for the catch, should be fixed now.
Thank you! The problem is gone, everything seems to be OK now.
If the source text paragraphs are long (roughly more than 1500 characters?), the order of the output sentences (each source sentence followed by its translation) gets messed up. This seems to happen with any target language, and with 'google' but not 'yandex' as the translation engine. (The content of se1600.txt in the example below is exactly the first part of the "expected output", 1663 characters.)
$ trans en:es --show-translation-phonetics n --show-languages n --show-alternatives n --show-prompt-message n --no-ansi -i se1600.txt
expected output:
actual output: