Wrong order of sentences in both source and translation (using google) #174

alk0 commented 7 years ago

If a source paragraph is long (roughly more than 1500 characters?), the sentences in the output come out in the wrong order, in both the source text and its translation. It seems to happen with any target language, and with 'google' but not 'yandex' as the translation engine. (The content of se1600.txt in the example below is exactly the first part of the "expected output", 1663 characters.)

$ trans en:es --show-translation-phonetics n --show-languages n --show-alternatives n --show-prompt-message n --no-ansi -i se1600.txt

expected output:

In the year 1878 I took my degree of Doctor of Medicine of the University of London, and proceeded to Netley to go through the course prescribed for surgeons in the army. Having completed my studies there, I was duly attached to the Fifth Northumberland Fusiliers as Assistant Surgeon. The regiment was stationed in India at the time, and before I could join it, the second Afghan war had broken out. On landing at Bombay, I learned that my corps had advanced through the passes, and was already deep in the enemy's country. I followed, however, with many other officers who were in the same situation as myself, and succeeded in reaching Candahar in safety, where I found my regiment, and at once entered upon my new duties. The campaign brought honours and promotion to many, but for me it had nothing but misfortune and disaster. I was removed from my brigade and attached to the Berkshires, with whom I served at the fatal battle of Maiwand. There I was struck on the shoulder by a Jezail bullet, which shattered the bone and grazed the subclavian artery. I should have fallen into the hands of the murderous Ghazis had it not been for the devotion and courage shown by Murray, my orderly, who threw me across a pack-horse, and succeeded in bringing me safely to the British lines. Worn with pain, and weak from the prolonged hardships which I had undergone, I was removed, with a great train of wounded sufferers, to the base hospital at Peshawar. Here I rallied, and had already improved so far as to be able to walk about the wards, and even to bask a little upon the verandah, when I was struck down by enteric fever, that curse of our Indian possessions.

En el año 1878 tomé mi grado de doctor en medicina de la Universidad de Londres, y procedí a Netley para pasar por el curso prescrito para cirujanos en el ejército. Habiendo completado mis estudios allí, estaba debidamente unido a los Quinto Northumberland Fusiliers como Cirujano Auxiliar. El regimiento estaba estacionado en la India en ese momento, y antes de que pudiera unirme, la segunda guerra afgana había estallado. Al aterrizar en Bombay, me enteré de que mi cuerpo había avanzado a través de los pasos, y ya estaba en lo profundo del país enemigo. Seguí, sin embargo, con muchos otros oficiales que estaban en la misma situación que yo, y logré alcanzar a Candahar en seguridad, donde encontré mi regimiento, y de inmediato entré en mis nuevos deberes. La campaña trajo honores y promoción a muchos, pero para mí no tuvo más que desgracia y desastre. Me quitaron de mi brigada y me unieron a los Berkshires, con quienes serví en la batalla fatal de Maiwand. Allí fui golpeado en el hombro por una bala de Jezail, que destrozó el hueso y rozó la arteria subclavia. Debería haber caído en manos de los Ghazis asesinos si no hubiera sido por la devoción y coraje demostrados por Murray, mi ordenado, que me arrojó a través de un caballo de carga y logró llevarme a salvo a las líneas británicas. Gastado de dolor y débil por las prolongadas dificultades que había sufrido, fui trasladado al hospital de Peshawar con un gran contingente de heridos. Aquí me reuní, y ya había mejorado hasta poder pasear por los barrios, e incluso a disfrutar un poco de la veranda, cuando fui golpeado por la fiebre entérica, esa maldición de nuestras posesiones indias.

actual output:

In the year 1878 I took my degree of Doctor of Medicine of the University of London, and proceeded to Netley to go through the course prescribed for surgeons in the army. Having completed my studies there, I was duly attached to the Fifth Northumberland Fusiliers as Assistant Surgeon. Here I rallied, and had already improved so far as to be able to walk about the wards, and even to bask a little upon the verandah, when I was struck down by enteric fever, that curse of our Indian possessions. The regiment was stationed in India at the time, and before I could join it, the second Afghan war had broken out. On landing at Bombay, I learned that my corps had advanced through the passes, and was already deep in the enemy's country. I followed, however, with many other officers who were in the same situation as myself, and succeeded in reaching Candahar in safety, where I found my regiment, and at once entered upon my new duties. The campaign brought honours and promotion to many, but for me it had nothing but misfortune and disaster. I was removed from my brigade and attached to the Berkshires, with whom I served at the fatal battle of Maiwand. There I was struck on the shoulder by a Jezail bullet, which shattered the bone and grazed the subclavian artery. I should have fallen into the hands of the murderous Ghazis had it not been for the devotion and courage shown by Murray, my orderly, who threw me across a pack-horse, and succeeded in bringing me safely to the British lines. Worn with pain, and weak from the prolonged hardships which I had undergone, I was removed, with a great train of wounded sufferers, to the base hospital at Peshawar.

En el año 1878 tomé mi grado de doctor en medicina de la Universidad de Londres, y procedí a Netley para pasar por el curso prescrito para cirujanos en el ejército. Habiendo completado mis estudios allí, estaba debidamente unido a los Quinto Northumberland Fusiliers como Cirujano Auxiliar. Aquí me reuní, y ya había mejorado hasta poder pasear por los barrios, e incluso a disfrutar un poco de la veranda, cuando fui golpeado por la fiebre entérica, esa maldición de nuestras posesiones indias. El regimiento estaba estacionado en la India en ese momento, y antes de que pudiera unirme, la segunda guerra afgana había estallado. Al aterrizar en Bombay, me enteré de que mi cuerpo había avanzado a través de los pasos, y ya estaba en lo profundo del país enemigo. Seguí, sin embargo, con muchos otros oficiales que estaban en la misma situación que yo, y logré alcanzar a Candahar en seguridad, donde encontré mi regimiento, y de inmediato entré en mis nuevos deberes. La campaña trajo honores y promoción a muchos, pero para mí no tuvo más que desgracia y desastre. Me quitaron de mi brigada y me unieron a los Berkshires, con quienes serví en la batalla fatal de Maiwand. Allí fui golpeado en el hombro por una bala de Jezail, que destrozó el hueso y rozó la arteria subclavia. Debería haber caído en manos de los Ghazis asesinos si no hubiera sido por la devoción y coraje demostrados por Murray, mi ordenado, que me arrojó a través de un caballo de carga y logró llevarme a salvo a las líneas británicas. Gastado de dolor y débil por las prolongadas dificultades que había sufrido, fui trasladado al hospital de Peshawar con un gran contingente de heridos.

$ trans -V
Translate Shell

platform              Linux
gawk (GNU Awk)        4.1.3
fribidi (GNU FriBidi) [NOT INSTALLED]
audio player          mplayer
terminal pager        less
terminal type         xterm-256color
user locale           en_US.UTF-8 (English)
home language         en
source language       auto
target language       en
translation engine    google
proxy                 [NONE]
user-agent            Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/602.1 (KHTML, like Gecko) Version/8.0 Safari/602.1 Epiphany/3.18.2
theme                 default
init file             [NONE]

alk0 commented 7 years ago

Uhm... I don't know awk at all, but from my reading of the code, the problem may lie in relying on @ind_num_asc: the returned ast doesn't get sorted properly because its indices/keys are not actually numbers (they are strings like "0,0,10,0"), so entry number 10 ends up right after entry number 1, and so on. Unfortunately, I'm afraid I'm not skilled enough to fix it.

ast = { "0,0,0,0" "\"En el año 1878 tomé mi grado de doctor en medicina de la Universidad de Londres, y procedí a Netley para pasar por el curso prescrito para cirujanos en el ejército. \"" "0,0,0,1" "\"In the year 1878 I took my degree of Doctor of Medicine of the University of London, and proceeded to Netley to go through the course prescribed for surgeons in the army.\"" "0,0,0,2" "null" "0,0,0,3" "null" "0,0,0,4" "3" "0,0,1,0" "\"Habiendo completado mis estudios allí, estaba debidamente unido a los Quinto Northumberland Fusiliers como Cirujano Auxiliar. \"" "0,0,1,1" "\"Having completed my studies there, I was duly attached to the Fifth Northumberland Fusiliers as Assistant Surgeon.\"" "0,0,1,2" "null" "0,0,1,3" "null" "0,0,1,4" "3" "0,0,10,0" "\"Aquí me reuní, y ya había mejorado hasta poder pasear por los barrios, e incluso a disfrutar un poco de la veranda, cuando fui golpeado por la fiebre entérica, esa maldición de nuestras posesiones indias.\"" "0,0,10,1" "\"Here I rallied, and had already improved so far as to be able to walk about the wards, and even to bask a little upon the verandah, when I was struck down by enteric fever, that curse of our Indian possessions.\"" "0,0,10,2" "null" "0,0,10,3" "null" "0,0,10,4" "3" "0,0,2,0" "\"El regimiento estaba estacionado en la India en ese momento, y antes de que pudiera unirme, la segunda guerra afgana había estallado. \"" "0,0,2,1" "\"The regiment was stationed in India at the time, and before I could join it, the second Afghan war had broken out.\"" "0,0,2,2" "null" "0,0,2,3" "null" "0,0,2,4" "3" [...]
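The ordering in that dump can be reproduced outside awk. The following is only a minimal sketch in Python, not translate-shell code and not necessarily the fix that was applied: it assumes the keys are comma-joined integer indices as in the dump, and the helper name `numeric_key` is made up for the illustration. It shows that comparing such keys as strings puts "0,0,10,0" before "0,0,2,0", while comparing them component by component as integers restores the sentence order.

```python
# Keys in the style of the dump above: comma-joined integer indices,
# one per sentence (sentences 0..10).
keys = [f"0,0,{i},0" for i in range(11)]

# Plain string comparison: "0,0,10,0" sorts before "0,0,2,0",
# because the character '0' in "10" is compared against ',' and '2'.
as_strings = sorted(keys)


def numeric_key(key):
    """Split a comma-joined key into a tuple of ints (hypothetical helper)."""
    return tuple(int(part) for part in key.split(","))


# Component-wise numeric comparison keeps sentence 10 after sentence 9.
as_numbers = sorted(keys, key=numeric_key)

# Show only the sentence index (third component) of each key.
print([k.split(",")[2] for k in as_strings])
print([k.split(",")[2] for k in as_numbers])
```

With string ordering the sentence indices come out as 0, 1, 10, 2, 3, ..., which matches the "actual output" above, where the last sentence appears right after the second one.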

soimort commented 7 years ago

Thanks for the catch, should be fixed now.

alk0 commented 7 years ago

Thank you! The problem is gone, everything seems to be OK now.