unhammer / apertium-en-hi

These are the linguistic data for the Apertium English-Hindi machine translator.

Hindi-English translation rules #4

Open NikantVohra opened 11 years ago

NikantVohra commented 11 years ago

I am working on translation rules for the story. How should I proceed so that I avoid the mistakes you may have made when working on Bengali-English? Also, can I reuse the Bengali-English rules for this?

azmfaridee commented 11 years ago

@NikantVohra If you are referring to the creation of transfer rules, I'm afraid some mistakes are inevitable. You'll often find that the current tagsets cannot provide the linguistic information you are looking for. These were some of the problems we faced.

Btw, sorry for being late. I had some unexpected delays on the return journey and then a busy work schedule at the office.

ftyers commented 11 years ago

On Sun, 16 Jun 2013 at 23:10 -0700, Abu Zaher wrote:

@NikantVohra If you are referring to creation of transfer rules, I'm afraid some mistakes are inevitable, you'll often find that the current tagsets sometimes cannot infer the linguistic information you are looking for. These were some of the problems we faced.

  * The English PoS tagger often gave confusing results, which was
    a real headache. I don't know whether the tagger has improved
    since then. Ask @ftyers for more details.

No, the English tagger is still pretty awful. But that doesn't matter so much, as we're going to be doing Hindi->English.

  * Regarding this issue, Constraint Grammar would come in pretty
    handy, I think. @ftyers has already added it to the project's
    dependency chain, so you should have it at your disposal.

Yes, we should be using CG to disambiguate the Hindi. There are already a few rules. These can be expanded on.
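To illustrate the kind of constraint meant here, below is a minimal Python sketch of a REMOVE-style rule. It is not Constraint Grammar syntax and not the API of any CG tool; the function name `remove_if_prev` and the tags `det`, `n` and `vblex` are assumptions made purely for illustration.

```python
from typing import List, Set

# One cohort = the set of candidate tags still possible for one token.
Cohort = Set[str]


def remove_if_prev(cohorts: List[Cohort], target: str, prev_tag: str) -> List[Cohort]:
    """Drop `target` from a cohort when the previous cohort is
    unambiguously `prev_tag`. The last remaining reading is never removed.
    (Hypothetical helper, not part of any CG implementation.)"""
    result = [set(c) for c in cohorts]  # copy so the input is left untouched
    for i in range(1, len(result)):
        prev_unambiguous = result[i - 1] == {prev_tag}
        if prev_unambiguous and target in result[i] and len(result[i]) > 1:
            result[i].discard(target)
    return result


# Usage: a token that could be a noun or a verb, preceded by an
# unambiguous determiner, loses its verb reading.
ambiguous = [{"det"}, {"n", "vblex"}]
disambiguated = remove_if_prev(ambiguous, target="vblex", prev_tag="det")
```

The guard on `len(result[i]) > 1` mirrors the usual safety property of this style of rule: a constraint may narrow the readings of a token but should not leave it with none.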

  * Some linguistic information was hard to re-create when doing
    Bengali to English, e.g. Bengali pronouns don't have gender,
    but English ones do. I don't think that would be a problem in
    your case, as it's easy to infer gender from the verbs in Hindi.

For Hindi -> English the idea would be to set the gender as "to be determined" in the .t1x file and then, in the .t2x file, use whatever information is available (e.g. the verb form) to set it.

Fran
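The two-stage idea described above (mark gender as "to be determined" first, resolve it later from the verb) can be sketched in Python as follows. This is only a toy model of the control flow, not Apertium transfer-rule syntax; the `Chunk` class, the `stage_one` and `stage_two` functions, the `GENDER_TBD` placeholder and the transliterated lemmas are all assumptions made for illustration.

```python
from dataclasses import dataclass, replace
from typing import List, Optional

GENDER_TBD = "GD"  # hypothetical placeholder meaning "gender to be determined"


@dataclass(frozen=True)
class Chunk:
    kind: str    # "pron" or "verb" in this sketch
    lemma: str
    gender: str  # "m", "f", or GENDER_TBD


def stage_one(kind: str, lemma: str, gender: Optional[str] = None) -> Chunk:
    """First stage (the role played by .t1x in the text): build a chunk,
    leaving the gender as a placeholder when the word does not carry it."""
    return Chunk(kind, lemma, gender if gender else GENDER_TBD)


def stage_two(chunks: List[Chunk]) -> List[Chunk]:
    """Second stage (the role played by .t2x in the text): copy the gender
    of the first gender-marked verb onto pronouns still marked as undetermined."""
    verb_gender = next(
        (c.gender for c in chunks if c.kind == "verb" and c.gender != GENDER_TBD),
        None,
    )
    if verb_gender is None:
        return list(chunks)  # nothing to infer from; leave placeholders as-is
    return [
        replace(c, gender=verb_gender)
        if c.kind == "pron" and c.gender == GENDER_TBD
        else c
        for c in chunks
    ]


# Usage: a gender-neutral pronoun followed by a feminine verb form,
# as in Hindi "vah gayi" ("she went").
sentence = [stage_one("pron", "vah"), stage_one("verb", "jaanaa", "f")]
resolved = stage_two(sentence)
```

Deferring the decision keeps the first stage purely local: it never has to look beyond the word it is chunking, and the later stage, which sees the whole clause, is the only place where agreement information is combined.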