obss / jury

Comprehensive NLP Evaluation System
MIT License

Meteor for multiple languages #135

Open salvadora opened 8 months ago

salvadora commented 8 months ago

It would be nice if the implementation of METEOR supported multiple languages. There is a library in Java, but I could not find any implementation in Python.


devrimcavusoglu commented 8 months ago

Hi @salvadora, thank you for the suggestion. This seems interesting, but it also seems kinda daunting. Due to the backlog of other work, I probably cannot work on this right away.

Also, it's worth keeping in mind that when I checked the METEOR-E repo, it depends on DBnary in many places; for example, there are Python scripts to extract synsets. It seems to me that the ops required for DBnary are a lot more involved than the METEOR computations themselves. So, my thoughts on integrating METEOR-E into jury are:

(1) Integrate everything into jury, including all DBnary ops, and make the multilingual METEOR-E computation native Python only, requiring no additional dependencies (e.g. 3rd-party JARs). This seems very daunting to me 😱.
(2) Use DBnary with Java to extract mappings, and implement only the matching and score computation in jury. This would still leave a hard dependency on the Java implementation 😐.
(3) Use translation services/models to compute multilingual METEOR 🤔.

Tbh, (3) is a very easy task compared to the others. It may conflict with the nature of the evaluation task, I agree, since METEOR was designed for MT tasks and we would be introducing backtranslation here. However, it should still give some meaningful insight into how your model/system is doing.

For (1) and/or (2), further development is required. First of all, would you be interested in contributing? Secondly, I would actively support/review your work if you chose to contribute, but I'm afraid I would be unable to work on it myself in a short time window due to a tight schedule.

All in all, if you just want to evaluate your work with METEOR in a multilingual setting, backtranslation (3) should work in theory, and it is the easiest option; I would start with something like that to see the effects. If you do, please post updates on this issue about whether it worked or not. It would be useful to know, since the same approach could apply to other scores that only support English.
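For reference, option (3) could be sketched roughly as below. The `translate_to_english` function is a hypothetical stand-in for whatever MT service/model you would plug in (here it is just the identity), and `simple_meteor` is a deliberately simplified unigram-only stand-in for the real METEOR metric (no stemming, synonym matching, or fragmentation penalty) so the pipeline is self-contained:

```python
# Sketch of option (3): backtranslate non-English predictions/references
# to English, then score with an English-only metric.

def translate_to_english(text: str) -> str:
    # Hypothetical placeholder: in practice, call an MT model or
    # translation service here. Identity for illustration only.
    return text

def simple_meteor(hypothesis: str, reference: str, alpha: float = 0.9) -> float:
    """Simplified unigram-only METEOR-style score: exact-match unigram
    precision/recall combined with a recall-weighted harmonic mean.
    Real METEOR adds stemming, synonymy, and a fragmentation penalty."""
    hyp = hypothesis.split()
    ref = reference.split()
    matches = 0
    ref_pool = list(ref)  # consume each reference token at most once
    for tok in hyp:
        if tok in ref_pool:
            matches += 1
            ref_pool.remove(tok)
    if matches == 0:
        return 0.0
    precision = matches / len(hyp)
    recall = matches / len(ref)
    # Harmonic mean weighted toward recall, as in METEOR's F-mean
    return precision * recall / (alpha * recall + (1 - alpha) * precision)

def backtranslated_score(prediction: str, reference: str) -> float:
    # Backtranslate both sides, then apply the English-only metric.
    return simple_meteor(
        translate_to_english(prediction),
        translate_to_english(reference),
    )

score = backtranslated_score("the cat sat on the mat", "the cat is on the mat")
print(round(score, 4))  # 5 of 6 unigrams match on both sides
```

In a real setup you would swap `simple_meteor` for jury's actual METEOR metric and keep only the backtranslation step; the caveat from above still applies, i.e. the MT step introduces its own noise into the score.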