soilwise-he / metadata-augmentation

Use NLP/LLM on a resource to augment metadata statements about that resource
MIT License

Translate title/abstract from source language #4

Open · pvgenuchten opened this issue 3 weeks ago

pvgenuchten commented 3 weeks ago

If a metadata record is in a language other than English, add an English translation.

This should happen before indexing, so that any English-language filters also include this record.

Indicate in the UI that the translation is machine-generated, with an option to switch to the original.

Use the EU translation service, Google, DeepL, ... or an LLM for translation.
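The proposal above (translate before indexing, keep the original, flag the result as machine-generated) could look roughly like the sketch below. The `MetadataRecord` and `Translation` shapes, the `machine_generated` flag and the `translate(text, source_lang, target_lang)` callable are all hypothetical; the callable stands in for whichever backend (EU service, Google, DeepL, LLM) is eventually chosen.

```python
from dataclasses import dataclass, field
from typing import Callable

# Hypothetical backend signature: (text, source_lang, target_lang) -> translated text
Translate = Callable[[str, str, str], str]


@dataclass
class Translation:
    title: str
    abstract: str
    # Lets the UI label the text as machine-generated and offer a switch to the original.
    machine_generated: bool = True


@dataclass
class MetadataRecord:
    title: str
    abstract: str
    language: str  # assumed to be an ISO 639-1 code such as "nl" or "en"
    translations: dict[str, Translation] = field(default_factory=dict)


def augment_with_english(record: MetadataRecord, translate: Translate) -> MetadataRecord:
    """Add an English title/abstract before indexing; the original fields stay untouched."""
    if record.language == "en":
        return record  # already English, nothing to add
    record.translations["en"] = Translation(
        title=translate(record.title, record.language, "en"),
        abstract=translate(record.abstract, record.language, "en"),
    )
    return record
```

The indexer could then populate its English fields from `record.translations["en"]`, so English filters match the record, while the UI reads `machine_generated` to show the label and the toggle back to the original.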

pvgenuchten commented 3 weeks ago

It runs as a microservice that fetches strings from the source database and stores the results in a translations table with the columns hash, key, translation and language. If a hash is already in the database, the stored translation is returned instantly.
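A minimal sketch of that hash-keyed lookup, assuming SQLite as the store and SHA-256 over the UTF-8 source text as the hash. The comment does not define the `key` column, so here the original string is stored in a `source_text` column as an assumption; the `translate` callable is a hypothetical placeholder for the real backend.

```python
import hashlib
import sqlite3
from typing import Callable

# Hypothetical backend signature: (text, source_lang, target_lang) -> translated text
Translate = Callable[[str, str, str], str]


class TranslationCache:
    """Translations table keyed by (hash of source text, target language)."""

    def __init__(self, conn: sqlite3.Connection, translate: Translate) -> None:
        self.conn = conn
        self.translate = translate
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS translations ("
            " hash TEXT NOT NULL,"
            " source_text TEXT NOT NULL,"  # assumption: stands in for the issue's 'key'
            " translation TEXT NOT NULL,"
            " language TEXT NOT NULL,"
            " PRIMARY KEY (hash, language))"
        )

    @staticmethod
    def text_hash(text: str) -> str:
        return hashlib.sha256(text.encode("utf-8")).hexdigest()

    def get(self, text: str, source_lang: str, target_lang: str) -> str:
        h = self.text_hash(text)
        row = self.conn.execute(
            "SELECT translation FROM translations WHERE hash = ? AND language = ?",
            (h, target_lang),
        ).fetchone()
        if row is not None:
            return row[0]  # cache hit: return the stored translation immediately
        # Cache miss: call the backend once and store the result for next time.
        result = self.translate(text, source_lang, target_lang)
        self.conn.execute(
            "INSERT INTO translations (hash, source_text, translation, language)"
            " VALUES (?, ?, ?, ?)",
            (h, text, result, target_lang),
        )
        self.conn.commit()
        return result
```

Because the primary key is `(hash, language)`, the same source string is translated at most once per target language; repeated requests never reach the (possibly paid) backend.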

For short strings there is a risk that the same string occurs in two places with different meanings. Should strings to be translated have a minimum length?
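One possible way to handle this, instead of refusing to translate short strings, is to share cached translations only for strings above a length threshold and to scope shorter strings by a context value (for example the metadata field name, or the record identifier if short strings should never be shared). The threshold of 20 characters and the `context` parameter are hypothetical; this reduces, but does not eliminate, the ambiguity.

```python
import hashlib

# Hypothetical threshold: the thread leaves the minimum size as an open question.
MIN_SHARED_LENGTH = 20


def cache_key(text: str, context: str, min_length: int = MIN_SHARED_LENGTH) -> str:
    """Return a hash key for the translations table.

    Strings of at least `min_length` characters (after whitespace normalization)
    are keyed on their text alone, so identical long strings share one translation.
    Shorter strings are keyed on (context, text), so a short string such as a
    single keyword is only reused within the same context.
    """
    normalized = " ".join(text.split())
    if len(normalized) >= min_length:
        material = normalized
    else:
        # \x1f (ASCII unit separator) keeps the context and the text unambiguous.
        material = f"{context}\x1f{normalized}"
    return hashlib.sha256(material.encode("utf-8")).hexdigest()
```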

robknapen commented 3 weeks ago

Here are two Python packages the service could be built on; they already cover a lot.

deep-translator: https://deep-translator.readthedocs.io/en/latest/?badge=latest
translators: https://pypi.org/project/translators/

We would need to add an API, a database, and external API key(s) (when using a paid service in the background).
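If several backends are supported, the service needs to decide which one to use based on which API keys are configured. The sketch below assumes keys are supplied through environment variables; the backend names and variable names (such as `DEEPL_API_KEY`) are hypothetical.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional, Sequence


@dataclass(frozen=True)
class BackendConfig:
    name: str
    # Environment variable holding the API key; None means no key is required.
    api_key_env: Optional[str] = None


def pick_backend(
    candidates: Sequence[BackendConfig],
    env: Mapping[str, str] = os.environ,
) -> tuple[str, Optional[str]]:
    """Return (backend name, API key) for the first usable backend, in order of preference."""
    for cfg in candidates:
        if cfg.api_key_env is None:
            return cfg.name, None  # free backend, always usable
        key = env.get(cfg.api_key_env)
        if key:
            return cfg.name, key  # paid backend with a configured key
    raise RuntimeError("no translation backend is configured")
```

For example, `pick_backend([BackendConfig("deepl", "DEEPL_API_KEY"), BackendConfig("free")])` prefers DeepL when its key is set and otherwise falls back to the keyless backend. The selected name could then map to a wrapper around a library call, e.g. deep-translator's `GoogleTranslator(source="auto", target="en").translate(text)`.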

robknapen commented 2 weeks ago
pvgenuchten commented 4 days ago

An initial prototype of a service wrapper on top of the EU translation service has been developed and deployed at https://api.soilwise-he.containers.wur.nl/translate/docs