scribe-org / Scribe-Data

Wikidata, Wiktionary and Wikipedia language data extraction
GNU General Public License v3.0

Remove current Scribe-Data translations process #309

Closed KesharwaniArpita closed 1 month ago

KesharwaniArpita commented 1 month ago

Contributor checklist


Description

This PR goes through the project and removes the current machine-learning-based translation process from Scribe-Data. It removes the following dependencies:

  1. torch
  2. transformers

from src/scribe_data/language_data_extraction/translate_all.py.

All references to these dependencies should be removed. This PR also removes all translations_words.py files from the English, French, German, Italian, Russian, Portuguese, Swedish and Spanish folders in src/scribe_data/language_data_extraction.
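One way to check that no references remain is to scan the package for leftover imports. The following is a minimal sketch, not part of this PR: the function name `find_dependency_imports` and the scanned path `src/scribe_data` are assumptions for illustration, and the check only catches simple `import X` / `from X import ...` lines in `.py` files (not combined imports such as `import os, torch`, and not mentions in requirements or docs files).

```python
import re
from pathlib import Path

# Dependencies named in this PR.
REMOVED_DEPENDENCIES = ("torch", "transformers")


def find_dependency_imports(root, names=REMOVED_DEPENDENCIES):
    """Return (path, line number, line) for each import of the named packages under root.

    Hypothetical helper for illustration; it is not part of Scribe-Data.
    """
    alternatives = "|".join(re.escape(name) for name in names)
    # Matches "import torch", "import torch.nn" and "from transformers import ...",
    # but not unrelated packages such as "torchvision".
    pattern = re.compile(rf"^\s*(?:import|from)\s+(?:{alternatives})(?:\.|\s|$)")
    hits = []
    for path in sorted(Path(root).rglob("*.py")):
        text = path.read_text(encoding="utf-8", errors="ignore")
        for number, line in enumerate(text.splitlines(), start=1):
            if pattern.match(line):
                hits.append((str(path), number, line.strip()))
    return hits


if __name__ == "__main__":
    # Assumed location of the package; adjust to the local checkout.
    for path, number, line in find_dependency_imports("src/scribe_data"):
        print(f"{path}:{number}: {line}")
```

An empty result suggests the Python sources no longer import either package; other files that mention the dependencies would still need a separate check.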

Related issue

github-actions[bot] commented 1 month ago

Thank you for the pull request!

The Scribe team will do our best to address your contribution as soon as we can. The following is a checklist for maintainers to make sure this process goes as well as possible. Feel free to address the points below yourself in further commits if you realize that actions are needed :)

If you're not already a member of our public Matrix community, please consider joining! We'd suggest using Element as your Matrix client, and definitely join the General and Data rooms once you're in. Also consider joining our bi-weekly Saturday dev syncs. It'd be great to have you!

Maintainer checklist

KesharwaniArpita commented 1 month ago

Hi @andrewtavis, could you please check this out and guide me through it? I have removed the required functions and files. Do we have to remove the entire translate folder from the mentioned language folders instead? Is it okay to clean up the remaining names and mentions of these dependencies (for example in requirement.txt and docs/conf.py) after the commits are approved?