scribe-org / Scribe-Data

Wikidata, Wiktionary and Wikipedia language data extraction
GNU General Public License v3.0
30 stars 69 forks

fixes #72: A script translates English to other languages #81

Closed Linfye closed 8 months ago

Linfye commented 8 months ago

Contributor checklist


Description

The script runs on Google Colab, which is where I wrote it. Because running the full script would take too much time, translated_words contains only a subset of all the words, but it demonstrates the feasibility of the program.
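A minimal sketch of the per-word approach this PR takes, where `translate_word` is a hypothetical stand-in for whatever translation backend the script actually calls (the function name, word list, and language code below are all illustrative, not code from the PR):

```python
import json


def translate_word(word, target_lang):
    # Hypothetical stand-in for the real translation backend used in the PR.
    return f"{word}-{target_lang}"


def translate_words(words, target_lang):
    # Iterate over each word individually, one backend call per word.
    # This is slow for large word lists, which is why the PR's
    # translated_words covers only a subset of all words.
    return {word: translate_word(word, target_lang) for word in words}


translated_words = translate_words(["hello", "world"], "de")
print(json.dumps(translated_words))
```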

Looking forward to your code review

Related issue

github-actions[bot] commented 8 months ago

Thank you for the pull request!

The Scribe team will do our best to address your contribution as soon as we can. The following is a checklist for maintainers to make sure this process goes as well as possible. Feel free to address the points below yourself in further commits if you realize that actions are needed :)

If you're not already a member of our public Matrix community, please consider joining! We'd suggest using Element as your Matrix client, and definitely join the General and Data rooms once you're in. It'd be great to have you!

Maintainer checklist

andrewtavis commented 8 months ago

Thanks for this, @Linfye! I'll get back to you with a review soon! Once this is merged you'd be welcome to work on the other languages :)

Linfye commented 8 months ago

I fixed the problems you mentioned except the last one. @wkyoshida should I work on it now, or will someone else? Looking forward to your reply. cc @andrewtavis

shashank-iitbhu commented 8 months ago

> I fixed the problems you mentioned except the last one. @wkyoshida should I work on it now, or will someone else? Looking forward to your reply. cc @andrewtavis

Can you refer to #88 and #89? I have implemented a different approach, i.e. batch processing of words for translation. This way it is relatively faster.

We can decide on a single approach; if the requirement is to iterate over each word rather than batch processing, then we can go ahead with this PR. cc @andrewtavis @wkyoshida
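A rough sketch of the batch idea, assuming the backend can accept a list of words per call (the `translate_batch` stub and the batch size are illustrative assumptions, not the actual #88/#89 implementation):

```python
def translate_batch(words, target_lang):
    # Stub for a backend call that translates many words in one request.
    return [f"{w}-{target_lang}" for w in words]


def translate_in_batches(words, target_lang, batch_size=100):
    # One request per batch instead of one per word cuts the number of
    # round trips roughly by a factor of batch_size.
    results = {}
    for i in range(0, len(words), batch_size):
        batch = words[i:i + batch_size]
        results.update(zip(batch, translate_batch(batch, target_lang)))
    return results
```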

Linfye commented 8 months ago

> > I fixed the problems you mentioned except the last one. @wkyoshida should I work on it now, or will someone else? Looking forward to your reply. cc @andrewtavis
>
> Can you refer to #88 and #89? I have implemented a different approach, i.e. batch processing of words for translation. This way it is relatively faster.
>
> We can decide on a single approach; if the requirement is to iterate over each word rather than batch processing, then we can go ahead with this PR. cc @andrewtavis @wkyoshida

I checked the code and wonder if it can continue downloading from the last progress, since there are so many words. If it works better, we can adopt yours.

andrewtavis commented 8 months ago

Sorry for the delay on all of this, all :) I was on vacation and then sick right after... Checked and sent along some formatting in 2460584. I'll bring this in shortly, as well as the work that @shashank-iitbhu mentioned. I'll give it all a test to see how things are working. I'd say batch processing and having the process in the utils makes sense to me 😊

andrewtavis commented 8 months ago

Ah, and a quick note on this: let's be sure to remove as much whitespace from JSON outputs as possible in the future, as that does bring the file size down slightly 😊
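In Python's standard `json` module this is done by passing compact `separators`, which drops the default space after each `:` and `,`:

```python
import json

data = {"hello": "hallo", "world": "Welt"}

# Default separators insert a space after ':' and ',':
print(json.dumps(data))  # {"hello": "hallo", "world": "Welt"}

# Compact separators remove that whitespace and shrink the output:
print(json.dumps(data, separators=(",", ":")))  # {"hello":"hallo","world":"Welt"}
```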