scribe-org / Scribe-Data

Wikidata, Wiktionary and Wikipedia language data extraction
GNU General Public License v3.0

Refactor ISO code usage using Python langcodes #55

Closed by andrewtavis 9 months ago

andrewtavis commented 1 year ago

Description

Something I realized while doing a project for work is that the Python package langcodes can handle a lot of the two-letter ISO 639-1 code work we're doing in Scribe-Data. Adding it to the dependencies would also make it easier to add new languages to the data process in the future :)

Contribution

Happy to support someone who has interest in working on this!

andrewtavis commented 1 year ago

Ping @m-charlton: definitely not the most difficult of issues, but this would be a solid one to pick up for now :) I'll get to the merges soon!

andrewtavis commented 1 year ago

This would basically allow us to remove this data from the JSON once that's merged :)

shashank-iitbhu commented 10 months ago

@andrewtavis I have set up the development environment locally. I can see two functions, get_language_iso and get_language_from_iso, in src/scribe_data/utils.py. You can assign me this issue so that I can go ahead and create a draft PR. Link to the langcodes library.
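A rough sketch of how the two functions mentioned above might look on top of langcodes. The signatures are assumptions rather than the current Scribe-Data ones, and the small fallback table is purely hypothetical: it only keeps the snippet runnable when langcodes (or its optional language_data extra, which the name lookups need) is not installed.

```python
# Sketch only: signatures are assumed, and _FALLBACK_ISO is a hypothetical
# stand-in for the JSON data, used when langcodes is unavailable.
_FALLBACK_ISO = {"English": "en", "French": "fr", "German": "de", "Spanish": "es"}


def get_language_iso(language: str) -> str:
    """Return the ISO 639 code (two-letter where one exists) for a language name."""
    try:
        import langcodes  # third-party: pip install "langcodes[data]"

        # langcodes.find looks a language up by name; .language holds its code.
        return langcodes.find(language).language
    except ImportError:
        # langcodes or its language_data extra is missing: use the small table.
        return _FALLBACK_ISO[language.capitalize()]


def get_language_from_iso(iso: str) -> str:
    """Return the English name of the language for an ISO 639 code."""
    try:
        import langcodes

        # display_name() defaults to English and needs the language_data extra.
        return langcodes.Language.get(iso).display_name()
    except ImportError:
        names = {code: name for name, code in _FALLBACK_ISO.items()}
        return names[iso.lower()]


if __name__ == "__main__":
    print(get_language_iso("French"))
    print(get_language_from_iso("de"))
```

With something like this in place, the name-to-code mapping would no longer need to be stored alongside the rest of the language metadata, though unsupported names would still need their own error handling.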

andrewtavis commented 10 months ago

Thanks @shashank-iitbhu! Really appreciate all the details! Let me know if there's anything I can do to help!