scribe-org / Scribe-Data

Wikidata, Wiktionary and Wikipedia language data extraction
GNU General Public License v3.0
30 stars 69 forks

fixes #55 : Refactor ISO code usage using Python langcodes #60

Closed shashank-iitbhu closed 9 months ago

shashank-iitbhu commented 10 months ago

Contributor checklist


Description

Related issue

github-actions[bot] commented 10 months ago

Thank you for the pull request!

The Scribe team will do our best to address your contribution as soon as we can. The following is a checklist for maintainers to make sure this process goes as well as possible. Feel free to address the points below yourself in further commits if you realize that actions are needed :)

If you're not already a member of our public Matrix community, please consider joining! We'd suggest using Element as your Matrix client, and definitely join the General and Data rooms once you're in. It'd be great to have you!

Maintainer checklist

andrewtavis commented 10 months ago

Hey @shashank-iitbhu! Really happy to get this so quickly! Thanks so much :)

A quick note on this is that it looks like your GitHub email isn't set up properly. See the note in the maintainer checklist:

  • If there's a mismatch, the contributor needs to make sure that the email they use for GitHub matches what they have for git config user.email in their local Scribe-Data repo

Do you want to check that and maybe reopen this PR after? Big thing is that your account likely won't get credit for the commit as it is now.
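The email check described in the checklist note above could be sketched in Python roughly as follows. This is only an illustration: `read_git_email` and `emails_match` are hypothetical helper names, not part of Scribe-Data, and comparing addresses case-insensitively is an assumption rather than documented GitHub behavior.

```python
import subprocess


def read_git_email(repo_path="."):
    """Return the user.email git would use for commits made in repo_path."""
    result = subprocess.run(
        ["git", "config", "user.email"],
        cwd=repo_path,
        capture_output=True,
        text=True,
        check=False,  # an unset value gives a non-zero exit code and empty output
    )
    return result.stdout.strip()


def emails_match(configured, github_email):
    """Compare two addresses, ignoring case and surrounding whitespace (an assumption)."""
    return configured.strip().lower() == github_email.strip().lower()
```

Calling `emails_match(read_git_email(), ...)` with the address from the GitHub account settings, from inside the local Scribe-Data clone, would then indicate whether the two agree.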

shashank-iitbhu commented 10 months ago

I have configured my email with this PR. Let me know if this works.

andrewtavis commented 10 months ago

Looks great now @shashank-iitbhu! Thanks for the quick reaction :)

andrewtavis commented 10 months ago

@wkyoshida, do you have any idea about the tensorflow version error? (I just installed 2.11 for work earlier today.) Before, the versions in the env file and the requirements were different, but after the fix in my commit it's still saying:

ERROR: Could not find a version that satisfies the requirement tensorflow>=2.11.0 (from versions: none)
ERROR: No matching distribution found for tensorflow>=2.11.0

wkyoshida commented 10 months ago

@wkyoshida, do you have an idea on the tensorflow version error

Still not sure yet what the issue is... 🤔 Briefly looked at suggestions for resolving tensorflow versions, but things like using a high enough pip version or 64-bit don't seem like they'd be the case... the search goes on
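A "from versions: none" message generally means pip found no distribution compatible with the running interpreter and platform, so one way to rule out the pip version and 64-bit suspects is to print those facts on the failing runner. A minimal sketch, assuming only the standard library (`describe_environment` is a hypothetical helper, not something in the repo):

```python
import platform
import struct
import sys
from importlib import metadata


def describe_environment():
    """Collect interpreter facts that affect which wheels pip can select."""
    try:
        pip_version = metadata.version("pip")
    except metadata.PackageNotFoundError:
        pip_version = None  # pip is not installed in this environment
    return {
        "python": "{}.{}.{}".format(*sys.version_info[:3]),
        "machine": platform.machine(),  # e.g. x86_64 or arm64
        "bits": struct.calcsize("P") * 8,  # pointer size in bits: 32 or 64
        "pip": pip_version,
    }


for key, value in describe_environment().items():
    print(f"{key}: {value}")
```

Comparing this output between a passing and a failing runner may narrow down whether the Python version or the architecture is what leaves pip without a matching tensorflow build.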

andrewtavis commented 10 months ago

Still all are failing... 🤔

shashank-iitbhu commented 10 months ago

Still all are failing... 🤔

[Screenshot 2024-01-19 at 11:54:07 PM]

Can't we just upgrade the tensorflow version?

andrewtavis commented 10 months ago

I'll give that a try, and if it doesn't work I'll try to remove it altogether. We're not really using it for anything major anyway.

wkyoshida commented 10 months ago

Still all are failing... 🤔

The failures in the Ubuntu runners (logs) appear now to be due to failing tests actually. I believe some of the tests might need updating given the new changes.

Could try running the pytest command locally to run the tests and see their outcome to determine which need fixing.

shashank-iitbhu commented 9 months ago

The tests were failing because langcodes.find was raising a LookupError when given the string "gibberish", and langcodes.make was returning a custom string instead of raising a ValueError. With the latest commit, I have modified both functions in src/scribe_data/utils.py to raise ValueError. All the tests now pass.
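The general pattern of that fix, turning a LookupError from a lookup call into the ValueError the tests expect, can be sketched as below. This is not the code from src/scribe_data/utils.py: `lookup_or_value_error`, `demo_lookup`, and the small dictionary are hypothetical stand-ins, used so the example runs without langcodes installed.

```python
def lookup_or_value_error(lookup, value):
    """Call lookup(value) and re-raise any LookupError as a ValueError."""
    try:
        return lookup(value)
    except LookupError as err:
        raise ValueError(f"{value!r} is not a recognized language") from err


# Hypothetical stand-in for a name-to-code lookup; not the project's data.
_DEMO_CODES = {"english": "en", "german": "de"}


def demo_lookup(name):
    """Return the demo code for name; raises KeyError (a LookupError) if unknown."""
    return _DEMO_CODES[name.lower()]
```

With this wrapper, `lookup_or_value_error(demo_lookup, "gibberish")` raises ValueError, which is the behavior a test written around ValueError can assert on.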

andrewtavis commented 9 months ago

FYI you two: I decided to update the data download process for generating autosuggestions to remove TensorFlow, as it looks like there will be problems from now on given M1 vs. non-M1 Macs. We're at least not getting that error anymore, but now we're looking at:

ERROR: Could not build wheels for marisa-trie, which is required to install pyproject.toml-based projects
Successfully built PyICU sentencepiece
Failed to build marisa-trie

Looks like the last hurdle for a good Mac build though, so progress 😊