scribe-org / Scribe-Data

Wikidata, Wiktionary and Wikipedia language data extraction
GNU General Public License v3.0

Fix MacOS build due to failing dependency installations #61

Closed andrewtavis closed 1 month ago

andrewtavis commented 8 months ago

Behavior

As of now, the build is failing on all versions of macOS. The original cause was TensorFlow, which has since been removed as a dependency in favor of other packages that were already in use and provide similar functionality (parallel downloads). The most recent error is:

ERROR: Could not build wheels for marisa-trie, which is required to install pyproject.toml-based projects
Successfully built PyICU sentencepiece
Failed to build marisa-trie

Would be great if we could figure this out!

andrewtavis commented 8 months ago

CC @wkyoshida, in case you have some ideas on this :)

shashank-iitbhu commented 6 months ago

The workflow is failing because one of the dependencies, language-data, requires marisa-trie<0.8.0 and >=0.7.7, and those marisa-trie versions do not support Python 3.12, which is the error in the CI logs. As mentioned in the marisa-trie docs, support for Python 3.12 was added in marisa-trie 1.1.0.

The solution is to remove the language-data dependency from requirements.txt, as I think it is not being used anywhere in the project.

I also tried pinning both language-data and marisa-trie to their latest versions, but this won't work because of the following:

ERROR: Cannot install language-data==1.1 and marisa-trie>=1.1.0 because these package versions have conflicting dependencies.


    The user requested marisa-trie>=1.1.0
    language-data 1.1 depends on marisa-trie<0.8.0 and >=0.7.7

To fix this you could try to:
1. loosen the range of package versions you've specified
2. remove package versions to allow pip attempt to solve the dependency conflict

ERROR: ResolutionImpossible: for help visit https://pip.pypa.io/en/latest/topics/dependency-resolution/#dealing-with-dependency-conflicts

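To make the conflict in the pip error above concrete, here is a minimal sketch that checks a few marisa-trie versions against the range language-data 1.1 declares. The helper names `parse_version` and `in_range` are made up for illustration, and the comparison only handles plain dotted numeric versions; pip itself follows the full PEP 440 rules, so treat this as an approximation rather than what the resolver actually runs.

```python
def parse_version(text):
    """Turn a plain dotted release string such as '0.7.8' into a tuple of ints."""
    return tuple(int(part) for part in text.split("."))


def in_range(version, lower_inclusive, upper_exclusive):
    """Return True if lower_inclusive <= version < upper_exclusive."""
    lower = parse_version(lower_inclusive)
    upper = parse_version(upper_exclusive)
    return lower <= parse_version(version) < upper


# Range that language-data 1.1 declares for marisa-trie (taken from the pip error).
LOWER, UPPER = "0.7.7", "0.8.0"

# 1.1.0 is the release that, per the comment above, added Python 3.12 support.
for candidate in ("0.7.7", "0.7.8", "1.1.0"):
    print(candidate, in_range(candidate, LOWER, UPPER))
```

Under these assumptions only the 0.7.x candidates fall inside the range, so no single marisa-trie release can satisfy both language-data 1.1 and Python 3.12.
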
shashank-iitbhu commented 6 months ago

@andrewtavis @wkyoshida

andrewtavis commented 6 months ago

Thanks for looking into this, @shashank-iitbhu! 😊 I just checked and the following is in the pyproject.toml for langcodes:

[tool.poetry.dependencies]
python = ">= 3.6"
language-data = { version = "^1.1", optional = true }

[tool.poetry.dev-dependencies]
language-data = { version = "^1.1", optional = true }

[tool.poetry.extras]
data = ["language-data"]

Do you want to try removing language-data from the dependencies, @shashank-iitbhu? Hopefully that will work. We can also just not support Python 3.12 right now. It seems like there is a recent fork of language-data that's now maintained :)

andrewtavis commented 6 months ago

It looks like the new language-data will only set a lower bound on marisa-trie, and it seems to already be released. Let's give unpinning a try and focus on testing Python 3.9 as we are now.

andrewtavis commented 5 months ago

Thinking about this a bit more, and about what a hassle langcodes/language-data and marisa-trie have been since being added: would it make sense to revert this and shift back to a dictionary that we maintain ourselves, based on the languages we have in the extract_transform/languages directory? We only wanted this for language to ISO-2 conversion.

CC @wkyoshida

andrewtavis commented 5 months ago

@shashank-iitbhu, would you want to implement a reverse of #55 such that we have the codes saved for all the current languages in the languages directory? We'd be adding those in to language_meta_data.json :)
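A minimal sketch of what such a self-maintained lookup could look like. The JSON layout, the list of languages, and the function names `get_iso` and `get_language_from_iso` are all assumptions for illustration; the real language_meta_data.json and the contents of the languages directory may differ. The two-letter codes themselves are the standard ISO 639-1 codes.

```python
import json

# Hypothetical contents of language_meta_data.json, inlined so the sketch runs
# on its own. The real file's structure and language list may be different.
LANGUAGE_META_DATA = json.loads(
    """
    {
      "languages": [
        {"language": "english", "iso": "en"},
        {"language": "french", "iso": "fr"},
        {"language": "german", "iso": "de"},
        {"language": "spanish", "iso": "es"},
        {"language": "swedish", "iso": "sv"}
      ]
    }
    """
)

# Build both lookup directions once so each conversion is a plain dict access.
_ISO_BY_LANGUAGE = {
    entry["language"]: entry["iso"] for entry in LANGUAGE_META_DATA["languages"]
}
_LANGUAGE_BY_ISO = {iso: language for language, iso in _ISO_BY_LANGUAGE.items()}


def get_iso(language):
    """Return the two-letter code for a language name (case-insensitive)."""
    try:
        return _ISO_BY_LANGUAGE[language.lower()]
    except KeyError:
        raise ValueError(f"{language} is not in the language metadata") from None


def get_language_from_iso(iso):
    """Return the lowercase language name for a two-letter code."""
    try:
        return _LANGUAGE_BY_ISO[iso.lower()]
    except KeyError:
        raise ValueError(f"{iso} is not in the language metadata") from None


print(get_iso("German"))
print(get_language_from_iso("sv"))
```

Keeping the codes in a small file like this removes the langcodes/language-data/marisa-trie chain entirely, at the cost of adding an entry by hand whenever a new language is supported.
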

shashank-iitbhu commented 5 months ago

Sure @andrewtavis, will open a PR for this.

andrewtavis commented 5 months ago

Thanks, @shashank-iitbhu!

andrewtavis commented 5 months ago

Closed by #129. Thanks for this, @shashank-iitbhu!

andrewtavis commented 3 months ago

Reopening here and removing the mac build for now as this is once again problematic...

andrewtavis commented 1 month ago

Well would you look at them Mac builds! The combination of the following got this fixed:

I referenced the following Stack Overflow answer when I was setting up the project again from scratch and realized that it likely could help with the problems we were having here 😊

Closing until the next time the Mac builds break! 😅