scribe-org / Scribe-Data

Wikidata, Wiktionary and Wikipedia language data extraction
GNU General Public License v3.0
23 stars 25 forks source link

Expand Scribe-Data paths to allow for Windows commands #125

Closed andrewtavis closed 1 month ago

andrewtavis commented 6 months ago

Terms

Description

Something that would be great to add to Scribe-Data is some more documentation for Windows commands beneath their Linux/Mac counterparts in the readme and contributing guide as well as converting some of the path functionality of the codebase over to allow for Windows paths to be accessed.

Contribution

Happy to support on this as needed! 😊

andrewtavis commented 6 months ago

@mhmohona, you'd said you'd be interested in this one? :)

mhmohona commented 6 months ago

Yes.

andrewtavis commented 6 months ago

Assigned, @mhmohona! Let me know if there's anything we can do to assist :)

Jk40git commented 6 months ago

Assigned, @mhmohona! Let me know if there's anything we can do to assist :)

Could assist too if you don't mind

andrewtavis commented 6 months ago

Sure thing, @Jk40git. @mhmohona, let either of us know it anything's needed :)

mhmohona commented 6 months ago

So the first problem I faced while installing Scribe-data was installing PyICU. It has different installation process for Windows - https://github.com/ilius/pyglossary/blob/master/doc/pyicu.md#installation-on-windows

Jk40git commented 6 months ago

So the first problem I faced while installing Scribe-data was installing PyICU. It has different installation process for Windows - https://github.com/ilius/pyglossary/blob/master/doc/pyicu.md#installation-on-windows

@mhmohona have you tried the process? I can't load PyICU-2.9-cp311-cp311-win_amd64.whl into the command prompt

Jk40git commented 6 months ago

So the first problem I faced while installing Scribe-data was installing PyICU. It has different installation process for Windows - https://github.com/ilius/pyglossary/blob/master/doc/pyicu.md#installation-on-windows

@mhmohona have you tried the process? I can't load PyICU-2.9-cp311-cp311-win_amd64.whl into the command prompt

Screenshot 2024-03-27 195043

Jk40git commented 6 months ago

So the first problem I faced while installing Scribe-data was installing PyICU. It has different installation process for Windows - https://github.com/ilius/pyglossary/blob/master/doc/pyicu.md#installation-on-windows

@mhmohona have you tried the process? I can't load PyICU-2.9-cp311-cp311-win_amd64.whl into the command prompt

We need to install a certain pkg-config file using chocolatey

andrewtavis commented 6 months ago

Big thing to note here is that the PyICU behavior is known to be very weird... This is documented in #33, and I'd say that the install for PyICU for windows could be included in that issue. For this one, let's focus on the baseline paths for updating the data :)

CC @wkyoshida

Jk40git commented 5 months ago

@mhmohona please where are you with this issue

mhmohona commented 5 months ago

Hey @Jk40git, I actually havent started working on this issue after my last comment. However, I can suggest you an alternative to run the scripts without needing to installing locally. You can use Google Colab. Here I have setup the repo and other things - link. You can use this notebook as reference.

Jk40git commented 5 months ago

All right @mhmohona. I was able to access the path locally though. But my problem was the marisa-trie build wheels. Thanks for the reply

andrewtavis commented 5 months ago

This'll be solved over the weekend odds are :) We're discussing how to change the dependencies in #61, with the likely change being that we'll remove the ones that are causing all these issues.

mhmohona commented 2 months ago

Now in windows I can install Scribe data by following commands here but when trying to run a script, getting error like this - image

This issue seems it will take some time. Thus I want to work on it later.

andrewtavis commented 2 months ago

Thanks for looking into this, @mhmohona! Let's discuss later when we'd like to prioritize this :)

Jk40git commented 2 months ago

Now in windows I can install Scribe data by following commands here but when trying to run a script, getting error like this - image

This issue seems it will take some time. Thus I want to work on it later.

@mhmohona I think scribe_data wasn't successfully installed can you do pip install scribe_data and see?

andrewtavis commented 2 months ago

Ah ya, good point @Jk40git. It's not picking up Scribe-Data in your environment 🤔 We can maybe check this with a pip install . before the next release and figure out where the path issues are?

@Jk40git, you could also check this and @mhmohona could review? As you two see fit :)

andrewtavis commented 2 months ago

Specifically what we're looking for is if a person on Windows does the following:

pip install .
scribe-data q -all

If the data starts being downloaded, then that's the first hint that things are working properly :)

Jk40git commented 2 months ago

Ah ya, good point @Jk40git. It's not picking up Scribe-Data in your environment 🤔 We can maybe check this with a pip install . before the next release and figure out where the path issues are?

@Jk40git, you could also check this and @mhmohona could review? As you two see fit :)

Okay sure I was able to access it before the recent changes. But I will have a look at it again

Jk40git commented 2 months ago

Specifically what we're looking for is if a person on Windows does the following:

pip install .
scribe-data q -all

If the data starts being downloaded, then that's the first hint that things are working properly :)

Okay do you mean something like this? Screenshot 2024-07-07 223404

andrewtavis commented 2 months ago

That's the top of the output where you copied both commands in one, right @Jk40git? Now that you have it installed, and please check that you've git pulled, can you show what the output of scribe-data q -all is?

Jk40git commented 2 months ago

That's the top of the output where you copied both commands in one, right @Jk40git? Now that you have it installed, and please check that you've git pulled, can you show what the output of scribe-data q -all is?

Yes. the second output bash: scribe-data: command not found

andrewtavis commented 2 months ago

Hmmmm, but that's weird as it should have been covered by git install .. Is your local Scribe-Data up to date with git pull?

Jk40git commented 2 months ago

git install .? Or pip install .?

Yes it's up to date

andrewtavis commented 2 months ago

Sorry, pip install .. But this after git pull from the main branch here :)

Jk40git commented 2 months ago

That's what I did It says command not found

andrewtavis commented 2 months ago

Ok, well that's not good 😅 Can you tell us if you have the cli directory in your local version, so src/scribe_data/cli? @mhmohona, you're also on Windows, right?

Jk40git commented 2 months ago

Ok, well that's not good 😅 Can you tell us if you have the cli directory in your local version, so src/scribe_data/cli? @mhmohona, you're also on Windows, right?

Yes please I have the cli directory

Screenshot 2024-07-09 160326

andrewtavis commented 2 months ago

Thank for the conversation here, @Jk40git! Just checking things with the questions, hope you understand :) This is definitely something need need to be looking into then, as it seems that the CLI isn't being installed for you. @mhmohona, it'd be great to get your feedback as I believe it's working on your end, but maybe I'm misremembering your operating system as Windows, or there could be something else going on 🤔

@Jk40git, specifically this line in setup.py is the one that is setting up the CLI on install. It's at least worked for @mhmohona and I 🤔 I'm wondering what some potential fixes could be?? The line in question is:

  "console_scripts": [
      "scribe-data=scribe_data.cli.main:main",
  ],

Maybe something that's going on is that we have a prior version that's installed and the new one isn't being picked up? So maybe @Jk40git you could try pip uninstall scribe-data; pip install . as a way of reinstalling the package?

If this does work for you, then something we'd need to make sure of is that the old version of things is removed when a person wants to install the new version :) Generally I think that this comment and the thread it's in should have most of the answers we're looking for. Things to try:

  1. Uninstall and reinstall
# Solution I suggested above
pip uninstall scribe-data
pip install .
  1. Force reinstall
# Should behave as the above
pip install --force-reinstall scribe-data
  1. Update the entry_points and console_scripts via the egg info
python setup.py egg_info

@Jk40git, can you try the above and let us know if scribe-data q -all works on your end?

If one of these works, we should definitely add a quick line about them into the readme and contributing guide!

Jk40git commented 2 months ago

It output this Screenshot 2024-07-09 212845

andrewtavis commented 2 months ago

Ah this is at least some progress :) What command fixed it for you, @Jk40git? You could do a PR to add it to the readme and contributing markdown files in the installation sections? If it's the first commands above, then maybe we can put the second as that's the first combined, so adding in the following:

# To install new functionality post updates:
pip install --force-reinstall scribe-data

The above would help people after we get the next release out :) Feel free to send something along that you think would help people, @Jk40git!

andrewtavis commented 2 months ago

And what I'm seeing above is that specifically the paths are not being computed correctly for Windows 🤔 So at one point in update_data.py early on we're using / based paths, and because of that the CLI thinks the paths are invalid and we get no language - data type pairs to query data for. We can maybe update the docs and then find the spot where this problem is? We should definitely adopt pathlib as much as we can.

Jk40git commented 2 months ago

Ah this is at least some progress :) What command fixed it for you, @Jk40git? You could do a PR to add it to the readme and contributing markdown files in the installation sections? If it's the first commands above, then maybe we can put the second as that's the first combined, so adding in the following:

# To install new functionality post updates:
pip install --force-reinstall scribe-data

The above would help people after we get the next release out :) Feel free to send something along that you think would help people, @Jk40git!

okay sure. I did the steps above

pip uninstall scribe-data
pip install .
python setup.py egg_info 
Jk40git commented 2 months ago

Ah this is at least some progress :) What command fixed it for you, @Jk40git? You could do a PR to add it to the readme and contributing markdown files in the installation sections? If it's the first commands above, then maybe we can put the second as that's the first combined, so adding in the following:

# To install new functionality post updates:
pip install --force-reinstall scribe-data

The above would help people after we get the next release out :) Feel free to send something along that you think would help people, @Jk40git!

At some point pip install --force-reinstall scribe-data fails cause of the misbehavior of the PyICU.

      pip uninstall scribe-data  
      pip install .

works fine But I need to comment out PyICU in the requirement file

Screenshot 2024-07-10 103342

andrewtavis commented 2 months ago

Ok, ya that PyICU stuff will need to be figured out later ... Will be great when we have Dockerized deployments up that just package the whole thing! Let's keep this documentation in mind!

@Jk40git, do you want to check and see where we have /s hardcoded? So by this I mean check the update_data.py process and the corresponding functions that it uses. From there we can plan out the changes and you can send some PRs to make the fixes?

Jk40git commented 2 months ago

Can you give more explanation or details. What am I supposed to change the / hardcoded to in the update_data.py?

andrewtavis commented 2 months ago

You'd use pathlib, which is set up to use /s or \ depending on operating systems :) So where you see a slash based URL, we should instead use pathlib.

Jk40git commented 2 months ago

You'd use pathlib, which is set up to use /s or \ depending on operating systems :) So where you see a slash based URL, we should instead use pathlib.

So I have to import path from pathlib right ?

andrewtavis commented 2 months ago

Yes that'd be the way. You can check how pathlib has already been used :)

axif0 commented 2 months ago

hello.. @andrewtavis

mhmohona commented 2 months ago

@andrewtavis, please assign @axif0 on this issue. I dont have any problem.

andrewtavis commented 2 months ago

Perfect, thank you both! I'll merge in the current PR from @Jk40git and then @axif0 can work on implementing further changes to the paths 😊

andrewtavis commented 1 month ago

I had some major rewrites to do so that we could get the translation process working, so I went ahead and did a ton of changes for this in aef7e5a. @axif0, do you want to try out some of the functionality on your machine and also check the code for cases where the paths are still hardcoded with "/" that are only valid for Linux/Mac? We want all paths to be derived via pathlib.Path :)

andrewtavis commented 1 month ago

At this point we're ready for people to actively start testing the new paths on Windows 😊 Can folks pull the current changes, install the current version with pip install -e . and then try some CLI commands to make sure they work out? We still have the problem with the PyICU dependency, but that should also be worked out soon!

andrewtavis commented 1 month ago

b82bb67 should close this up mostly :) Closing now and we can test all of this with the rest of the functionality before the v4.0 release 😊 Thanks all!