scribe-org / Scribe-Data

Wikidata, Wiktionary and Wikipedia language data extraction
GNU General Public License v3.0

Scribe-data CLI tool implementation #140

Closed mhmohona closed 3 months ago

mhmohona commented 3 months ago

Contributor checklist


Description

  1. list-languages (-ll)

    • list available lang codes

    • commands:
      scribe-data ll
      scribe-data languages-list

    • list available word types per lang

    • commands:
      scribe-data list-word-types -l German

  2. language (-l) and word-type (--wt)

    • commands:
      scribe-data query -l English -wt nouns
      scribe-data query -l English -wt verbs
      scribe-data query -l English -wt translated_words
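The command layout described above could be sketched with argparse subcommands roughly as follows. This is only an illustration, not the code in this PR: the function name build_parser, the long option names (--language, --word-type) and the help texts are assumptions, and the -wt spelling follows the example commands.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser mirroring the commands listed in the description (sketch)."""
    parser = argparse.ArgumentParser(prog="scribe-data")
    subparsers = parser.add_subparsers(dest="command", required=True)

    # `scribe-data languages-list`, with `ll` as a short alias.
    subparsers.add_parser(
        "languages-list", aliases=["ll"], help="List available language codes."
    )

    # `scribe-data list-word-types -l German`
    word_types = subparsers.add_parser(
        "list-word-types", help="List available word types for a language."
    )
    word_types.add_argument("-l", "--language", required=True)

    # `scribe-data query -l English -wt nouns`
    query = subparsers.add_parser("query", help="Query data for a language.")
    query.add_argument("-l", "--language", required=True)
    query.add_argument("-wt", "--word-type", dest="word_type", required=True)

    return parser
```

Calling build_parser().parse_args(["query", "-l", "English", "-wt", "nouns"]) yields a namespace with command, language and word_type set, which the real implementation would then hand to the data-extraction code.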

Related issue

Fixes

github-actions[bot] commented 3 months ago

Thank you for the pull request!

The Scribe team will do our best to address your contribution as soon as we can. The following is a checklist for maintainers to make sure this process goes as well as possible. Feel free to address the points below yourself in further commits if you realize that actions are needed :)

If you're not already a member of our public Matrix community, please consider joining! We'd suggest using Element as your Matrix client, and definitely join the General and Data rooms once you're in. Also consider joining our bi-weekly Saturday dev syncs. It'd be great to have you!

Maintainer checklist

mhmohona commented 3 months ago

@andrewtavis, @wkyoshida, I have worked on listing all languages first, as this task seemed easier and I wanted to take a baby step towards the complete CLI tool. Here is the output for all languages - image

The file name Will suggested first, scribe-data.py, seems better now, as it would make the commands look pretty. I also think we need to move the CLI script to the root directory so that the commands become simpler.

andrewtavis commented 3 months ago

I think that we could keep it as cli.py. When Scribe-Data is installed, we should be able to access it directly from the command line without saying python3 .... With that we should be able to name it what we want, and then we'd of course change the CLI trigger to scribe-data as Will suggested 😊

andrewtavis commented 3 months ago

Do you want to check the linting and formatting checks for your commit, @mhmohona? No stress on the Mac build fail, sadly, but the formatting check did have some things that need to be fixed. You can see that above!

This will hopefully get easier once the new contributor has the pre-commit issue done as the linting fixes will be done for you on commit 😊

mhmohona commented 3 months ago

Here is the update for query.

python3 src/scribe_data/cli.py query -l German -wt nouns

image

python3 src/scribe_data/cli.py query -l German -wt verbs

image

python3 src/scribe_data/cli.py query -l Russian -wt translated_words

image

Now the question: for the emoji keywords, autosuggestions, and translations files, shall I add support for them as well? Secondly, is the formatting OK, or shall I put it in a table?

andrewtavis commented 3 months ago

Hey @mhmohona 👋 Checking on this, is this getting the JSON values from the language_data_export directory, or running update_data.py with the given arguments? The latter would be the planned functionality, but we can also work towards it!

Another thing: maybe you could look into how to implement it so that the CLI is installed when the package is installed. In the installation instructions we have the following (pre-commit was just added and checks your commits, so it's definitely suggested to adopt it 😊):

pip install --upgrade pip  # make sure that pip is at the latest version
pip install -r requirements.txt  # install dependencies
pip install -e .  # install the local version of Scribe-Data
pre-commit install  # install pre-commit hooks
# pre-commit run --all-files  # lint and fix common problems in the codebase

What would be really great is if doing pip install -e . meant that rather than:

python3 src/scribe_data/cli.py query -l German -wt verbs

we could instead do:

scribe-data query -l German -wt verbs

I'm assuming that this would be changes in setup.py or another installation setting 🤔 @wkyoshida, do you have an idea on this?
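One common way to get this behaviour is a setuptools console_scripts entry point that points at a main() function in the CLI module. The sketch below shows the shape such a function could take. The module path scribe_data.cli, the function name main and the placeholder print are assumptions, not the project's actual code.

```python
import argparse
from typing import Optional, Sequence

# In setup.py, the command could be registered roughly like this
# (module path and function name are assumptions):
#
#     entry_points={"console_scripts": ["scribe-data=scribe_data.cli:main"]}
#
# After `pip install -e .`, setuptools generates a `scribe-data` executable
# that imports this module and calls `sys.exit(main())`.


def main(argv: Optional[Sequence[str]] = None) -> int:
    """Parse arguments and return a process exit code (sketch)."""
    parser = argparse.ArgumentParser(prog="scribe-data")
    subparsers = parser.add_subparsers(dest="command", required=True)

    query = subparsers.add_parser("query")
    query.add_argument("-l", "--language", required=True)
    query.add_argument("-wt", "--word-type", dest="word_type", required=True)

    # argv=None makes argparse fall back to sys.argv[1:].
    args = parser.parse_args(argv)

    # Placeholder for the real work, e.g. running the data update.
    print(f"Querying {args.word_type} for {args.language}")
    return 0
```

Accepting argv as a parameter keeps main() callable from tests without touching sys.argv, while the installed scribe-data command still works because the generated wrapper calls it with no arguments.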

mhmohona commented 3 months ago

@andrewtavis, so I have updated the commands. It now looks like this -

image

I still need to work on the language codes, so that en can be used instead of writing English.
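Accepting either a language name or its code could be handled with a small lookup like the one below. This is a minimal sketch: the LANGUAGE_CODES mapping and the resolve_language helper are hypothetical and cover only the three languages used as examples in this thread.

```python
# Hypothetical, incomplete mapping from ISO 639-1 codes to language names.
LANGUAGE_CODES = {
    "en": "English",
    "de": "German",
    "ru": "Russian",
}


def resolve_language(value: str) -> str:
    """Return the full language name for a code ("en") or a name ("English")."""
    cleaned = value.strip().lower()

    # First try the value as a language code.
    by_code = LANGUAGE_CODES.get(cleaned)
    if by_code is not None:
        return by_code

    # Otherwise try it as a case-insensitive language name.
    for name in LANGUAGE_CODES.values():
        if name.lower() == cleaned:
            return name

    raise ValueError(f"Unknown language: {value!r}")
```

With this in place, both scribe-data query -l en -wt nouns and scribe-data query -l English -wt nouns would resolve to the same language before any data is fetched.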

andrewtavis commented 3 months ago

This is great, @mhmohona! Thanks for all the hard work!

andrewtavis commented 3 months ago

Requesting review from both of us eventually, @wkyoshida :) Let us know when it's ready for a final check, @mhmohona, and we're of course happy to answer questions along the way!

andrewtavis commented 3 months ago

Hey @mhmohona and @wkyoshida! 👋 I'm going to give this a look right now and fix up the test errors so we can bring this in :) We've got lots in here already, and I think the path ahead will be more clear once this is done and we can do one PR per issue 😊