scribe-org / Scribe-Data

Wikidata, Wiktionary and Wikipedia language data extraction
GNU General Public License v3.0

Implement `--output-dir` and `--overwrite` functionality for a JSON file output #144

Closed: andrewtavis closed this issue 1 month ago

andrewtavis commented 3 months ago


Description

This issue would add the --output-dir (-od) and --overwrite (-o) functionality to the Scribe-Data CLI. This will allow the user to specify a directory where the results of the --query command will be written. Note that I renamed this argument to --output-dir from the discussed --output-file, because if there's more than one output file, the string provided will need to point to a directory. To avoid checking whether we're returning a file or a directory, and the issues that would cause for end users, let's always return a directory. Things this argument will do:
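A minimal sketch of how the two flags could be wired up with Python's standard `argparse` module, assuming query results arrive as a dictionary keyed by `(language, data_type)`. The helper `write_results`, the `<language>/<data_type>.json` layout, and the default directory name (borrowed from the name used later in this thread) are illustrative assumptions, not Scribe-Data's actual implementation. Because the output is always a directory, the only per-file check needed is whether a file already exists.

```python
import argparse
import json
from pathlib import Path


def build_parser():
    """Hypothetical parser: only the flag names come from this issue."""
    parser = argparse.ArgumentParser(prog="scribe-data")
    parser.add_argument(
        "-od",
        "--output-dir",
        # Assumed default, borrowed from the directory name used later in the thread.
        default="scribe_data_json_export",
        help="Directory where query results are written (always a directory).",
    )
    parser.add_argument(
        "-o",
        "--overwrite",
        action="store_true",
        help="Replace files that already exist in the output directory.",
    )
    return parser


def write_results(results, output_dir, overwrite):
    """Write each result to <output_dir>/<language>/<data_type>.json.

    Hypothetical helper: `results` maps (language, data_type) tuples to
    JSON-serializable data. Existing files are skipped unless `overwrite` is
    True. Returns the list of paths that were actually written.
    """
    written = []
    for (language, data_type), data in results.items():
        path = Path(output_dir) / language / f"{data_type}.json"
        path.parent.mkdir(parents=True, exist_ok=True)
        if path.exists() and not overwrite:
            continue  # Leave the existing file untouched.
        path.write_text(
            json.dumps(data, ensure_ascii=False, indent=2), encoding="utf-8"
        )
        written.append(path)
    return written
```

For example, `build_parser().parse_args(["-od", "out", "-o"])` yields `output_dir="out"` and `overwrite=True`, and calling `write_results` a second time without `overwrite` writes nothing.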

Contribution

@mhmohona will be working on this as a part of GSoC 2024 ☀️ Please write in here if you would, and let us know if you need some support! 😊

andrewtavis commented 3 months ago

We should also discuss what the default export directory name should be 🤔 I think there's value in branding it, as that would also make it easier for the lay user to find. Maybe scribe_data_export? If so, we should rename language_data_export :)

mhmohona commented 3 months ago

Yea, we can rename it to scribe_data_export.

andrewtavis commented 3 months ago

Hey @mhmohona 👋 FYI, I'm realizing that it'd make sense to have file-type-based names for the export directories, as ultimately the JSON directory will still need to exist while SQLite directories are being created. This also makes returning and distinguishing between them a bit easier. As seen in the project root, we now have scribe_data_json_export and scribe_data_sqlite_export. Let me know if you think this makes sense! We can also change it back later 😊
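The file-type-based naming can be sketched as a small lookup so that the JSON and SQLite exports always land in separate directories. The two directory names come from this comment; the `EXPORT_DIRS` mapping and the `export_dir_for` helper are hypothetical and only illustrate the idea.

```python
from pathlib import Path

# Directory names come from the thread; the mapping and helper are hypothetical.
EXPORT_DIRS = {
    "json": "scribe_data_json_export",
    "sqlite": "scribe_data_sqlite_export",
}


def export_dir_for(file_type, root="."):
    """Return the default export directory for a file type such as 'json' or 'sqlite'."""
    try:
        name = EXPORT_DIRS[file_type.lower()]
    except KeyError:
        raise ValueError(f"Unsupported file type: {file_type!r}") from None
    return Path(root) / name
```

With this approach `export_dir_for("sqlite")` points to `scribe_data_sqlite_export` in the current working directory, so both export types can coexist and the return value alone tells you which one you have.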

mhmohona commented 3 months ago

So I have worked on this issue, and this is how the output looks:

[Screenshot of the output]

Is it okay? How can I improve it?

andrewtavis commented 3 months ago

We can help with the output for this, @mhmohona, but as said on Matrix, let's include the length of the file :) Also check the line in the utils for how the original output of update_data.py looks 😊

andrewtavis commented 3 months ago

For this one, @mhmohona, we need to convert Scribe-Data's CLI query command over to using update_data.py. Maybe we can do a call at some point to plan this out a bit better? I think that we have the basics of what's needed here, but remember that the goal is to bring down new data, not to move the data from scribe_data_json_export, as this directory will eventually be removed 😊

andrewtavis commented 3 months ago

Thinking about this further, @mhmohona, as of now scribe_data/wikidata/update_data.py is just a script that's run via the command line. What likely needs to happen is that we put the code from that file into a function that we can then import into scribe_data/cli/query.py :) We won't be running update_data.py directly anymore, so this should work really well 😊
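One common Python pattern for this is to move the script body into a function and guard the command-line entry point with `if __name__ == "__main__":`, so importing the module no longer triggers a run. The sketch below assumes a function named `update_data` taking languages and word types; that name, its parameters, and the injected `run_query` callable (a stand-in for the real Wikidata query step, used only to keep the example runnable) are all hypothetical.

```python
from typing import Callable, Dict, List, Tuple

Results = Dict[Tuple[str, str], list]


def update_data(
    languages: List[str],
    word_types: List[str],
    run_query: Callable[[str, str], list],
) -> Results:
    """Hypothetical importable version of the update_data.py script body.

    `run_query` stands in for the real Wikidata query-and-format step, which is
    not reproduced here. It receives a language and a word type and returns
    the resulting rows.
    """
    results: Results = {}
    for language in languages:
        for word_type in word_types:
            results[(language, word_type)] = run_query(language, word_type)
    return results


if __name__ == "__main__":
    # Running the file directly still works; importing it no longer runs anything.
    def demo_query(language: str, word_type: str) -> list:
        return [f"{language}:{word_type}"]

    print(update_data(["English"], ["nouns"], demo_query))
```

The CLI side could then use an import along the lines of `from scribe_data.wikidata.update_data import update_data` inside scribe_data/cli/query.py, again assuming that function name.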

mhmohona commented 2 months ago

@andrewtavis, does this satisfy the requirements?

[Screenshot]

andrewtavis commented 2 months ago

Looking really great, @mhmohona! I think that we're ready for a PR :)

mhmohona commented 2 months ago

I have pushed my changes in #163, as I forgot to switch branches before working. 😅

andrewtavis commented 1 month ago

Closed by #163 😊 Thanks for all the work here, @mhmohona! ☀️