scribe-org / Scribe-Data

Wikidata and Wikipedia language data extraction
GNU General Public License v3.0

Implement CLI query `.csv` and `.tsv` `--output-type` functionality via `convert` #146

Open andrewtavis opened 1 month ago

andrewtavis commented 1 month ago

Description

As discussed in the 24/5/2024 GSoC sync, we'd like Scribe-Data to be able to export file types other than JSON. This issue covers exporting `.csv` and `.tsv` files. The user would pass `--output-type csv` or `--output-type tsv` (we should also accept `.csv`/`.tsv` just to be sure), and the resulting JSON files would then be converted to `.csv`/`.tsv` files. To be more explicit: Scribe-Data would first export JSON files, as that's the baseline output file type, and then the convert process would run so that these files are converted to the requested file type. Files in `OUTPUT_DIR/German/` like `nouns.json` and `verbs.json` would be rewritten to `OUTPUT_DIR/German/nouns.csv`, etc.
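A minimal sketch of what the conversion step could look like, to make the description concrete. The function names (`normalize_output_type`, `convert_json_file`) are hypothetical and not part of Scribe-Data. The sketch also assumes that each exported JSON file is a top-level object mapping an entry (for example a noun) to a dict of fields; the real file layout may differ.

```python
import csv
import json
from pathlib import Path

# Delimiter per supported output type.
DELIMITERS = {"csv": ",", "tsv": "\t"}


def normalize_output_type(output_type: str) -> str:
    """Accept 'csv', '.csv', 'TSV', etc. and return 'csv' or 'tsv'."""
    cleaned = output_type.strip().lower().lstrip(".")
    if cleaned not in DELIMITERS:
        raise ValueError(f"Unsupported output type: {output_type!r}")
    return cleaned


def convert_json_file(json_path: Path, output_type: str) -> Path:
    """Write a .csv/.tsv file next to json_path and return its path.

    Hypothetical helper: assumes the JSON is an object of the form
    {"entry": {"field": "value", ...}, ...}.
    """
    output_type = normalize_output_type(output_type)
    with open(json_path, encoding="utf-8") as f:
        data = json.load(f)

    # Flatten each entry into one row, keeping the JSON key in a "key" column.
    rows = []
    for key, value in data.items():
        row = {"key": key}
        if isinstance(value, dict):
            row.update(value)
        else:
            row["value"] = value
        rows.append(row)

    # Collect column names in first-seen order so the header is stable.
    fieldnames = []
    for row in rows:
        for name in row:
            if name not in fieldnames:
                fieldnames.append(name)

    out_path = json_path.with_suffix(f".{output_type}")
    with open(out_path, "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(
            f, fieldnames=fieldnames, delimiter=DELIMITERS[output_type]
        )
        writer.writeheader()
        writer.writerows(rows)  # missing fields are written as empty cells
    return out_path
```

With this sketch, `convert_json_file(Path("OUTPUT_DIR/German/nouns.json"), ".csv")` would write `OUTPUT_DIR/German/nouns.csv`. Whether the original JSON file is kept or removed afterwards is left open here.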

Contribution

@mhmohona will be working on this as a part of GSoC 2024! ☀️ Please write in here so I can assign, and let us know if there's anything we can do to support!

mhmohona commented 1 month ago

gonna work on it!

andrewtavis commented 2 weeks ago

Note that for this issue and #145 we should be using `scribe_data/cli/convert.py`, with the functionality going in there. If a person wants to query from Wikidata and have the output converted automatically, then the `convert` functions should run in the background 😊
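One way to picture the "convert runs in the background" flow is the rough sketch below. Everything in it is an assumption for illustration: `query_and_convert`, `json_to_delimited`, and the `query_fn` callback are hypothetical names, not the actual contents of `scribe_data/cli/convert.py`, and the row layout is deliberately simplistic.

```python
import csv
import json
from pathlib import Path
from typing import Callable, List, Optional


def json_to_delimited(json_path: Path, delimiter: str, suffix: str) -> Path:
    """Hypothetical stand-in for the conversion logic that would live in convert.py."""
    data = json.loads(json_path.read_text(encoding="utf-8"))
    out_path = json_path.with_suffix(suffix)
    with open(out_path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f, delimiter=delimiter)
        # Simplification: one row per top-level key, value kept as a JSON string.
        for key, value in data.items():
            writer.writerow([key, json.dumps(value, ensure_ascii=False)])
    return out_path


def query_and_convert(
    query_fn: Callable[[Path], None],
    language_dir: Path,
    output_type: Optional[str] = None,
) -> List[Path]:
    """Run the query (which writes JSON), then convert if an output type was requested."""
    language_dir.mkdir(parents=True, exist_ok=True)
    query_fn(language_dir)  # step 1: baseline JSON export
    json_files = sorted(language_dir.glob("*.json"))
    if output_type is None:
        return json_files  # no conversion requested, JSON is the result

    cleaned = output_type.lower().lstrip(".")  # accept "csv" and ".csv"
    delimiter = {"csv": ",", "tsv": "\t"}[cleaned]
    # Step 2: conversion happens automatically after the query.
    return [json_to_delimited(p, delimiter, f".{cleaned}") for p in json_files]
```

The point of the design is that the query command never needs its own CSV/TSV writer: it always produces JSON and delegates everything else to the conversion code.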