scribe-org / Scribe-Data

Wikidata, Wiktionary and Wikipedia language data extraction
GNU General Public License v3.0

Implement CLI query `sqlite` `--output-type` functionality via `convert` #145

Closed andrewtavis closed 1 month ago

andrewtavis commented 3 months ago

Description

As discussed in the 24/5/2024 GSoC sync, we'd like Scribe-Data to be able to export file types other than JSON. This issue covers exporting .sqlite database files. The user would pass --output-type sqlite (or .sqlite, just to be sure), and the resulting JSON files would then be converted to .sqlite data files. To be more explicit: Scribe-Data would first export JSON files, as that's the baseline output file type, and then the convert process would run to turn those files into SQLite databases. For example, the files in OUTPUT_DIR/German/ like nouns.json and verbs.json would end up in a german.sqlite file with nouns and verbs tables.
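
To make the idea concrete, here is a minimal sketch of the JSON to SQLite step using only the Python standard library. The function name `json_dir_to_sqlite`, the two-column `key`/`value` table layout, and the assumption that each JSON file holds a single top-level object are all illustrative and not taken from Scribe-Data's code:

```python
import json
import sqlite3
from pathlib import Path


def json_dir_to_sqlite(language_dir, db_path):
    """Load every *.json file in language_dir into one table per file.

    Hypothetical helper: assumes each file holds a top-level JSON object.
    Each entry becomes a (key, value) row, with the value stored as JSON text.
    Returns the list of table names that were written.
    """
    tables = []
    conn = sqlite3.connect(db_path)
    try:
        for json_file in sorted(Path(language_dir).glob("*.json")):
            table = json_file.stem  # nouns.json -> "nouns"
            with open(json_file, encoding="utf-8") as f:
                data = json.load(f)

            # Table names cannot be bound as parameters, so quote the identifier.
            quoted = '"' + table.replace('"', '""') + '"'
            conn.execute(f"DROP TABLE IF EXISTS {quoted}")
            conn.execute(
                f"CREATE TABLE {quoted} (key TEXT PRIMARY KEY, value TEXT)"
            )
            conn.executemany(
                f"INSERT INTO {quoted} (key, value) VALUES (?, ?)",
                [(k, json.dumps(v, ensure_ascii=False)) for k, v in data.items()],
            )
            tables.append(table)
        conn.commit()
    finally:
        conn.close()
    return tables
```

Under these assumptions, calling `json_dir_to_sqlite("OUTPUT_DIR/German", "german.sqlite")` would give a german.sqlite file with one table per JSON file. A real implementation would likely want typed columns per word type rather than a JSON text blob.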

Contribution

@mhmohona will be working on this as a part of GSoC 2024! ☀️ Please write in here so I can assign, and let us know if there's anything we can do to support!

mhmohona commented 3 months ago

gonna work on it!

andrewtavis commented 3 months ago

@mhmohona, quick check on things :) I think it makes sense to have convert go only from JSON to SQLite, CSV or TSV. The user will always get a JSON at first, so this process can be one-directional to whatever they want? Or can you think of cases where an SQLite Scribe-Data output should be changed to a CSV, etc.?
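
As a rough illustration of what one-directional could look like for the CSV and TSV targets, here is a small sketch. The function name `convert_json`, its parameters, and the assumed JSON shape (a top-level object whose values are flat objects) are hypothetical and not part of Scribe-Data:

```python
import csv
import json


def convert_json(json_path, output_path, output_type):
    """Convert a JSON export to CSV or TSV (one direction only).

    Hypothetical helper: assumes a top-level object whose values are
    flat objects, e.g. {"Hund": {"plural": "Hunde"}}.
    """
    delimiters = {"csv": ",", "tsv": "\t"}
    if output_type not in delimiters:
        raise ValueError(f"Unsupported output type: {output_type}")

    with open(json_path, encoding="utf-8") as f:
        data = json.load(f)

    # Collect the union of field names, keeping first-seen order.
    fields = []
    for entry in data.values():
        for name in entry:
            if name not in fields:
                fields.append(name)

    with open(output_path, "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(
            f, fieldnames=["key"] + fields, delimiter=delimiters[output_type]
        )
        writer.writeheader()
        for key, entry in data.items():
            # Fields missing from an entry are written as empty cells.
            writer.writerow({"key": key, **entry})
```

Because the input is always JSON, the function only needs one reader and a small table of writers, which is the main appeal of keeping the process one-directional.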

andrewtavis commented 1 month ago

Closed by #163! The features are really coming along, @mhmohona! ☀️