scribe-org / Scribe-Data

Wikidata, Wiktionary and Wikipedia language data extraction
GNU General Public License v3.0

Simplify formatting process to lexeme based outputs rather than string based #142

Open andrewtavis opened 3 months ago

andrewtavis commented 3 months ago

Description

This issue covers simplifying the data formatting process, specifically the noun formatting processes, so that they export lexeme-based files from the data. As of now the formatting processes are very long and migrate all the data from Wikidata into string-keyed JSONs, so lexemes that have different meanings but share the same string end up in the same entry. We do not want to do this anymore based on decisions in #59 and #110 :)
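
To make the difference concrete, here is a minimal sketch of the two output shapes. It is not the actual Scribe-Data formatting code: the record fields, the function names `format_by_string` and `format_by_lexeme`, and the lexeme IDs are all hypothetical placeholders chosen for illustration. The German noun "Band" is used because it exists with different genders, plurals and meanings.

```python
# Hypothetical query results. The IDs below are placeholders, not real
# Wikidata lexeme IDs, and the field names are assumptions for this sketch.
records = [
    {"lexeme_id": "L_EXAMPLE_1", "noun": "Band", "gender": "N", "plural": "Bänder"},
    {"lexeme_id": "L_EXAMPLE_2", "noun": "Band", "gender": "M", "plural": "Bände"},
]


def format_by_string(rows):
    """String-keyed output: lexemes sharing a string land in one entry."""
    out = {}
    for row in rows:
        forms = {"gender": row["gender"], "plural": row["plural"]}
        out.setdefault(row["noun"], []).append(forms)
    return out


def format_by_lexeme(rows):
    """Lexeme-keyed output: one entry per lexeme, no merging needed."""
    return {
        row["lexeme_id"]: {k: v for k, v in row.items() if k != "lexeme_id"}
        for row in rows
    }


by_string = format_by_string(records)
by_lexeme = format_by_lexeme(records)

print(len(by_string))  # both lexemes share the single key "Band"
print(len(by_lexeme))  # each lexeme keeps its own entry
```

With the string-keyed shape, the formatter has to decide how to combine the two "Band" lexemes inside one entry, which is where much of the extra logic comes from. Keying by lexeme avoids that step entirely.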

Contribution

I'll be working on this, and once it's done the old state of Scribe-Data will be officially cleaned up!