scribe-org / Scribe-Data

Wikidata, Wiktionary and Wikipedia language data extraction
GNU General Public License v3.0

.gitignore the output files from intermediary steps #97

Closed wkyoshida closed 6 months ago

wkyoshida commented 6 months ago

Issue

In looking at some recent PRs, I'm wondering if it could make sense to add some of the output files from intermediary steps of the data process to the .gitignore, so that they don't get committed by mistake. An example of such a file would be nouns_queried.json, which is an intermediary output on the way to the formatted nouns.json file.

One idea could be to:


Would this even make sense @andrewtavis? Wondered about this especially since folks are contributing more to data processes.

andrewtavis commented 6 months ago

I think this is something to consider, @wkyoshida :) I'm a bit confused as to why these files are still being generated as they're destroyed at the end of the formatting steps. We could just do nouns_queried.json and the ones for the other word types though? Not sure why we'd need the intermediary or Scribe names as they're already distinctly named :)

shashank-iitbhu commented 6 months ago

> I think this is something to consider, @wkyoshida :) I'm a bit confused as to why these files are still being generated as they're destroyed at the end of the formatting steps. We could just do nouns_queried.json and the ones for the other word types though? Not sure why we'd need the intermediary or Scribe names as they're already distinctly named :)

The {data_type}_queried.json files are correctly being deleted at the end of the formatting process. In PR #93, it appears that this file was accidentally committed after running the query process explicitly. Just adding {data_type}_queried.json to .gitignore for the data types nouns, verbs and prepositions would cover for such accidental commits.
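To illustrate why the file can linger, here is a minimal sketch of a format step that deletes its intermediary at the end. It is not the actual Scribe-Data code: the function name `format_data`, the `word` field, and the keyed-by-word output shape are all assumptions made for illustration. Only the `{data_type}_queried.json` and `{data_type}.json` naming comes from this thread.

```python
import json
from pathlib import Path


def format_data(data_dir: Path, data_type: str) -> Path:
    """Hypothetical format step: read {data_type}_queried.json,
    write {data_type}.json, then delete the intermediary file."""
    queried_path = data_dir / f"{data_type}_queried.json"
    formatted_path = data_dir / f"{data_type}.json"

    with queried_path.open(encoding="utf-8") as f:
        queried = json.load(f)

    # Placeholder formatting: key each record by an assumed "word" field.
    formatted = {record["word"]: record for record in queried}

    with formatted_path.open("w", encoding="utf-8") as f:
        json.dump(formatted, f, ensure_ascii=False, indent=2)

    # The intermediary is only removed when this step runs, so running
    # the query step alone leaves it on disk where it can be committed.
    queried_path.unlink()
    return formatted_path
```

If only the query step is run, nothing in a flow like this removes the `_queried.json` file, which is the situation a .gitignore entry guards against.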

wkyoshida commented 6 months ago

> .. Not sure why we'd need the intermediary or Scribe names as they're already distinctly named :)

Gotchu! Yeah - that idea came about mostly in case there were any other files I missed or wasn't remembering :laughing: If they happened to have differing formats, a single naming format would allow a single .gitignore entry to cover all of them, but perhaps I was over-complicating things :sweat_smile:

If the only files in question are the {data_type}_queried.json ones though, then, as @shashank-iitbhu suggested, a simple **/*_queried.json already covers this for us :rocket:
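As a rough way to see which files such an entry would cover, the sketch below lists every `*_queried.json` file under a directory. It only approximates the `**/*_queried.json` pattern with `Path.rglob` and does not implement Git's full ignore rules. The helper name `find_queried_files` is an assumption, not something in the repo.

```python
from pathlib import Path


def find_queried_files(repo_root: Path) -> list[Path]:
    """Return paths (relative to repo_root) of all *_queried.json files,
    searching the root directory and every subdirectory."""
    return sorted(
        path.relative_to(repo_root)
        for path in repo_root.rglob("*_queried.json")
        if path.is_file()
    )


if __name__ == "__main__":
    # List leftover intermediary files under the current directory.
    for leftover in find_queried_files(Path(".")):
        print(leftover)
```

Running this from the repository root after a query step would show any intermediary files still on disk, whatever the language or data type directory they sit in.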


This is simple. I can do it this week, but if anyone would like to jump in before me, feel free by all means :grin:

andrewtavis commented 6 months ago

98c899d closes this up :) Wanted to close out some things as there's LOTS going on right now 😊