Closed andrewtavis closed 6 months ago
Hey @shashank-iitbhu 👋 Can you write in here so I can assign :)
QUERIED_DATA_FILE = f"{QUERIED_DATA_TYPE}_queried.json"
Can't seem to find QUERIED_DATA_FILE. Do I need to run another operation before trying to run format_nouns.py?
Hey @shashank-iitbhu 👋 Was part of the demonstration on Saturday, but not quite as visible. If you look at the end of the formatting files you'll see that this file is deleted, so we query this JSON from Wikidata and then delete it after the formatting step. Hence the files aren't in the repo :)
Is the output from a run of update_data.py? If it's just the formatting step being run, then no stress! You won't have the file then.
Oh, got it! The output was from the formatting step only. I am able to run update_data.py successfully.
@shashank-iitbhu how did you resolve this issue? I am having the same error when I try to run format_nouns.py.
Are you getting the scribe_data module not found error or the ***_queried.json file not found error?
Adding the lines mentioned in the Element chat should resolve the scribe_data module not found error.
As in update_data.py, the ***_queried.json files are deleted after the formatting process, so they are not present in the codebase. That's why format_nouns.py can't be run independently.
I am getting the ***_queried.json file not found error
python3 src/scribe_data/extract_transform/languages/German/nouns/format_nouns.py
Traceback (most recent call last):
File "/Users/ikeadeoyin/Documents/WikimediaGSoC2024/Scribe-Data/src/scribe_data/extract_transform/languages/German/nouns/format_nouns.py", line 40, in <module>
with open(data_path, encoding="utf-8") as f:
FileNotFoundError: [Errno 2] No such file or directory: '/Users/ikeadeoyin/Documents/WikimediaGSoC2024//Scribe-Data/src/scribe_data/extract_transform/languages/German/nouns/nouns_queried.json'
So I need to run update_data.py before running format_nouns.py?
update_data.py is the main data process: it triggers the SPARQL queries that fetch language data from Wikidata and then runs the formatting operation by running all the format_***.py files. You just need to run update_data.py, no need to run format_nouns.py after that.
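For anyone landing here later, a rough sketch of that flow. The function names and the sample record are made up for illustration and are not the actual Scribe-Data code; the point is only that the queried JSON is written by the query step and deleted by the formatting step, so formatting on its own has nothing to read.

```python
import json
import tempfile
from pathlib import Path


def query_data(data_file: Path) -> None:
    """Hypothetical stand-in for the Wikidata SPARQL query step."""
    sample = [{"singular": "Haus", "plural": "Häuser"}]  # made-up record
    data_file.write_text(json.dumps(sample, ensure_ascii=False), encoding="utf-8")


def format_data(data_file: Path) -> dict:
    """Hypothetical stand-in for a format_*.py script."""
    with open(data_file, encoding="utf-8") as f:
        records = json.load(f)
    formatted = {r["singular"]: {"plural": r["plural"]} for r in records}
    data_file.unlink()  # the queried file is removed once formatting is done
    return formatted


def run_update(work_dir: Path) -> dict:
    """Query first, then format: the order the update process relies on."""
    data_file = work_dir / "nouns_queried.json"
    query_data(data_file)
    return format_data(data_file)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        work_dir = Path(tmp)
        print(run_update(work_dir))
        try:
            # Formatting alone: the queried file no longer exists.
            format_data(work_dir / "nouns_queried.json")
        except FileNotFoundError as err:
            print("Formatting on its own fails:", err)
```

Running the formatting step a second time raises the same FileNotFoundError as in the traceback above.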
Thank you so much! I was able to run update_data.py successfully.
Thanks to you both for working this through!
Description
In the process of updating the data formatting process, the steps to load in data and export it were standardized such that they take in a LANGUAGE variable as well as one for QUERIED_DATA_TYPE. This can be seen for example in German/nouns/format_nouns.py. It would be great if the lines for importing the Wikidata data as well as those for exporting the final output to the formatted_data directories could be extracted to common functions that could then be loaded in and run from each of the formatting files 😊
Contribution
Happy to support someone on this or get to it myself eventually! This is a great good first issue for someone wanting to get into Scribe a bit 😊
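One possible shape for the common functions described above, as a minimal sketch. The function names, parameters, and directory layout here are assumptions for illustration, not the existing Scribe-Data API; the real paths would need to match the repo.

```python
import json
from pathlib import Path


def load_queried_data(base_dir, language, data_type):
    """Load <base_dir>/<language>/<data_type>/<data_type>_queried.json.

    The directory layout is an assumption. Returns the parsed JSON and the
    path it came from, so the caller can hand the path to the export step.
    """
    data_path = Path(base_dir) / language / data_type / f"{data_type}_queried.json"
    with open(data_path, encoding="utf-8") as f:
        return json.load(f), data_path


def export_formatted_data(formatted, export_dir, language, data_type, queried_path=None):
    """Write <export_dir>/<language>/formatted_data/<data_type>.json.

    If queried_path is given, the queried file is deleted afterwards,
    mirroring the cleanup done at the end of the formatting files.
    """
    out_dir = Path(export_dir) / language / "formatted_data"
    out_dir.mkdir(parents=True, exist_ok=True)
    out_path = out_dir / f"{data_type}.json"
    with open(out_path, "w", encoding="utf-8") as f:
        json.dump(formatted, f, ensure_ascii=False, indent=0)
    if queried_path is not None:
        Path(queried_path).unlink(missing_ok=True)  # requires Python 3.8+
    return out_path
```

Each formatting file would then reduce to: call load_queried_data with its LANGUAGE and QUERIED_DATA_TYPE, do the language-specific reshaping, and pass the result to export_formatted_data.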