Closed ayyubibrahimi closed 1 year ago
Running make returned an error.
md5sum fuse/cross_agency.py > .deba/md5/fuse/cross_agency.py.md5
key not found
key not found
make: *** [wrgl.mk:4: pull_person] Error 1
So I tried make locally and ran into this error:
running ner/post_officer_history_reports.py
Traceback (most recent call last):
File "/Users/khoipham/projects/PPACT/ner/post_officer_history_reports.py", line 54, in <module>
trained_model = spacy.load(
File "/Users/khoipham/.virtualenvs/base/lib/python3.9/site-packages/spacy/__init__.py", line 51, in load
return util.load_model(
File "/Users/khoipham/.virtualenvs/base/lib/python3.9/site-packages/spacy/util.py", line 427, in load_model
raise IOError(Errors.E050.format(name=name))
OSError: [E050] Can't find model 'data/ner/post/post_officer_history/model/post_officer_history_3.model'. It doesn't seem to be a Python package or a valid path to a data directory.
make: *** [.deba/deps/ner.d:13: data/ner/advocate_post_officer_history_reports.csv] Error 1
Perhaps this file wasn't uploaded?
Hm. I just attempted to dvc push the file post_officer_history_3.model again, but it seems that everything is up to date. Will look into it further.
You should be able to run make now without an error.
Saw this during make:
make: Circular data/fuse/personnel_pre_post.csv <- data/match/post_officer_history.csv dependency dropped.
make: Circular data/fuse/allegation.csv <- data/fuse/allegation.csv dependency dropped.
make: Circular data/fuse/event_pre_post.csv <- data/fuse/event_pre_post.csv dependency dropped.
make: Circular data/fuse/event_pre_post.csv <- data/fuse/personnel_pre_post.csv dependency dropped.
make: Circular data/fuse/use_of_force.csv <- data/fuse/use_of_force.csv dependency dropped.
make: Circular data/fuse/use_of_force.csv <- data/match/post_officer_history.csv dependency dropped.
make: Circular data/fuse/use_of_force.csv <- data/fuse/allegation.csv dependency dropped.
make: Circular data/fuse/use_of_force.csv <- data/fuse/event_pre_post.csv dependency dropped.
make: Circular data/fuse/use_of_force.csv <- data/fuse/personnel_pre_post.csv dependency dropped.
make: Circular data/fuse/use_of_force.csv <- data/fuse/use_of_force.csv dependency dropped.
make: Circular data/fuse/use_of_force.csv <- data/match/post_officer_history.csv dependency dropped.
make: Circular data/fuse/use_of_force.csv <- data/fuse/allegation.csv dependency dropped.
make: Circular data/fuse/use_of_force.csv <- data/fuse/event_pre_post.csv dependency dropped.
make: Circular data/fuse/use_of_force.csv <- data/fuse/personnel_pre_post.csv dependency dropped.
make: Circular data/fuse/use_of_force.csv <- data/fuse/use_of_force.csv dependency dropped.
I think the fuse/post_officer_history.py script is problematic. I saw it both read from and write to fuse/use_of_force.csv. Perhaps this file should be in another stage altogether.
Basically we want all the dependencies to form a directed acyclic graph (DAG). Drawing out the dependencies in a graph application like draw.io could help.
Even if it's not causing a problem right now, it might not produce the results that you think it does.
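To make the DAG requirement concrete, here is a minimal sketch of a cycle check over a target-to-prerequisites mapping. The function name find_cycle and the edges in example_deps are hypothetical: the file names come from the make warnings above, but the edges are only illustrative and are not read from the real Makefile.

```python
from typing import Dict, List, Optional


def find_cycle(deps: Dict[str, List[str]]) -> Optional[List[str]]:
    """Return one dependency cycle as a list of nodes, or None if deps is a DAG."""
    WHITE, GRAY, BLACK = 0, 1, 2  # unvisited, on the current DFS path, finished
    color = {node: WHITE for node in deps}
    path: List[str] = []

    def visit(node: str) -> Optional[List[str]]:
        color[node] = GRAY
        path.append(node)
        for prereq in deps.get(node, []):
            state = color.get(prereq, WHITE)
            if state == GRAY:
                # prereq is already on the current path, so we closed a loop
                return path[path.index(prereq):] + [prereq]
            if state == WHITE:
                cycle = visit(prereq)
                if cycle:
                    return cycle
        path.pop()
        color[node] = BLACK
        return None

    for node in list(deps):
        if color[node] == WHITE:
            cycle = visit(node)
            if cycle:
                return cycle
    return None


# Hypothetical edges (target -> prerequisites), for illustration only.
example_deps = {
    "data/fuse/use_of_force.csv": ["data/match/post_officer_history.csv"],
    "data/match/post_officer_history.csv": ["data/fuse/personnel_pre_post.csv"],
    "data/fuse/personnel_pre_post.csv": ["data/fuse/use_of_force.csv"],
}

cycle = find_cycle(example_deps)
if cycle:
    print(" <- ".join(cycle))
```

A script that both reads and writes the same file shows up here as a self-loop, for example {"a.csv": ["a.csv"]}, which the check also reports.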
Looks like fuse/all.py also has a circular dependency. It is running 3 times for me now. Perhaps a tool like this could help: https://github.com/lindenb/makefile2graph
Thanks for the reminder. We discussed the fuse/post_officer_history.py script a while ago as a temporary solution. Agreed that it's time to find a permanent solution. I'll begin making those changes.
Apologies if you ran into another error. make should now run without error on your end.
Do you need me to look at the error over the weekend?
That would be great. The new process-data error is:
/home/runner/.local/bin//gsutil -m rsync -i -J -r gs://k8s-ocr-jobqueue-results/ocr/ data/ocr_results/
CommandException: arg (data/ocr_results/) does not name a directory, bucket, or bucket subdir.
If there is an object with the same path, please add a trailing
slash to specify the directory.
make: *** [Makefile:33: ocr_results] Error 1
Error: Process completed with exit code 2.
And locally, when I run make, I now receive the following error:
md5sum fuse/cross_agency.py > .deba/md5/fuse/cross_agency.py.md5
key not found
key not found
make: *** [wrgl.mk:4: pull_person] Error 1
I'll continue to attempt to debug these errors. Thanks!
I plan to address the circular dependencies by adding a new stage after this PR is merged.
Everything ran fine for me locally. Can you run wrgl pull --all and show me the output?
Looks like you're still training ner/post/post_officer_history/model/post_officer_history.model. Please finish training and push that file to DVC.
Also, please review my commits. deba.data should be used whenever you refer to any file inside the data folder. Otherwise the job won't run correctly.
Thanks Khoi. Below is the error returned when I run wrgl pull --all:
error fetching objects: error poping haves: GetCommit 2b6c52f53a18e0f0b232b63fddefa475 error: key not found
Just run it again and again until it succeeds. I reckon there's a bug, but for now it's not so serious that you cannot proceed.
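If rerunning by hand gets tedious, a small wrapper can do the retrying. This is only a sketch: run_with_retries, the attempt count, and the delay are all hypothetical choices, and it assumes the command signals failure through a non-zero exit code.

```python
import subprocess
import time
from typing import Sequence


def run_with_retries(cmd: Sequence[str], attempts: int = 5, delay: float = 2.0) -> int:
    """Run cmd until it exits with status 0; return how many attempts it took."""
    for attempt in range(1, attempts + 1):
        result = subprocess.run(cmd)
        if result.returncode == 0:
            return attempt
        if attempt < attempts:
            time.sleep(delay)  # brief pause before trying again
    raise RuntimeError(f"{' '.join(cmd)} still failing after {attempts} attempts")


# Example call (assumes wrgl is on PATH):
# run_with_retries(["wrgl", "pull", "--all"])
```

Capping the number of attempts keeps a genuinely broken pull from looping forever.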
So it looks like it is trying to queue PDFs for OCR, which isn't supposed to happen. All OCR queueing should be done locally. Can you dvc pull and make again? I think there are some metadata outputs that are outdated.