Open joeflack4 opened 5 months ago
If mondo ingest refresh is not phony, i.e. generates a file during the process, a make
run will not try to run it multiple times. So once it checked once if there was something to updated, and a file is written, make
will know its looked at the goal, and wont look again for the duration of the entire pipeline.
I can't put much thought into this right now, but that's incorrect, I believe, based on the very idiosyncratic way that this new "intelligent refresh" works for these goals.
The dependencies are no longer in the 'prereqs' section of the goal. They're called in the body. A check happens, and if an update to mondo has been made, then these files get rebuilt. So it is theoretically possible for this to happen multiple times during a build if a commit occurs mid-build. That's what this issue is about.
I wouldn't worry about it too much; it's a low priority / edge case.
@twhetzel Just noticing that some days main
in mondo
does indeed get updated often.
IDK how much of a problem this can cause for us.
What the issue is:
During the build, tmp/mondo/
and its artefacts can refresh multiple times.
Causes problems:
I agree, I was bugged by this as well. This is the week the curators do a lot of work on Mondo, so basically we had 3-5 new commits each day. For testing this is annoying. I added a feature to simply skip refreshing during development, which does not affect the default behaviour:
https://github.com/monarch-initiative/mondo-ingest/commit/822becde7eb2af1d29a1836f1f18a9b2b4e4c84a
Overview
This is minor because this is not likely to happen very often, and when it does happen, I'm not sure how much of a major issue it would be. But it could create very confusing scenarios if it does happen.
Since #551, now whenever a goal is run which has requires
mondo.owl
ormondo.ssom.tsv
(or possibly in the future any other output which requires the refresh of a clonedmondo
repo to generate such a file as with these files), it checks if there has been any updates tomain
in themondo
repo. If so, it refreshes the repo, pulling down that most recent commit, and rebuilds these files. Between commits, it is possible that there will be variations in these outputs. Since these filesmondo.owl
andmondo.sssom.tsv
are needed at multiple points in the pipeline, it's possible that a new commit could occur mid-build, such that the versions of these files used in some goals earlier in the build are different than later in the build.Possible solutions
Perhaps one solution would be, at the start of
build-mondo-ingest
, to create a file (delete it first if it already exist) or an environmental variable that stores a boolean about whether or not the "refreshmondo
repo" goal has been ran. So at first, we would set this to false, or maybe we just wouldn't set the variable / create the file. If the refreshmondo
goal is run, we then create the file / set the variable to true. And, every time that the refreshmondo
goal is run, it checks if this file / variable is true, and if so, it skips. I suppose an environmental variable is better, since we would have to set this to false / delete at the end of the build, and if there was an error during the build, this would be easier to do if an environmental variable (automatic) than a file, which would require fancy error handling that I'm not sure is possible.Related
Other intelligent refresh related issues:
593