Closed ambrussimon closed 6 years ago
Merging #1015 into master will increase coverage by
0.07%
. The diff coverage is91.07%
.
@@ Coverage Diff @@
## master #1015 +/- ##
==========================================
+ Coverage 90.7% 90.78% +0.07%
==========================================
Files 50 50
Lines 6802 6869 +67
==========================================
+ Hits 6170 6236 +66
- Misses 632 633 +1
To be clear, would the engine call /prepare-complete
, get a ticket, and then call /upload/engine
with that ticket ID? In what scenario would someone call /accept-failed-output
?
@kofalt Correct. Formalizing just a bit more (current implementation):
POST /jobs/x/prepare-complete {"success": true}
=> create & return {"ticket": "y"}
POST /upload/engine?level=L&id=ID&job_id=x&job_ticket_id=y
FileFormOn /accept-failed-output
: my naive assumption was that there might be algs that have meta/files in their failure output that users still would like to store/further analyze. This shouldn't be the default though, hence the marking, but leaving the option.
Another TBD I forgot about was whether to require the job_ticket_id
URL param at all during /engine/upload
, since I can find the relevant ticket just from the job_id... But then we are ever closer to just storing the success/failure on the job itself.
Is there a purpose for the level
, id
, and job_id
fields if there is already job_ticket_id
?
I would remove all but the later.
Other than that, I think this sounds good. I will have to think about it again later when I do a proper code review & test extensively with the current engine.
Another TBD I forgot about was whether to require the job_ticket_id URL param at all during /engine/upload, since I can find the relevant ticket just from the job_id
I would not do that; it is not valid to assume that there is a 1:1 ratio of jobs to job-completion-tickets.
Also, do you have any plans for disabling the old upload behavior in a later release? Probably just flipping a switch that rejects /engine/upload
if there is no job_ticket_id
, and removing any related code.
@nagem @kofalt Fixed/added everything discussed + rebased. PR now ready for final review.
Code looks pretty good, I made some comments above that ended up in collapsed sections, please check them. Some code moved around so it's harder to tell which comments still apply.
I'm going to run at test job load on this API branch and report back on my results.
One requirement that I realized was missed in the writeup on #968 is that jobs should not spawn until file results are accepted. The function signature for calling jobs shouldn't be too difficult to reproduce, since we are also applying the metadata here:
rules.create_jobs(db, container_before, container_after, container_type)
Note: this can be called after all files are accepted (doesn't need to be per file like it is in placer logic), and is only called on non-analysis jobs.
We would also need to modify rule evaluation to skip files that have the failed marking. For efficiency we can also skip it in the placer if the job is failed but it would need to exist in the rules because the create_jobs
logic checks to see all jobs that could have been spawned by the old container and all jobs that can be spawned with the new container and creates the ones in the diff between new and old.
Sorry I didn't notice this until now, let me know if anything is unclear in my description above and we can sort it out. After that change and the one comment I added about ContainerReferencing 500ing, this is good to go in my opinion. @kofalt looks like he had a check he'd like to run before his approval, though.
Correct. I've been out for a few days and been unable to run my tests, but it sounds like one more modification needs to be made. Once the final changes are ready, I'll hit this branch with the engine properly to make sure it's good to go.
@kofalt @nagem Fixed. Please go ahead and hit it with the engine.
Final changes look good from my end 👍. Nice work on this ticket!
Did a quick rebase to make sure branch is still good, I hope you don't mind. Feature checks out on current engine from my side of things.
Feature looks good to go pending a question above.
Access log is added, feel free to merge when CI is green @ambrussimon.
Verified locally that no rebase is needed. Merging.
Resolves #968
Notes/TBDs:
/jobs/x/complete
endpoint as discussed, added feature to existing/upload/engine
endpoint in a backwards-compatible way instead. Opinions?Review Checklist