scitran / core

RESTful API
https://scitran.github.io
MIT License
18 stars 18 forks source link

Remove direct access of potentially missing keys in job enqueue #1047

Closed nagem closed 6 years ago

nagem commented 6 years ago

Fixes #1042

Updates files that might have missing keys* as well, but code should not be written expecting these keys to exist.

* A null version of these keys are always added to new files via upload.py. Files from old version of scitran or that were manually entered into the system might be missing these keys.

Breaking Changes

None

Review Checklist

codecov-io commented 6 years ago

Codecov Report

Merging #1047 into master will not change coverage. The diff coverage is 100%.

@@          Coverage Diff           @@
##           master   #1047   +/-   ##
======================================
  Coverage    90.7%   90.7%           
======================================
  Files          50      50           
  Lines        6864    6864           
======================================
  Hits         6226    6226           
  Misses        638     638
kofalt commented 6 years ago

Question: if the database upgrade fixes all missing keys, why also change the field access to tolerate them missing? Shouldn't we only need one or the other?

I want to understand why "code should not be written expecting these keys to exist" if going forward they always will...

nagem commented 6 years ago

It's true that we should only logically need one or the other. Unfortunately we don't specify anywhere that the keys should exist, and I am not 100% confident that all active users are only adding data through API endpoints. Assuming keys important to the structure of the data are present (a file's name and size, a session's project foreign key) is an acceptable assumption but periphery that is often null like tags and info is harder to expect to be present.

Why add a db update at all then? Although I'm not confident enough in this update being a "final fix" to allow a potential 500 key access error in auto-generated situations (rule eval) to continue existing, I am confident it will fix the existing old "bad" data and that the potential for new bad data is very small. We can reevaluate what part of the data model should always be present at a later time but for now, I don't believe the documentation has been clear enough to support direct key access on commonly null fields.

nagem commented 6 years ago

To add additional information, files are the only object in the system that set nulls for their allowable fields. I think this is a practice that could be useful for other objects but it should be something we standardize rather than having to remember you can (in clean data sets) directly access file keys but not others.

kofalt commented 6 years ago

Makes sense. Would you support changing our key access back to assuming these fields exist once this is more documented - say, with the coming Swagger docs?

If the root cause of any future missing keys is bad code or bad (direct) access, then it would seem like a good idea to keep surfacing the problems that causes.

nagem commented 6 years ago

We also have a collection of "integrity checks" and one could easily be written to test if the the problem is still around after this update. If the failure was more along the lines of "you can't view a file that is malformed" I would be more willing to allow strict key access and use the failures as an alert, but not with an opaque error in job creation/rule evaluation that a user wouldn't have good insight into.*

*Edit: Or might not ever see, in the case of rule eval for engine uploads

kofalt commented 6 years ago

It sounds like the real problem is not so much that we fail, rather that we fail silently. Given that operational problem, it makes sense to be overly conservative here.

We should solve the silent part, and that'll free us up for better choices in the future.