Closed inodb closed 3 years ago
seems like it's probably a data issue - there's no diagnosis showing up for these files.
@inodb @adamabeshouse maybe we should throw some errors when we find broken data. this would be very easy to do as part of the "lineage" routine that we run on load. it would be nice to catch this in the validation stage but someone would probably have to write much the same logic as we've already written to catch it.
@alisman so every file should have at least 1 diagnosis associated with it? what other data validation rules can we think of?
personally I think it would be better to catch it in validation so it can be fixed before it is pushed/goes live
yeah agreed that we should prolly try to catch this in validation - i got in touch with them to update the diagnosis data
Note that data portal requirements (such as needing at least diagnosis clinical data) are sort of another level of validation specific to the data portal, but it's still prolly better to keep that where the other validation is done as well
@alisman further thought - if we need to write the same logic in validation as we would in frontend to catch this, maybe we should actually just go ahead and do that data processing in the validation/import stage.
There are two levels of validation atm:
Then there are also two normalization/transformation steps:
get_syn_data.py: script where we pull the data from Synapse and massage it into the JSON that the frontend uses.

So where the validation/normalization makes the most sense depends on the use case. In this case, because it's about missing diagnosis data and not some missing field, we could flag it in e.g. 2. Alternatively we put it in 3 since it's data portal specific, and with 1-3 all being coded in Python it wouldn't be too complex to move around anyway. That would make 3 more of a combo of validation/normalization.
Thanks for the thorough rundown. In my opinion, for this case we should do it in (3) because that way we can port the relevant logic from the frontend and attach those computations at that time (i.e. diagnosis, biospecimen, primaryParents), at the same time as we validate.
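For illustration, a minimal sketch of what a combined validate-and-attach step in (3) could look like. This is not the actual code in get_syn_data.py or the frontend; the function name `attach_diagnoses` and all field names (`fileId`, `biospecimenIds`, `biospecimenId`, `participantId`) are hypothetical, and the real lineage logic is likely more involved.

```python
from collections import defaultdict


def attach_diagnoses(files, biospecimens, diagnoses):
    """Attach biospecimen and diagnosis records to each file, collecting errors.

    All field names used here are hypothetical placeholders,
    not the data portal's real schema.
    """
    biospecimen_by_id = {b["biospecimenId"]: b for b in biospecimens}

    # Group diagnoses by the participant they belong to.
    diagnoses_by_participant = defaultdict(list)
    for d in diagnoses:
        diagnoses_by_participant[d["participantId"]].append(d)

    normalized, errors = [], []
    for f in files:
        # Resolve the file's biospecimens, skipping ids with no matching record.
        specimens = [
            biospecimen_by_id[bid]
            for bid in f.get("biospecimenIds", [])
            if bid in biospecimen_by_id
        ]
        participant_ids = {s["participantId"] for s in specimens}
        file_diagnoses = [
            d
            for pid in sorted(participant_ids)
            for d in diagnoses_by_participant.get(pid, [])
        ]
        # Validation rule from this thread: every file needs >= 1 diagnosis.
        if not file_diagnoses:
            errors.append(f"{f['fileId']}: no diagnosis found")
        # Normalization: store the computed fields alongside the file.
        normalized.append(
            {**f, "biospecimen": specimens, "diagnosis": file_diagnoses}
        )
    return normalized, errors
```

Returning the errors as a list, rather than raising on the first one, would let the import report every file with missing diagnosis data in one pass, so the data can be fixed before it goes live.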
fyi this is fixed in their latest metadata
Cases tab is empty: