znatty22 opened this issue 3 years ago
Should any of these three changes stop the ingestion process? Or do we just want to report these things?
I think we should just report these things, but maybe we should ask @allisonheath.
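One way to keep both options open is to log discrepancies as warnings by default and stop ingestion only when a caller opts into a strict mode. This is a minimal sketch, assuming discrepancies arrive as plain strings; `report_discrepancies`, `strict`, and `IngestDiscrepancyError` are hypothetical names, not existing study creator options.

```python
import logging

logger = logging.getLogger("genomic_data_loader")


class IngestDiscrepancyError(Exception):
    """Hypothetical error, raised only when strict mode is enabled."""


def report_discrepancies(discrepancies, strict=False):
    """Log each discrepancy as a warning; halt ingestion only if strict is True.

    Returns the number of discrepancies reported so callers can summarize them.
    """
    for message in discrepancies:
        logger.warning("Discrepancy found: %s", message)
    # Default behavior is report-only; strict mode turns any finding into a stop.
    if strict and discrepancies:
        raise IngestDiscrepancyError(
            f"{len(discrepancies)} discrepancies found; stopping ingestion"
        )
    return len(discrepancies)
```

With the default `strict=False`, a call such as `report_discrepancies(["file missing from S3"])` logs one warning and lets loading continue.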
I've been thinking through the third set of checks here, and it seems to me that some parts of them should be done elsewhere. For example, checking whether a harmonized genomic file's corresponding source genomic file exists is something we can do immediately using just the GWO manifest itself: query the dataservice for genomic-files that match the source file column entries, and if any are missing, then we have a problem.
EDIT: On second thought, it's probably better to do it in the load_seq_exp_harmonized_genomic_files method, because then we don't need to make extraneous queries to the dataservice.
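If the loader already holds the genomic files it fetched while loading, the missing-source check reduces to a set-membership test against the manifest's source file column, with no extra dataservice queries. This is a minimal sketch under stated assumptions: manifest rows are dicts, each source cell holds a single identifier, and `loaded_genomic_files` and the `source_file` column name are hypothetical.

```python
def find_missing_source_files(manifest_rows, loaded_genomic_files,
                              source_column="source_file"):
    """Return manifest source files with no matching, already-loaded genomic file.

    manifest_rows: iterable of dicts, one per GWO manifest row (assumed shape).
    loaded_genomic_files: dict mapping a source file identifier to the genomic
        file record fetched earlier during loading (hypothetical), so this check
        needs no additional dataservice queries.
    source_column: hypothetical name of the manifest's source file column.
    """
    missing = []
    seen = set()
    for row in manifest_rows:
        source = row.get(source_column)
        # Skip blank cells and sources already checked on an earlier row.
        if not source or source in seen:
            continue
        seen.add(source)
        if source not in loaded_genomic_files:
            missing.append(source)
    return missing
```

The returned list keeps manifest order, so it can be logged or reported however the loader reports its other problems.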
@gsantia Yeah, the issue as I wrote it up might not match exactly how it ends up being implemented. You'll probably have a better idea since you're doing the implementation. The important thing is that we're able to record and report any missing data that we think the user should know about.
The study creator's GenomicDataLoader currently does not detect any discrepancies between the GWO manifest and S3, or between the GWO manifest and the Dataservice. Detecting them is an important part of the analysts' current manual process for loading harmonized genomic file info into the Dataservice.

Each of the 3 load functions in the GenomicDataLoader should be modified to detect discrepancies and report them through log statements, event firing, or both.

Specifics:
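Detecting these discrepancies amounts to set differences between the file locations listed in the GWO manifest and those found in S3 and in the Dataservice, followed by logging and, optionally, firing an event. This is a minimal sketch, assuming the three collections of file locations (e.g. s3:// URLs) have already been gathered; `check_manifest`, `DiscrepancyReport`, the event name, and the `emit_event` callable are hypothetical, not part of the study creator.

```python
import logging
from dataclasses import dataclass, field

logger = logging.getLogger("genomic_data_loader")


@dataclass
class DiscrepancyReport:
    """Files listed in the GWO manifest but missing elsewhere."""
    missing_from_s3: list = field(default_factory=list)
    missing_from_dataservice: list = field(default_factory=list)

    def is_clean(self):
        return not (self.missing_from_s3 or self.missing_from_dataservice)


def check_manifest(manifest_paths, s3_paths, dataservice_paths, emit_event=None):
    """Compare manifest file locations against S3 and Dataservice locations.

    How the three collections are fetched is outside this sketch.
    emit_event is an optional, hypothetical callable(name, payload).
    """
    manifest = set(manifest_paths)
    report = DiscrepancyReport(
        missing_from_s3=sorted(manifest - set(s3_paths)),
        missing_from_dataservice=sorted(manifest - set(dataservice_paths)),
    )
    # Report through log statements...
    for path in report.missing_from_s3:
        logger.warning("In GWO manifest but not in S3: %s", path)
    for path in report.missing_from_dataservice:
        logger.warning("In GWO manifest but not in Dataservice: %s", path)
    # ...and/or through a single event carrying the whole report.
    if emit_event is not None and not report.is_clean():
        emit_event("genomic_file_discrepancies", report)
    return report
```

This only checks one direction (manifest entries missing elsewhere); files present in S3 or the Dataservice but absent from the manifest would be the reverse set differences.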
In the load_harmonized_genomic_files method:
In the load_specimen_harmonized_gf_links method:
In the load_seq_exp_harmonized_genomic_files method: