Closed: pbilling closed this issue 1 year ago
One issue is that Personalis created a new "va_mvp_phase3" directory to deliver new samples and has also changed the file naming conventions, without providing any notification. These changes occurred in December of 2022.
Manually checking cloud storage, it looks like the last genome from batch 2 was delivered in June 2022, so the new directory structure & naming conventions seem to be the sole culprit here. I'll need to deploy a hotfix so that Trellis recognizes the new pattern.
New object pattern examples:
Looks like only the fastqs have changed.
Added new match patterns in config/phase3/from-personalis/create-node-config.py in the NodeKinds class.
Also need to update the label_functions for extracting values from the object name. Existing method is to use helper methods (mate_pair_name_0, read_group_name_1). These seem fragile and unnecessary; I think I can just extract values using the regex groups.
These changes have been incorporated into hotfix-1.2.9.
Workflow:
Verified that Fastq, JSON, and checksum nodes have all been added to the database and are being related by...
But FastqToUbam jobs are not being triggered.
Troubleshooting why LaunchFastqToUbam trigger is not being activated:
The value I was using to parse read groups under the old naming conventions has been replaced by a lane index, so fastqs are now missing the 'readGroup' property.
Also, I'm realizing I need to update the get_fastq_metadata() function with logic to parse phase2 and phase3 Fastqs, not just the new phase3 ones.
Added logic to get Fastq properties from fullmatch.groupdict() in get_fastq_metadata(): https://github.com/StanfordBioinformatics/trellis-mvp-functions/commit/7d7d03070231802847285dda686e000287acd373.
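For anyone following along, here is a minimal sketch of the idea, not the code from the commit above: try each known naming pattern with re.fullmatch and read the properties straight from groupdict(). The function name parse_fastq_name, both patterns, the example object names, and the choice to reuse the lane index as 'readGroup' are all assumptions made for illustration. The real Personalis naming conventions are not reproduced here.

```python
import re

# Hypothetical naming patterns; the real Personalis conventions are not shown here.
FASTQ_PATTERNS = [
    # Assumed phase 2 style: <sample>_<readGroup>_R<matePair>.fastq.gz
    re.compile(r"(?P<sample>[A-Za-z0-9]+)_(?P<readGroup>\d+)_R(?P<matePair>[12])\.fastq\.gz"),
    # Assumed phase 3 style: <sample>_L<laneIndex>_R<matePair>.fastq.gz
    re.compile(r"(?P<sample>[A-Za-z0-9]+)_L(?P<laneIndex>\d+)_R(?P<matePair>[12])\.fastq\.gz"),
]


def parse_fastq_name(name):
    """Return Fastq properties parsed from an object name, or {} if no pattern matches."""
    for pattern in FASTQ_PATTERNS:
        match = pattern.fullmatch(name)
        if match is None:
            continue
        props = match.groupdict()
        # Assumption: fall back to the lane index as the read group so that
        # every Fastq ends up with a 'readGroup' property.
        if "readGroup" not in props:
            props["readGroup"] = props["laneIndex"]
        props["readGroup"] = int(props["readGroup"])
        props["matePair"] = int(props["matePair"])
        return props
    return {}
```

Because the properties come from named groups, supporting another naming convention only means adding a pattern to the list, with no position-based helper methods to keep in sync.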
I tried writing a local test for this behavior, but it was turning out to be a pain and I'm going to deprecate these methods soon anyway, so I'm just going to test interactively in the test environment.
I'm testing "adding" Fastqs by updating their metadata values. This way I can signal to Trellis that the state of an object has changed without having to actually move or change any object data.
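A rough sketch of what such a metadata "touch" could look like with the google-cloud-storage client. The helper name touch_metadata, the metadata key, and the bucket and object names are hypothetical, and whether a given Trellis function reacts to the update depends on how its storage trigger is configured.

```python
from datetime import datetime, timezone


def touch_metadata(blob, key="trellis-test-touch"):
    """Set one custom metadata key on a blob and push only the metadata change.

    `blob` is expected to behave like google.cloud.storage.Blob: a `metadata`
    attribute (dict or None) and a `patch()` method. The key name is hypothetical.
    """
    metadata = dict(blob.metadata or {})  # keep any existing custom metadata
    metadata[key] = datetime.now(timezone.utc).isoformat()
    blob.metadata = metadata
    blob.patch()  # sends the changed properties; object data is not rewritten
    return metadata


# Usage against a real bucket (requires google-cloud-storage and credentials;
# the bucket and object names are placeholders):
#
#   from google.cloud import storage
#   blob = storage.Client().bucket("my-test-bucket").get_blob("path/to/sample_R1.fastq.gz")
#   touch_metadata(blob)
```

Writing a timestamp as the value means every call changes the metadata, so repeated touches of the same object each register as an update.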
Steps I use to validate addition of an object to the database:
Usually I work backwards from 3 towards 1, since if there is an issue it will likely manifest in the end product (database node), and then 1 & 2 can be used for debugging.
Simple query to get logs for Cloud Functions:
resource.type="cloud_function"
severity=(DEFAULT OR DEBUG OR INFO OR NOTICE OR WARNING OR ERROR OR CRITICAL OR ALERT OR EMERGENCY)
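When debugging a single function, it can help to narrow that query. Below is a small sketch that builds a Cloud Logging filter string, optionally restricted to one function and a minimum severity. The helper name and the example function name are hypothetical; resource.labels.function_name and the severity>= comparison are standard Cloud Logging query syntax.

```python
def cloud_function_log_filter(function_name=None, min_severity=None):
    """Build a Cloud Logging filter for Cloud Functions logs.

    Clauses on separate lines are combined with an implicit AND.
    """
    clauses = ['resource.type="cloud_function"']
    if function_name:
        clauses.append(f'resource.labels.function_name="{function_name}"')
    if min_severity:
        clauses.append(f"severity>={min_severity}")
    return "\n".join(clauses)


if __name__ == "__main__":
    # "my-trellis-function" is a placeholder, not a real deployed function name.
    print(cloud_function_log_filter("my-trellis-function", "WARNING"))
```

The resulting string can be pasted into the Logs Explorer query box or passed as the filter argument to gcloud logging read.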
Trellis seems to be ingesting and processing phase 3 data properly. The only issue I noticed is that two "Genome" nodes were generated. This shouldn't break anything and is probably an artifact of me adding these nodes, deleting them, and then adding them again with a second update. Trellis still shouldn't generate duplicates, but I'm not going to spend time debugging now since the v1.3 update will deprecate these methods.
Most recent hotfix commit: https://github.com/StanfordBioinformatics/trellis-mvp-functions/commit/42b384eee3c1ab6ff696eee9cac2e96c452cde95.
Last sample added to database was created in June 2022