When generating multiple reports for a cohort (splitting by year, or an arbitrary split), I was deriving the output folder for the additional reports from Path(<main_output_file>).parent.
Calling pathlib.Path on a gs://... address collapses the double slash, leaving a gs:/... prefix.
cloudpathlib interprets a gs:/... prefix as a local path, so it can't open the requested file.
The result was that for the cohorts we now split (due to being... massive) the main and latest reports are written, then the report generation script breaks, so we don't get the mini-reports.
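A minimal reproduction of the slash collapse, as a sketch only. The bucket and file names are hypothetical, and PurePosixPath is used so the behaviour is the same on any platform:

```python
from pathlib import PurePosixPath

# Hypothetical address, for illustration only
main_output_file = 'gs://example-bucket/reports/cohort/main_report.html'

# pathlib splits on '/' and drops the empty component between the two slashes,
# so the scheme separator '//' is reduced to a single '/'
parent = str(PurePosixPath(main_output_file).parent)

print(parent)
```

The printed value begins with gs:/ rather than gs://, which is the form that then gets treated as a local path.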
Second issue - #421
In the Hail analysis of SV data, the process quits if there's no overlap between the PED and VCF
This has caused recurrent failures where a participant has both Exome and Genome results - only the exome data is being retained in the pedigree, so there appears to be no overlap with the exome-only CNV data
Proposed Changes
Stop using pathlib's Path.parent to get the directory name
Instead, get the file name and remove it from the full path to get the directory
Use plain string manipulation to build the output paths after that
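The string-based approach described above could look roughly like this. It is a sketch, not the code in this PR: the function names, bucket and report names are all hypothetical.

```python
def output_directory(main_output_file: str) -> str:
    """Return the directory part of a path, leaving any gs:// prefix untouched."""
    # The file name is everything after the final '/'
    file_name = main_output_file.rsplit('/', 1)[-1]
    # Remove the file name from the end of the full path, then the trailing '/'
    directory = main_output_file[: len(main_output_file) - len(file_name)]
    return directory.rstrip('/')


def build_report_path(main_output_file: str, report_name: str) -> str:
    """Build the path for an additional report alongside the main report."""
    return f'{output_directory(main_output_file)}/{report_name}'


# Hypothetical usage
main_output_file = 'gs://example-bucket/reports/cohort/main_report.html'
print(build_report_path(main_output_file, 'report_2023.html'))
```

Because the address is never passed through pathlib, the double slash after the scheme stays intact. The same functions work unchanged for local paths.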