Open Alfredo-Enrique opened 2 years ago
This is caused by the filenames containing `:` characters, which are ambiguous to interpret with the library used by Spark. They originate from the `LB` tag in the BAM read group header. This is something we'll want to fix in align-DNA + all other pipelines with either standardized filenames and/or a function to clean up special characters from filenames.
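For illustration, a clean-up function of the kind described above could look roughly like the sketch below. It is written in Python rather than Nextflow, the name `sanitize_filename` and the allow-list of characters are assumptions made for this example, and it is not the implementation used by any of the pipelines.

```python
import re

# Hypothetical allow-list: letters, digits, dot, underscore and hyphen.
# Any run of other characters (including ':') is treated as unsafe.
_UNSAFE = re.compile(r"[^A-Za-z0-9._-]+")


def sanitize_filename(name: str, replacement: str = "-") -> str:
    """Collapse each run of unsafe characters in `name` into `replacement`."""
    cleaned = _UNSAFE.sub(replacement, name)
    # Drop separators left dangling at either end of the name.
    return cleaned.strip(replacement)


if __name__ == "__main__":
    # A filename derived from an LB tag such as WGS:WTSI:30177
    print(sanitize_filename("WGS:WTSI:30177.sorted.bam"))
```

Using an allow-list instead of replacing only `:` means other characters that could trip up downstream tools (spaces, slashes) would be caught by the same function.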
Good catch @yashpatel6! I'm guessing that this will be more of an issue with files generated outside of UCLA than with internal files. Funnily enough, @tyamaguchi-ucla was just mentioning that the file names in the command looked weird. Thanks for figuring out where they were coming from.
Correct, I actually opened a PR for a Nextflow module for sanitizing strings, especially for filenames. This is something that'll need to be rolled out to the individual pipelines so I'm going to plan for the fix to passively be in the v2.0.0 release for the metapipeline as the individual pipelines get updated.
Hmm... can I just go ahead and do a janky-imperfect version of this specific use case? Cause I don't think I can move on with my project otherwise.
I have a few suggestions if you need to process the samples ASAP:
`:` characters as a quick fix (ex. `@RG ... LB:WGS:WTSI:30177 ...` -> `@RG ... LB:WGS-WTSI-30177 ...`).

I would lean towards the third option since it's less manual work and doesn't require modifying the raw input files.
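The `LB` rewrite shown in the example above can be sketched in Python as follows. This is a minimal illustration that assumes the header line is tab-separated as in the SAM format; the function name `clean_rg_library` is made up for this example, and how the cleaned header gets applied to the BAM is left out.

```python
def clean_rg_library(header_line: str, replacement: str = "-") -> str:
    """Replace ':' inside the LB value of a single @RG header line.

    Lines that are not @RG lines are returned unchanged (minus any
    trailing newline). Only the LB field is touched.
    """
    line = header_line.rstrip("\n")
    if not line.startswith("@RG\t"):
        return line

    fields = []
    for field in line.split("\t"):
        if field.startswith("LB:"):
            # Split off the tag name; everything after the first ':' is the value.
            tag, value = field.split(":", 1)
            field = f"{tag}:{value.replace(':', replacement)}"
        fields.append(field)
    return "\t".join(fields)


if __name__ == "__main__":
    # Hypothetical read group line; the ID and SM values are placeholders.
    print(clean_rg_library("@RG\tID:1\tLB:WGS:WTSI:30177\tSM:S1"))
```

Only the first `:` in the field is kept, since it separates the tag name from its value.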
Thank you for the wonderful suggestions @yashpatel6. I'll either do 2 or 3. I have to pivot to a different task for the next two days but will revisit this at the end of the week. Thank you for the help and suggestions!
Describe the issue: align-DNA fails when running metapipeline-DNA on a WGS BAM file. The error seems to occur during the Mark Duplicates step, judging only from the error message, which refers to a sorted BAM. The full error is at the end of the error report; an excerpt is here.
/hot/user/alfgonzalez/pipeline/metapipeline-DNA/agonz_update_bam2fastq_to_d4e8a73/metapipeline-DNA
nextflow run main.nf -c ./config/DO5264_run1_meta-lead.config
/hot/user/alfgonzalez/pipeline/metapipeline-DNA/agonz_update_bam2fastq_to_d4e8a73/metapipeline-DNA/config/DO5264_run1_meta-lead.config
/hot/user/alfgonzalez/pipeline/metapipeline-DNA/agonz_update_bam2fastq_to_d4e8a73/metapipeline-DNA/config/DO5264_run1_meta-pipeline.config
/hot/user/alfgonzalez/pipeline/metapipeline-DNA/agonz_update_bam2fastq_to_d4e8a73/metapipeline-DNA/output/DO5264/leading_work_dir/04/a7d91458763128a3133145fa9e0b9d/.command.log
/hot/user/alfgonzalez/pipeline/metapipeline-DNA/agonz_update_bam2fastq_to_d4e8a73/metapipeline-DNA/output/DO5264
/hot/user/alfgonzalez/pipeline/metapipeline-DNA/agonz_update_bam2fastq_to_d4e8a73/metapipeline-DNA/output/DO5264/leading_work_dir
To Reproduce Steps to reproduce the behavior:
Expected behavior Works through bam2fastq without issue but fails at align-DNA step after sorting bam.
Excerpt of .command.log: