hartwigmedical / hmftools

Various algorithms for analysing genomics data
GNU General Public License v3.0
198 stars 59 forks source link

Purple step error and Linx #343

Closed FedericaBrando closed 1 year ago

FedericaBrando commented 2 years ago

Me, again. I am running the panel demo pipeline with the test data provided in the repo with the following command:

scripts/run_pipeline scripts test_data ref_data_dir/37/PANEL tools_dir "COLO829T,COLO829R" V37 PANEL 8 64 

but I get the following error at the Purple step:

11:55:33 - [INFO ] - Purple version: 3.7.2
11:55:33 - [INFO ] - output directory: test_data/COLO829T/purple/
11:55:33 - [INFO ] - reference(NONE) tumor(COLO829T) running on target-regions only
11:55:33 - [INFO ] - using ref genome: V37
11:55:33 - [INFO ] - loaded 1 alternative transcripts from 1 genes
11:55:35 - [INFO ] - loaded 7690 target regions bases(total=1362692 coding=1362692) from file(ref_data_dir/37/PANEL/variants/CoverageCodingPanel.37.bed.gz)
11:55:35 - [INFO ] - loaded 130 MSI INDELs from file(ref_data_dir/37/PANEL/copy_number/target_regions_msi_indels.37.tsv)
11:55:35 - [INFO ] - target regions: tml(0.74) tmb(0.05) msiIndels(220.0) afMax(0.9 diff=0.08), msiAF(2-3 base=0.15 4 base=0.08) codingBaseFactor(150000)
11:55:35 - [INFO ] - reading GC Profiles from ref_data_dir/37/PANEL/copy_number/GC_profile.1000bp.37.cnp
11:55:38 - [INFO ] - reading Amber QC from test_data/COLO829T/amber/COLO829T.amber.qc
11:55:38 - [INFO ] - reading Amber BAFs from test_data/COLO829T/amber/COLO829T.amber.baf.tsv.gz
11:55:38 - [INFO ] - reading Amber PCFs from test_data/COLO829T/amber/COLO829T.amber.baf.pcf
11:55:38 - [INFO ] - average Amber tumor depth is 139 reads implying an ambiguous BAF of 0.537
11:55:38 - [INFO ] - reading Cobalt tumor segments from test_data/COLO829T/cobalt/COLO829T.cobalt.ratio.pcf
11:55:38 - [INFO ] - reading Cobalt ratios from test_data/COLO829T/cobalt/COLO829T.cobalt.ratio.tsv.gz
11:55:41 - [INFO ] - loading structural variants from test_data/COLO829T/gripss/COLO829T.gripss.filtered.somatic.vcf.gz
11:55:41 - [INFO ] - loaded somatic variants(227 fitting=0) from test_data/COLO829T/pave/COLO829T.sage.pave.vcf.gz
11:55:41 - [INFO ] - sample gender is male
11:55:41 - [INFO ] - applying segmentation
11:55:41 - [INFO ] - merging reference and tumor ratio break points
11:55:42 - [INFO ] - purple output directory: test_data/COLO829T/purple/
11:55:42 - [INFO ] - fitting purity
11:55:43 - [INFO ] - maxDiploidProportion(0.000) diploidCandidates(93) purityRange(0.080 - 1.000) hasTumor(true)
11:55:43 - [INFO ] - calculating copy number
11:55:43 - [INFO ] - modelling somatic peaks
11:55:43 - [INFO ] - enriching somatic variants
11:55:43 - [INFO ] - load(2.0 tml=34.6244) msiIndels(0 perMb=0.0000) burden(2.0 perMb=1.7312)
11:55:43 - [INFO ] - generating QC Stats
11:55:43 - [INFO ] - generating charts
11:55:43 - [INFO ] - Generating COLO829T.input.png via command: tools_dir/circos/bin/circos -nosvg -conf /workspace/projects/hartwig/hmftools/pipeline/test_data/COLO829T/purple/circos/COLO829T.input.conf -outputdir /workspace/projects/hartwig/hmftools/pipeline/test_data/COLO829T/purple/plot -outputfile COLO829T.input.png
11:55:43 - [INFO ] - Generating COLO829T.circos.png via command: tools_dir/circos/bin/circos -nosvg -conf /workspace/projects/hartwig/hmftools/pipeline/test_data/COLO829T/purple/circos/COLO829T.circos.conf -outputdir /workspace/projects/hartwig/hmftools/pipeline/test_data/COLO829T/purple/plot -outputfile COLO829T.circos.png
11:55:43 - [INFO ] - Executing R script via command: Rscript /tmp/script5333308290340212674.R COLO829T test_data/COLO829T/purple/ test_data/COLO829T/purple//plot
11:55:43 - [INFO ] - Executing R script via command: Rscript /tmp/script8396032598467817776.R COLO829T test_data/COLO829T/purple/ test_data/COLO829T/purple//plot
11:55:46 - [FATAL] - Error executing R script. Examine error file /tmp/copyNumberPlots.R15471386380511817827.error for details.
11:55:47 - [ERROR] - Fatal error creating circos plot. Examine error file test_data/COLO829T/purple/circos/COLO829T.circos.png.error for details.
11:55:47 - [WARN ] - Error generating charts
11:56:00 - [INFO ] - generating drivers
11:56:00 - [INFO ] - Purple complete, runTime(22.7s)

by inspecting the tmp/copyNumberPlots.R15471386380511817827.error I get this:

Attaching package: ‘dplyr’

The following objects are masked from ‘package:stats’:

    filter, lag

The following objects are masked from ‘package:base’:

    intersect, setdiff, setequal, union

Error in `mutate()`:
! Problem while computing `bucket = ceiling(copyNumber)`.
Caused by error in `ceiling()`:
! non-numeric argument to mathematical function
Backtrace:
     ▆
  1. ├─global copynumber_pdf(copyNumbers)
  2. │ └─... %>% pull(bucket)
  3. ├─dplyr::pull(., bucket)
  4. ├─dplyr::filter(., row_number() == 1)
  5. ├─dplyr::filter(., proportion > 0.9)
  6. ├─dplyr::arrange(., proportion)
  7. ├─dplyr::mutate(., cumn = cumsum(n), proportion = cumn/max(cumn))
  8. ├─dplyr::summarise(., n = sum(bafCount))
  9. ├─dplyr::group_by(., bucket)
 10. ├─dplyr::mutate(., bucket = ceiling(copyNumber))
 11. ├─dplyr:::mutate.data.frame(., bucket = ceiling(copyNumber))
 12. │ └─dplyr:::mutate_cols(.data, dplyr_quosures(...), caller_env = caller_env())
 13. │   ├─base::withCallingHandlers(...)
 14. │   └─mask$eval_all_mutate(quo)
 15. └─base::.handleSimpleError(...)
 16.   └─dplyr (local) h(simpleError(msg, call))
 17.     └─rlang::abort(...)
Execution halted

Then, the next step in the pipeline is Linx and I am not sure if it's related to the previous error or not, but I get this Exception:

Running Linx somatic with args: -sample COLO829T   -sv_vcf test_data/COLO829T/purple/COLO829T.purple.sv.vcf.gz   -purple_dir test_data/COLO829T/purple   -ref_genome_version V37   -fragile_site_file ref_data_dir/37/PANEL/sv/fragile_sites_hmf.37.csv   -line_element_file ref_data_dir/37/PANEL/sv/line_elements.37.csv   -ensembl_data_dir ref_data_dir/37/PANEL/common/ensembl_data/   -check_fusions -known_fusion_file ref_data_dir/37/PANEL/sv/known_fusion_data.37.csv   -check_drivers -driver_gene_panel ref_data_dir/37/PANEL/common/DriverGenePanel.37.tsv   -output_dir test_data/COLO829T/linx
12:28:28 - [INFO ] - LINX version: 1.22
12:28:28 - [INFO ] - running SV analysis for COLO829T
12:28:28 - [INFO ] - loaded known fusion data: EXON_DEL_DUP(14), PROMISCUOUS_3(33), IG_PROMISCUOUS(2), KNOWN_PAIR(410), PROMISCUOUS_5(31), IG_KNOWN_PAIR(20)
12:28:30 - [INFO ] - loaded 20 known fragile sites from file: ref_data_dir/37/PANEL/sv/fragile_sites_hmf.37.csv
12:28:30 - [INFO ] - loaded 124 known line elements from file: ref_data_dir/37/PANEL/sv/line_elements.37.csv
12:28:30 - [INFO ] - loaded 2 SV data records from VCF file: test_data/COLO829T/purple/COLO829T.purple.sv.vcf.gz
Exception in thread "main" java.lang.NumberFormatException: For input string: "∞"
    at java.base/jdk.internal.math.FloatingDecimal.readJavaFormatString(FloatingDecimal.java:2054)
    at java.base/jdk.internal.math.FloatingDecimal.parseDouble(FloatingDecimal.java:110)
    at java.base/java.lang.Double.parseDouble(Double.java:651)
    at com.hartwig.hmftools.common.purple.PurpleCopyNumberFile.fromLines(PurpleCopyNumberFile.java:93)
    at com.hartwig.hmftools.common.purple.PurpleCopyNumberFile.read(PurpleCopyNumberFile.java:44)
    at com.hartwig.hmftools.linx.cn.CnDataLoader.loadCopyNumberData(CnDataLoader.java:122)
    at com.hartwig.hmftools.linx.cn.CnDataLoader.loadSampleData(CnDataLoader.java:91)
    at com.hartwig.hmftools.linx.SampleAnalyser.processSample(SampleAnalyser.java:218)
    at com.hartwig.hmftools.linx.SampleAnalyser.processSamples(SampleAnalyser.java:170)
    at com.hartwig.hmftools.linx.LinxApplication.<init>(LinxApplication.java:209)
    at com.hartwig.hmftools.linx.LinxApplication.main(LinxApplication.java:278)
charlesshale commented 1 year ago

Hi Frederica,

From the Linx log it appears that one or more of the Purple files have produced an INF in the data, and this is causing the R plotting script and Linx to fail.

Are you able to share the Purple input files from this run? Then I can run Purple in debug to see how it is failing in that R script.

Email me directly if preferable: c.shale@hartwigmedicalfoundation.nl

thanks,

Charles

FedericaBrando commented 1 year ago

Hi @charlesshale, thanks for the answer. I am using COLO829T and COLO829T bam files as input (the one that are in the github repo for running the demo).

I'm running the pipeline in panel mode and it fails on purple step. This is the command that I am using:

scripts/run_pipeline scripts test_data ref_data_dir/37/PANEL tools_dir "COLO829T,COLO829R" V37 PANEL 8 64 
charlesshale commented 1 year ago

I ran our demo pipeline just now with our current pipeline (v5.32) on the COLO829 mini BAMs and it produced valid results for all components, including Purple and Purple plots.

Could you try this again if still applicable, and ensure that it includes Cobalt v1.14 which better handles regions with low coverage.

I think any issue here is with the COLO mini BAMs, and not the components or the pipeline itself. We have run many panel samples (TSO500 and HMF OncoPanel) through this pipeline now, and compared the results to WGS output and it is as expected.