sigven / pcgr

Personal Cancer Genome Reporter (PCGR)
https://sigven.github.io/pcgr
MIT License

Error in dplyr::semi_join() when only numeric chromosomes in input #250

Open MareikeJaniak opened 4 weeks ago

MareikeJaniak commented 4 weeks ago

Hi,

Apologies for opening yet another issue, but I've encountered a small error with pcgr. I believe this is related to our test dataset, which is limited to chromosome 19, and the way that R reads in numeric columns. When using the --vcf2maf flag in the command, the following error occurs:

2024-08-13 13:41:40 - pcgr-writer - INFO - PCGR - STEP 6: Generation of output files - molecular interpretation report for precision cancer medicine
export CONDA_PREFIX=/cvmfs/soft.mugqic/root/software/pcgr/pcgr-2.0.3/envs/pcgrr && export PATH=/cvmfs/soft.mugqic/root/software/pcgr/pcgr-2.0.3/envs/pcgrr/bin:"$PATH" && /cvmfs/soft.mugqic/root/software/pcgr/pcgr-2.0.3/envs/pcgrr/bin/Rscript /cvmfs/soft.mugqic/root/software/pcgr/pcgr-2.0.3/envs/pcgr/bin/pcgrr.R /lustre03/project/6007512/C3G/projects/jenkins_tests/GenPipesFull_package_2024-06-13T15.52.42/scriptTestOutputs/rnaseq_cancer/alignment/HCC1395/pcgr/HCC1395.pcgr.grch38.conf.yaml /cvmfs/soft.mugqic/root/software/pcgr/pcgr-2.0.3/envs/pcgrr/etc/conda/activate.d/quarto.sh
2024-08-13 13:41:40 - pcgr-writer - INFO - export CONDA_PREFIX=/cvmfs/soft.mugqic/root/software/pcgr/pcgr-2.0.3/envs/pcgrr && export PATH=/cvmfs/soft.mugqic/root/software/pcgr/pcgr-2.0.3/envs/pcgrr/bin:"$PATH" && /cvmfs/soft.mugqic/root/software/pcgr/pcgr-2.0.3/envs/pcgrr/bin/Rscript /cvmfs/soft.mugqic/root/software/pcgr/pcgr-2.0.3/envs/pcgr/bin/pcgrr.R /lustre03/project/6007512/C3G/projects/jenkins_tests/GenPipesFull_package_2024-06-13T15.52.42/scriptTestOutputs/rnaseq_cancer/alignment/HCC1395/pcgr/HCC1395.pcgr.grch38.conf.yaml /cvmfs/soft.mugqic/root/software/pcgr/pcgr-2.0.3/envs/pcgrr/etc/conda/activate.d/quarto.sh
2024-08-13 13:41:41 - pcgr-report-generation - INFO - Successfully parsed YAML configuration file - reporting mode: PCGR
2024-08-13 13:41:41 - pcgr-report-generation - INFO - Loading reference datasets - genome assembly: grch38
2024-08-13 13:42:24 - pcgr-report-generation - INFO - ------
2024-08-13 13:42:24 - pcgr-report-generation - INFO - Reading annotated molecular dataset (DNA) - somatic SNV/InDels
2024-08-13 13:42:25 - pcgr-report-generation - INFO - Assigning variants to tiers of clinical significance - somatic actionability guidelines (AMP/ASCO/CAP)
2024-08-13 13:42:27 - pcgr-report-generation - INFO - Applying variant filters on tumor-only calls - assigning somatic classification
2024-08-13 13:42:27 - pcgr-report-generation - INFO - Updating MAF file with filtered somatic SNV/InDels
Error in `dplyr::semi_join()`:
! Can't join `x$Chromosome` with `y$Chromosome` due to incompatible
  types.
ℹ `x$Chromosome` is a <double>.
ℹ `y$Chromosome` is a <character>.
Backtrace:
     ▆
  1. ├─pcgrr::generate_report(yaml_fname = yaml_fname)
  2. │ └─pcgrr::load_somatic_snv_indel(...)
  3. │   └─pcgrr::filter_maf_file(callset = callset, settings = settings)
  4. │     ├─dplyr::semi_join(...)
  5. │     └─dplyr:::semi_join.data.frame(...)
  6. │       └─dplyr:::join_filter(...)
  7. │         └─dplyr:::join_cast_common(x_key, y_key, vars, error_call = error_call)
  8. │           ├─rlang::try_fetch(...)
  9. │           │ └─base::withCallingHandlers(...)
 10. │           └─vctrs::vec_ptype2(x, y, x_arg = "", y_arg = "", call = error_call)
 11. ├─vctrs (local) `<fn>`()
 12. │ └─vctrs::vec_default_ptype2(...)
 13. │   ├─base::withRestarts(...)
 14. │   │ └─base (local) withOneRestart(expr, restarts[[1L]])
 15. │   │   └─base (local) doWithOneRestart(return(expr), restart)
 16. │   └─vctrs::stop_incompatible_type(...)
 17. │     └─vctrs:::stop_incompatible(...)
 18. │       └─vctrs:::stop_vctrs(...)
 19. │         └─rlang::abort(message, class = c(class, "vctrs_error"), ..., call = call)
 20. │           └─rlang:::signal_abort(cnd, .file)
 21. │             └─base::signalCondition(cnd)
 22. └─rlang (local) `<fn>`(`<vctrs__2>`)
 23.   └─handlers[[1L]](cnd)
 24.     └─dplyr:::rethrow_error_join_incompatible_type(cnd, vars, error_call)
 25.       └─dplyr:::stop_join(...)
 26.         └─dplyr:::stop_dplyr(...)
 27.           └─rlang::abort(...)
Execution halted

I think this might be happening because the Chromosome column in this case only contains "19", which is leading R to read this column in as numeric, instead of as a character. I realize this is a bit of a niche issue, but others may come across it if their datasets don't contain data on chr X and Y.
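A minimal sketch of the suspected mechanism, assuming the MAF file is read with readr-style type guessing. The reader actually used inside `pcgrr::filter_maf_file()` is not shown in this thread, and the tables `maf_text` and `callset` below are hypothetical stand-ins, not PCGR objects:

```r
library(readr)
library(dplyr)

# Hypothetical MAF-like table restricted to chromosome 19
maf_text <- "Chromosome\tStart_Position\n19\t1000\n19\t2000\n"

# With type guessing, a column that only holds "19" is parsed as double
maf_guessed <- readr::read_tsv(I(maf_text), show_col_types = FALSE)
class(maf_guessed$Chromosome)

# Hypothetical callset in which Chromosome is stored as character
callset <- data.frame(Chromosome = "19", Start_Position = 1000)

# Joining maf_guessed against callset on Chromosome would fail with the
# "incompatible types" error, because double and character keys cannot be joined.

# Declaring the column type up front avoids the guess
maf_fixed <- readr::read_tsv(
  I(maf_text),
  col_types = readr::cols(Chromosome = readr::col_character())
)

# The join keys now have matching types
filtered <- dplyr::semi_join(
  maf_fixed, callset,
  by = c("Chromosome", "Start_Position")
)
```

Casting after reading, for example with `dplyr::mutate(Chromosome = as.character(Chromosome))`, should have the same effect if the column type cannot be set at read time.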

I'm using pcgr v2.0.3 with the following command:

pcgr --force_overwrite --vep_buffer_size 500 --vep_no_intergenic --vcf2maf --tumor_site 0 --assay WES --tumor_only --tumor_dp_tag TDP --tumor_af_tag TVAF --tumor_dp_min 10 --tumor_af_min 0.03 --exclude_likely_hom_germline --exclude_likely_het_germline --exclude_dbsnp_nonsomatic --exclude_nonexonic --input_vcf alignment/HCC1395/HCC1395.hc.vt.annot.flt.vcf.gz --refdata_dir $PCGR_DATA --vep_dir $PCGR_VEP_CACHE --output_dir alignment/HCC1395/pcgr --genome_assembly grch38 --sample_id HCC1395 --debug

The same command runs fine with the example data and it also finishes successfully with our test dataset if I omit the --vcf2maf flag. I'm attaching the input file used here. HCC1395.hc.vt.annot.flt.vcf.gz

Best, Mareike

sigven commented 4 weeks ago

Thanks a lot, Mareike! Never be sorry for filing issues, these are very useful. I think you identified what needs to be fixed; I'll just try to reproduce it first and then make a patch for it. Bugs like this clearly point to the need for more robust testing procedures before moving to release ;-). Anyway, I'll make a fix, probably next week. I'm also working on some other upgrades (CNA plot, germline integration, etc.).

Thanks again!

best, Sigve

sigven commented 13 hours ago

Hi Mareike,

I've re-run your sample with the upcoming version (2.1.0, also with updated reference data):

Results available here: https://www.dropbox.com/scl/fo/12ocw7d70by3w9c3fgx37/AExdqvjAVwWTuwp18txMv68?rlkey=5uvhcucz5og98xrwhuqs8nclq&dl=0

Sorry for the delay :)

best, Sigve

MareikeJaniak commented 10 hours ago

Hi Sigve,

Thanks for the update and for all of the updates in the upcoming version! We're looking forward to testing it out.

Best, Mareike