lima1 / PureCN

Copy number calling and variant classification using targeted short read sequencing
https://bioconductor.org/packages/devel/bioc/html/PureCN.html
Artistic License 2.0

[PON] The normal.panel.vcf.file contains only a single sample #336

Closed mazzalab closed 9 months ago

mazzalab commented 9 months ago

I made my PON following the GATK4 guidelines here: https://gatk.broadinstitute.org/hc/en-us/articles/360035531132--How-to-Call-somatic-mutations-using-GATK4-Mutect2#:~:text=The%20three%20steps%20to%20create%20a%20panel%20of%20normals%20are%3A

I then tried to import it into PureCN with the command:

Rscript $PURECN/NormalDB.R --out-dir $OUT_REF --normal-panel $NORMAL_PANEL \
    --assay agilent_v6 --genome hg19 --force

where $NORMAL_PANEL is my "pon.vcf.gz" file made above, but I get the following error:

FATAL [2023-12-10 22:35:04] The normal.panel.vcf.file contains only a single sample.

This happens even though the PON contains two samples, not one.

Note 1: I'm running PureCN in a Nextflow pipeline.

Note 2: I was not able to install PureCN on my custom Docker machine. Hence, I'm using this image (the latest): https://hub.docker.com/r/markusriester/purecn where I expect GenomicsDB-R to be installed and working properly.

Can you suggest how to get the command above to work?

Complete log (command error):

  INFO [2023-12-10 22:35:03] Loading PureCN 2.8.1...
  INFO [2023-12-10 22:35:03] Creating mapping bias database.
  INFO [2023-12-10 22:35:04] Processing variants 1 to 50000...
  FATAL [2023-12-10 22:35:04] The normal.panel.vcf.file contains only a single sample.
  FATAL [2023-12-10 22:35:04]
  FATAL [2023-12-10 22:35:04] This is most likely a user error due to invalid input data or
  FATAL [2023-12-10 22:35:04] parameters (PureCN 2.8.1).

  Error: The normal.panel.vcf.file contains only a single sample.

  This is most likely a user error due to invalid input data or
  parameters (PureCN 2.8.1).
  In addition: Warning message:
  In .vcf_usertag(map, tag, nm, verbose) :
    ScanVcfParam ‘geno’ fields not found in  header: ‘AD’
  Execution halted

lima1 commented 9 months ago

Yes, the VCF file GATK creates is not sufficient. You can use the GenomicsDB instead, though, if you installed the genomicsdb R package (see our Dockerfile).

Rscript $PURECN/NormalDB.R --out-dir $OUT_REF \
    --coverage-files example_normal_coverages.list \
    --normal-panel $GENOMICSDB_WORKSPACE_PATH/pon_db \
    --genome hg19 \
    --assay agilent_v6
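
A quick way to see what the complaint is about is to count the per-sample columns in the VCF being passed in. The warning in the log above (‘geno’ fields not found in header: ‘AD’) suggests the file carries no per-sample AD field at all. The sketch below is a hypothetical helper, not part of PureCN or GATK; it only reads the #CHROM header line and assumes a tab-delimited VCF, plain or gzip/bgzip-compressed.

```shell
# Hypothetical helper (not part of PureCN or GATK): print the number of
# per-sample genotype columns in a VCF.
vcf_sample_count() {
    local vcf="$1" reader="cat"
    case "$vcf" in
        *.gz) reader="gzip -dc" ;;   # bgzip files are readable by gzip
    esac
    # The #CHROM line has 8 fixed columns. When genotypes are present there
    # is one FORMAT column plus one column per sample, so samples = NF - 9.
    # A sites-only VCF (8 columns) is reported as 0.
    $reader "$vcf" | awk -F'\t' '/^#CHROM/ { n = NF - 9; if (n < 0) n = 0; print n; exit }'
}
```

Calling `vcf_sample_count pon.vcf.gz` (file name taken from the report above) prints the count; anything below 2 would be consistent with the FATAL message.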

mazzalab commented 9 months ago

Thanks Markus. What I meant is that I actually used your Docker image from Docker Hub, where I assume GenomicsDB is already installed and working. If that is the case, the only difference between my command line and yours is this: --coverage-files example_normal_coverages.list \ Could this be the reason for the error?

If you prefer, I can describe line by line what I've done.

lima1 commented 9 months ago

The output of GATK CreateSomaticPanelOfNormals is not sufficient, so NormalDB.R does its own thing, similar to it. So don't provide that VCF output; provide the actual GenomicsDB directory instead, i.e. the output of GATK GenomicsDBImport.
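
The distinction above can be sketched in shell. This is a minimal sketch under stated assumptions, not the project's documented procedure. The function names, $REFERENCE_FASTA, $INTERVALS and the normal1/normal2 file names are hypothetical placeholders. The GenomicsDBImport options are the ones used in the GATK tutorial linked in the original report.

```shell
# Hypothetical pre-flight check: classify what is about to be passed to
# --normal-panel. A GenomicsDB workspace is a directory; the
# CreateSomaticPanelOfNormals output is a single VCF file.
normal_panel_kind() {
    local panel="$1"
    if [ -d "$panel" ]; then
        echo "directory"
    elif [ -f "$panel" ]; then
        case "$panel" in
            *.vcf|*.vcf.gz) echo "vcf" ;;
            *) echo "other-file" ;;
        esac
    else
        echo "missing"
    fi
}

# Sketch of the import step whose output directory is what NormalDB.R
# should receive. Workspace path is $1; all other inputs are placeholders.
build_pon_db() {
    gatk GenomicsDBImport \
        -R "$REFERENCE_FASTA" \
        -L "$INTERVALS" \
        --genomicsdb-workspace-path "$1" \
        -V normal1.vcf.gz \
        -V normal2.vcf.gz
}
```

In a pipeline, one could stop early whenever `normal_panel_kind "$NORMAL_PANEL"` prints anything other than `directory`, rather than waiting for NormalDB.R to fail.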

mazzalab commented 9 months ago

It works, thanks. I suggest making it somehow clearer in the best practice document.