lima1 / PureCN

Copy number calling and variant classification using targeted short read sequencing
https://bioconductor.org/packages/devel/bioc/html/PureCN.html
Artistic License 2.0
127 stars 32 forks source link

'long flag "seg-file" is invalid' #322

Closed h170607 closed 1 year ago

h170607 commented 1 year ago

Hi, I Run PureCN with CNVkit segmentation: cnvkit.py export seg ${SAMPLE}_cnvkit.cns --enumerate-chroms -o ${SAMPLE}_cnvkit.seg

`Rscript /data/envsoft/R-4.0.0/lib64/R/library/PureCN/extdata/PureCN.R --out ./${SAMPLE} \

--sampleid $SAMPLE \

--tumor ${SAMPLE}_cnvkit.cnr \

--seg-file ${SAMPLE}_cnvkit.seg \

--vcf ${SAMPLE}_mutect.vcf \

--snp-blacklist hg19_simpleRepeats.bed \

--genome hg19 \

--fun-segmentation Hclust \

--force --post-optimize --seed 123`

The VCF contains matched normal information. Then got the error, with no results: /data/envsoft/R-4.0.0/lib64/R/library/PureCN/extdata/PureCN.R: error: Error in getopt(spec = spec, opt = args) : long flag "seg-file" is invalid

lima1 commented 1 year ago

That's weird. What's the output of Rscript /data/envsoft/R-4.0.0/lib64/R/library/PureCN/extdata/PureCN.R --help ?

h170607 commented 1 year ago

There seems to be some problem: /data/envsoft/R-4.0.0/bin/Rscript /data/envsoft/R-4.0.0/lib64/R/library/PureCN/extdata/PureCN.R --help

`Possible Ensembl SSL connectivity problems detected. Please see the 'Connection Troubleshooting' section of the biomaRt vignette vignette('accessing_ensembl', package = 'biomaRt')Error in curl::curl_fetch_memory(url, handle = handle) : Peer certificate cannot be authenticated with given CA certificates: [uswest.ensembl.org] Peer's Certificate has expired.

Usage: /data/envsoft/R-4.0.0/lib64/R/library/PureCN/extdata/PureCN.R [options] Options: -i SAMPLEID, --sampleid=SAMPLEID Sample id --normal=NORMAL Input: normal coverage, GC-normalized. Optional if normaldb or segfile is provided. --tumor=TUMOR Input: tumor coverage, GC-normalized --vcf=VCF Input: VCF file --rds=RDS Input: PureCN output RDS file, used to regenerate plots and files after manual curation --mappingbiasfile=MAPPINGBIASFILE Input: Mapping bias RDS file, generated by NormalDB.R (or calculateMappingBiasVcf) --normal_panel=NORMAL_PANEL Input: Deprecated, use --mappingbiasfile instead...`

need to reinstall this r package?

lima1 commented 1 year ago

Yes, this looks like an ancient version. Try following the README here to get the most recent version.

h170607 commented 1 year ago

I upgraded the software to the latest version 2.7.11, using BiocManager::install("lima1/PureCN"), and still have the following tips:

Possible Ensembl SSL connectivity problems detected.

Please see the 'Connection Troubleshooting' section of the biomaRt vignette

vignette('accessing_ensembl', package = 'biomaRt')Error in curl::curl_fetch_memory(url, handle = handle) :

Peer certificate cannot be authenticated with given CA certificates: [uswest.ensembl.org] Peer's Certificate has expired.`

I tried running it and still got errors:

WARN [2023-09-15 14:08:22] vcf.file has no DB info field for membership in germline databases. Found and used somatic status instead.

INFO [2023-09-15 14:08:22] 0 (0.0%) variants annotated as likely germline (DB INFO flag).

FATAL [2023-09-15 14:08:22] VCF either contains no germline variants or variants are not properly annotated.

My vcf file was generated by analyzing normal.bam and tumor.bam files with mutect2 and was filtered by the FilterMutectCalls. I don't quite understand why germline mutations are needed. Does this mean using the result of the HaplotypeCaller pipeline?

ddrichel commented 1 year ago

I am currently looking into this as well. Maybe you ran Mutect2 without --germline-resource, e.g. gnomAD? Mutect2 converts AF from germline resource to POPAF=-log10(AF), which PureCN can use for better fitting of copy numbers and estimates of purity/ploidy. PureCN can be run without --vcf if you want to do it without germline mutations, but I expect a loss of performance.

lima1 commented 1 year ago

@h170607, have a look here, you will need to tell Mutect to get germline sites. https://bioconductor.org/packages/devel/bioc/vignettes/PureCN/inst/doc/Quick.html#3_Create_VCF_files

SNPs provide the allele-specific copy number signal, have a look at our or the ABSOLUTE paper.

h170607 commented 1 year ago

We reran Mutect2 with --genotype-germline-sites true, --genotype-pon-sites true and --interval-padding 75, and got the result. It doesn't seem to need gnomAD.

lima1 commented 1 year ago

Yes, if you have a matched normal, it uses that for the germline vs somatic priors.

lima1 commented 1 year ago

If you get lots of variants removed by the BQ < 25 filter, see #320.