lima1 / PureCN

Copy number calling and variant classification using targeted short read sequencing
https://bioconductor.org/packages/devel/bioc/html/PureCN.html
Artistic License 2.0

PureCN-Steps #370

Open ElhamJa63 opened 4 weeks ago

ElhamJa63 commented 4 weeks ago

Hi, I am an entry-level bioinformatician running PureCN to estimate tumour purity when a matched normal is not available.

I have BAM and VCF files for my own tumour and normal samples, and I also have hg19 in FASTA format as the reference genome.

I know this takes up your time, but I would appreciate any support and guidance you can give.

I was wondering:

1. Do we need to generate an interval file from the baits BED file even though our VCF and BAM files are already prepared?

2. Since we prepared our files with GATK, should we focus on "Run PureCN with third-party segmentation" and ignore "Run with internal segmentation"?

If we do need to generate an interval file from a BED file, I have concluded that we need the files listed below. Could you please confirm that I am on the right track?

1. baits_hg19.bed (a BED file containing the bait coordinates for hg19, specific to our capture kit). How can I obtain this file? Should I download it, or prepare it from our own FASTA or BAM files?

2. hg19.fa in FASTA format. (Should I use the FASTA file that I already have, or download one from this package?)

"The --genome version is needed to annotate exons with gene symbols. Use hg19/hg38 for human genomes, not b37/b38. You might get a warning that an annotation package is missing. For hg19, install TxDb.Hsapiens.UCSC.hg19.knownGene in R."

3. Mappability file: provides mappability scores for 100-mers. Download wgEncodeCrgMapabilityAlign100mer.bigWig from the UCSC Genome Browser.

4. Replication timing file: download wgEncodeUwRepliSeqK562WaveSignalRep1.bigWig from the UCSC Genome Browser website.
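
To make the list above concrete, here is a minimal sketch of how the four inputs might be passed to PureCN's `IntervalFile.R` script. It only assembles and prints the command; it does not run it. The helper `build_interval_file_command` and all paths are hypothetical placeholders. The flag names (`--in-file`, `--fasta`, `--out-file`, `--off-target`, `--genome`, `--mappability`, `--reptiming`) are as I understand them from the PureCN quick-start documentation and have changed between releases, so please check them against `Rscript IntervalFile.R --help` for the installed version.

```python
import shlex
from pathlib import Path


def build_interval_file_command(purecn_dir, baits_bed, fasta, out_dir,
                                mappability, reptiming, genome="hg19"):
    """Assemble (but do not run) an IntervalFile.R command line.

    Hypothetical helper: flag names are assumed from the PureCN
    quick-start documentation and may differ in other versions.
    """
    # Name the output after the input BED, e.g. baits_hg19_intervals.txt
    out_file = Path(out_dir) / f"{Path(baits_bed).stem}_intervals.txt"
    return [
        "Rscript", str(Path(purecn_dir) / "IntervalFile.R"),
        "--in-file", str(baits_bed),        # 1. bait coordinates (BED)
        "--fasta", str(fasta),              # 2. reference genome (FASTA)
        "--out-file", str(out_file),
        "--off-target",                     # also create off-target intervals
        "--genome", genome,                 # hg19/hg38, not b37/b38
        "--mappability", str(mappability),  # 3. 100-mer mappability bigWig
        "--reptiming", str(reptiming),      # 4. replication timing bigWig
    ]


# Placeholder paths; replace them with the real locations.
command = build_interval_file_command(
    purecn_dir="/path/to/PureCN/extdata",
    baits_bed="baits_hg19.bed",
    fasta="hg19.fa",
    out_dir="reference",
    mappability="wgEncodeCrgMapabilityAlign100mer.bigWig",
    reptiming="wgEncodeUwRepliSeqK562WaveSignalRep1.bigWig",
)
print(shlex.join(command))
```

The printed line can be pasted into a shell once the placeholder paths point at real files. Building the command as a list keeps each file tied to its flag, which makes it easier to see which of the four inputs is still missing.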

Thank you for your support.