reineckef / quandico

Quantitative analysis of differences in copy numbers using read depth obtained from PCR-enriched samples and controls
GNU General Public License v3.0

Missing gene name in PDF and VCF #1

Closed: BurtonChia closed this issue 9 years ago

BurtonChia commented 9 years ago

Hi,

I'm having some problems with the output of Quandico: both the PDF and the VCF are missing the gene names. The clusters simply appear as chr1.1, chrX.3, etc.

I've tested quandico on the example dataset provided on the GitHub page, and the problem persists. I ran the example with this command line: "quandico -s map=M62_NA13019.bam -s x=2 -s y=0 -r map=M62_NA12878.bam -r x=2 -r y=0 -A CNA902Y.bed -d results -b 13019_vs_12878"

Has anyone else encountered this issue?

Regards, Burton

reineckef commented 9 years ago

Dear Burton,

That is not a bug; rather, it is a feature ;o).

For the sake of flexibility, the software has no built-in gene annotations, and this is documented on the GitHub page. You will need a file with gene names and coordinates that matches the reference genome used for mapping. In principle, this works for any organism and assembly version, as long as the files are in the correct format. Here is the documented example for human (hg19 and hg38) using UCSC data:

A file with gene names and coordinates is required if clusters are to be named after genes. For the human assemblies GRCh37 (hg19) and GRCh38 (hg38), these files can be downloaded from UCSC:

Later, when running the software, make sure to use the --names parameter during clustering (or, in the single-step command line, the passed-through clustering parameter --cp names=refGene.txt) to point to the (unzipped) file:

Running

You can run all steps separately:

# extract counts
$ qgetcounts -i M62_NA13019.bam -a CNA902Y.bed > M62_NA13019.tsv
$ qgetcounts -i M62_NA12878.bam -a CNA902Y.bed > M62_NA12878.tsv

# cluster the counts
$ qcluster -i M62_NA13019.tsv [--names refGene.txt] > M62_NA13019.clustered
$ qcluster -i M62_NA12878.tsv [--names refGene.txt] > M62_NA12878.clustered

# call copy numbers from the files with clustered counts;
# sample (-s) and reference (-r) are both female (x=2, y=0)
$ quandico --no-cluster \
    -s data=M62_NA13019.clustered \
    -r data=M62_NA12878.clustered \
    -s x=2 -s y=0 -r x=2 -r y=0

Alternatively, all of this can be done with a single command:

# sample (-s) and reference (-r) BAMs, amplicons (-a), output directory (-d)
# and base name (-b); --cp names=... optionally names clusters after genes
$ quandico -s map=M62_NA13019.bam -s x=2 -s y=0 \
           -r map=M62_NA12878.bam -r x=2 -r y=0 \
           -a CNA902Y.bed \
           -d results -b 13019_vs_12878 \
           [--cp names=refGene.txt]
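Because names only appear when the annotation file matches the assembly used for mapping, it can help to check which genes actually overlap the amplicons before running quandico. The following is a minimal sketch, not part of quandico: the function `amplicon_genes` is hypothetical, and it assumes the UCSC refGene.txt column layout (chromosome in column 3, transcript start and end in columns 5 and 6, gene symbol in column 13) with the same 0-based, half-open coordinates as BED. It does not reproduce how quandico itself assigns cluster names.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of quandico): for each amplicon in a BED file,
# print the gene symbols from a UCSC refGene.txt file whose transcripts overlap it.
# Assumed refGene.txt columns: $3 = chrom, $5 = txStart, $6 = txEnd, $13 = name2.
# Both files are assumed to be tab-separated with 0-based, half-open coordinates.
amplicon_genes() {
  local bed="$1" refgene="$2"
  awk -F'\t' '
    # First file (refGene.txt): index transcript intervals by chromosome.
    FILENAME == ARGV[1] {
      c = $3; k = ++cnt[c]
      s[c, k] = $5 + 0; e[c, k] = $6 + 0; g[c, k] = $13
      next
    }
    # Second file (BED): skip blank, track/browser and comment lines.
    NF < 3 || /^(track|browser|#)/ { next }
    {
      names = ""; split("", seen)          # reset per amplicon
      lo = $2 + 0; hi = $3 + 0
      for (i = 1; i <= cnt[$1]; i++) {
        # Half-open intervals overlap if each starts before the other ends.
        if (s[$1, i] < hi && e[$1, i] > lo && !(g[$1, i] in seen)) {
          seen[g[$1, i]] = 1
          names = names (names == "" ? "" : ",") g[$1, i]
        }
      }
      # "." marks amplicons without any overlapping gene.
      print $1 "\t" $2 "\t" $3 "\t" (names == "" ? "." : names)
    }
  ' "$refgene" "$bed"
}
```

Usage, after sourcing the snippet: `amplicon_genes CNA902Y.bed refGene.txt | head`. If every amplicon comes back as `.`, a likely cause is that the chromosome names differ between the two files (for example `chr1` versus `1`), which suggests the annotation does not match the reference used for mapping.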

That should solve your problem.

Best regards, Frank