BurtonChia closed this issue 9 years ago.
Dear Burton,
That is not a bug; rather, it is a feature ;o).
For the sake of flexibility, the software does not have gene annotations built in. This is documented on the GitHub page. You will need a file with gene names and coordinates that matches the reference genome used for mapping. In principle, this works for any organism and assembly version, provided the file is in the correct format. The documented example for human (hg19 and hg38), using UCSC data:
A file with gene names and coordinates is required if clusters are to be named after genes. For the human assemblies GRCh37 (hg19) and GRCh38 (hg38), these files can be downloaded from UCSC:
- refGene.txt.gz assembly hg19 (5 MB)
- refGene.txt.gz assembly hg38 (5 MB)
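A minimal sketch of fetching and unpacking the file with curl and gunzip. The URL pattern (UCSC's `goldenPath/<assembly>/database/` download directory) is an assumption based on UCSC's usual layout, and the helper names `refgene_url` and `fetch_refgene` are hypothetical, not part of the software.

```shell
# Hypothetical helper: print the assumed UCSC download URL for refGene.txt.gz
refgene_url() {
  case "$1" in
    hg19|hg38)
      printf 'https://hgdownload.soe.ucsc.edu/goldenPath/%s/database/refGene.txt.gz\n' "$1"
      ;;
    *)
      echo "unsupported assembly: $1" >&2
      return 1
      ;;
  esac
}

# Hypothetical helper: download and unzip, leaving refGene.<assembly>.txt
fetch_refgene() {
  assembly="$1"
  url="$(refgene_url "$assembly")" || return 1
  curl -fSL -o "refGene.${assembly}.txt.gz" "$url" || return 1
  # gunzip replaces the .gz file with the unzipped refGene.<assembly>.txt
  gunzip -f "refGene.${assembly}.txt.gz"
}
```

For example, `fetch_refgene hg19` should leave `refGene.hg19.txt` in the current directory; that unzipped file is what the clustering step expects.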
Later, when running the software as in the example, make sure to use the --names parameter during clustering (or the pass-through clustering parameter --cp names in the single-step command line) to point to the unzipped file:
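Because the annotation must match the reference genome used for mapping, one quick sanity check before clustering is to confirm that every chromosome name in the amplicon BED file also occurs in the annotation file (catching, say, `1` vs `chr1` naming). This is a sketch: `check_chrom_naming` is a hypothetical helper, and it assumes the UCSC refGene.txt layout, where the tab-separated column 3 holds the chromosome.

```shell
# Hypothetical helper: report chromosomes used in the BED file that are
# missing from refGene.txt; returns 1 if any are missing, 0 otherwise.
check_chrom_naming() {
  bed="$1"
  refgene="$2"
  bed_chroms="$(mktemp)"
  ref_chroms="$(mktemp)"
  # BED column 1 is the chromosome; skip track/browser/comment and blank lines
  awk '!/^(track|browser|#)/ && NF > 0 {print $1}' "$bed" | LC_ALL=C sort -u > "$bed_chroms"
  # assumed UCSC refGene layout: column 3 is the chromosome
  cut -f3 "$refgene" | LC_ALL=C sort -u > "$ref_chroms"
  # names present only in the BED list are unknown to the annotation
  missing="$(LC_ALL=C comm -23 "$bed_chroms" "$ref_chroms")"
  rm -f "$bed_chroms" "$ref_chroms"
  if [ -n "$missing" ]; then
    echo "chromosomes in $bed not found in $refgene:" >&2
    echo "$missing" >&2
    return 1
  fi
  return 0
}
```

For example, `check_chrom_naming CNA902Y.bed refGene.txt` prints nothing and succeeds when the naming conventions agree.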
Running
You can run all steps separately:
```shell
# extract counts
qgetcounts -i M62_NA13019.bam -a CNA902Y.bed > M62_NA13019.tsv
qgetcounts -i M62_NA12878.bam -a CNA902Y.bed > M62_NA12878.tsv

# cluster the counts (--names refGene.txt is optional; it names clusters by gene)
qcluster -i M62_NA13019.tsv --names refGene.txt > M62_NA13019.clustered
qcluster -i M62_NA12878.tsv --names refGene.txt > M62_NA12878.clustered

# call copy numbers from the files with clustered counts;
# sample (-s) and reference (-r) are female (x=2, y=0)
quandico --no-cluster \
  -s data=M62_NA13019.clustered \
  -r data=M62_NA12878.clustered \
  -s x=2 -s y=0 -r x=2 -r y=0
```
Alternatively, all this can be done using one single command:
```shell
# sample (-s), reference (-r), amplicons (-a), output location and name (-d, -b);
# --cp names=refGene.txt is optional and names the clusters by gene
quandico -s map=M62_NA13019.bam -s x=2 -s y=0 \
  -r map=M62_NA12878.bam -r x=2 -r y=0 \
  -a CNA902Y.bed \
  -d results -b 13019_vs_12878 \
  --cp names=refGene.txt
```
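To check whether a run picked up the gene names, one option is to count how many records in the output VCF still carry placeholder names such as chr1.1 or chrX.3. This is a rough sketch: `count_unnamed_clusters` is a hypothetical helper, and because it does not assume which VCF field holds the cluster name, it simply scans every non-header line for a `chrN.M` token.

```shell
# Hypothetical helper: count VCF data lines that still contain a placeholder
# cluster name (chrN.M) instead of a gene name.
count_unnamed_clusters() {
  # drop header lines (## meta lines and #CHROM), then count matching lines;
  # grep -c prints 0 but exits 1 when nothing matches, hence "|| true"
  grep -v '^#' "$1" | grep -Ec 'chr[0-9XYM]+\.[0-9]+' || true
}
```

Pointed at the VCF written to the results directory, a count of 0 suggests every cluster received a gene name.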
That should solve your problem.
Best regards, Frank
Hi,
I'm having a problem with the output of Quandico: both the PDF and the VCF are missing the gene names, and the clusters simply appear as chr1.1, chrX.3, etc.
I tested Quandico on the example dataset provided on the GitHub page, and the problem persists. I ran the example with this command line:
`quandico -s map=M62_NA13019.bam -s x=2 -s y=0 -r map=M62_NA12878.bam -r x=2 -r y=0 -A CNA902Y.bed -d results -b 13019_vs_12878`
Has anyone else encountered this issue?
Regards Burton