Missing gene name in PDF and VCF

Dear Burton,

That is not a bug – it is rather a feature ;o).

For the sake of flexibility, the software does not ship with built-in gene annotations; this is documented on the GitHub page. You will need a file with gene names and coordinates that matches the reference genome used for mapping. In principle, this should work for any organism and assembly version, as long as the file is in the correct format. The documented example for human (hg19 and hg38) uses UCSC data:

A file with gene names and coordinates is required if clusters should be named using gene names. For human assemblies GRCh37 (hg19) and GRCh38 (hg38), these files can be downloaded from UCSC:

refGene.txt.gz assembly hg19 (5 MB)

refGene.txt.gz assembly hg38 (5 MB)
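Since the gene file has to match the reference used for mapping, a quick sanity check of the chromosome naming can save some head-scratching. The following is only a sketch: `shared_chroms` is a hypothetical helper (not part of quandico), and it assumes the standard UCSC refGene layout with the chromosome in column 3 and a tab-separated BED file with the chromosome in column 1.

```shell
# Hypothetical helper, not part of quandico.
# Prints every chromosome name that occurs both in the amplicon BED file
# (column 1) and in the unzipped refGene table (column 3, assuming the
# standard UCSC refGene layout). Assumes the BED file is not empty.
shared_chroms() {
    bed=$1
    genes=$2
    awk -F '\t' '
        # first file (BED): remember chromosome names, skip header lines
        NR == FNR { if ($1 !~ /^(#|track|browser)/) seen[$1] = 1; next }
        # second file (refGene): print each shared chromosome once
        ($3 in seen) && !done[$3]++ { print $3 }
    ' "$bed" "$genes"
}

# Example usage (file names taken from the commands in this thread):
#   gunzip refGene.txt.gz
#   shared_chroms CNA902Y.bed refGene.txt
```

If this prints nothing, the two files most likely use different naming schemes (for example `1` versus `chr1`) or different assemblies, and gene names cannot be assigned.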

And later, in the example of how to run the software, make sure to use the --names parameter during clustering (or the pass-through clustering parameter --cp names=refGene.txt in the single-step command line) to point to the (unzipped) file:

Running

You can run all steps separately:

# extract counts
$ qgetcounts -i M62_NA13019.bam -a CNA902Y.bed > M62_NA13019.tsv
$ qgetcounts -i M62_NA12878.bam -a CNA902Y.bed > M62_NA12878.tsv

# cluster the counts
$ qcluster -i M62_NA13019.tsv [--names refGene.txt] > M62_NA13019.clustered
$ qcluster -i M62_NA12878.tsv [--names refGene.txt] > M62_NA12878.clustered

# call copy numbers
# -s data / -r data: files with clustered counts
# x=2, y=0: sample (-s) and reference (-r) are female
$ quandico --no-cluster \
  -s data=M62_NA13019.clustered \
  -r data=M62_NA12878.clustered \
  -s x=2 -s y=0 -r x=2 -r y=0

Alternatively, all this can be done using one single command:

# -s: sample, -r: reference, -a: amplicons,
# -d / -b: output location and name, --cp names: optional cluster names
$ quandico -s map=M62_NA13019.bam -s x=2 -s y=0 \
           -r map=M62_NA12878.bam -r x=2 -r y=0 \
           -a CNA902Y.bed \
           -d results -b 13019_vs_12878 \
           [--cp names=refGene.txt]

That should solve your problem.

Best regards, Frank

reineckef / quandico
