nygenome / Conpair

Concordance and contamination estimator for tumor–normal pairs

Input files reads and reference have incompatible contigs. #6

Closed stevekm closed 6 years ago

stevekm commented 6 years ago

I followed the instructions as specified in the README for setting up the human_g1k_v37.fa reference files. However, when I tried to run the program, I got this message:

 $ Conpair/scripts/run_gatk_pileup_for_sample.py -B tumor.bam -O TUMOR_pileup

...
...
...

##### ERROR ------------------------------------------------------------------------------------------
##### ERROR A USER ERROR has occurred (version 3.8-0-ge9d806836):
##### ERROR
##### ERROR This means that one or more arguments or inputs in your command are incorrect.
##### ERROR The error message below tells you what is the problem.
##### ERROR
##### ERROR If the problem is an invalid argument, please check the online documentation guide
##### ERROR (or rerun your command with --help) to view allowable command-line arguments for this tool.
##### ERROR
##### ERROR Visit our website and forum for extensive documentation and answers to
##### ERROR commonly asked questions https://software.broadinstitute.org/gatk
##### ERROR
##### ERROR Please do NOT post this error to the GATK forum unless you have really tried to fix it yourself.
##### ERROR
##### ERROR MESSAGE: Input files reads and reference have incompatible contigs. Please see https://software.broadinstitute.org/gatk/documentation/article?id=63for more information. Error details: No overlapping contigs found.
##### ERROR   reads contigs = [chrM, chr1, chr2, chr3, chr4, chr5, chr6, chr7, chr8, chr9, chr10, chr11, chr12, chr13, chr14, chr15, chr16, chr17, chr18, chr19, chr20, chr21, chr22, chrX, chrY]
##### ERROR   reference contigs = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, X, Y, MT, GL000207.1, GL000226.1, GL000229.1, GL000231.1, GL000210.1, GL000239.1, GL000235.1, GL000201.1, GL000247.1, GL000245.1, GL000197.1, GL000203.1, GL000246.1, GL000249.1, GL000196.1, GL000248.1, GL000244.1, GL000238.1, GL000202.1, GL000234.1, GL000232.1, GL000206.1, GL000240.1, GL000236.1, GL000241.1, GL000243.1, GL000242.1, GL000230.1, GL000237.1, GL000233.1, GL000204.1, GL000198.1, GL000208.1, GL000191.1, GL000227.1, GL000228.1, GL000214.1, GL000221.1, GL000209.1, GL000218.1, GL000220.1, GL000213.1, GL000211.1, GL000199.1, GL000217.1, GL000216.1, GL000215.1, GL000205.1, GL000219.1, GL000224.1, GL000223.1, GL000195.1, GL000212.1, GL000222.1, GL000200.1, GL000193.1, GL000194.1, GL000225.1, GL000192.1]
##### ERROR ------------------------------------------------------------------------------------------

As the error message describes, the contigs in my sample .bam file are all named "chr1", "chr2", etc., while those in the reference file are labeled "1", "2", etc. I aligned my reads against the UCSC hg19 genome.fa file, and I am using GATK version 3.8.
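A quick way to confirm that the only difference is the 'chr' prefix is to compare the two contig lists from the error message after normalizing the names. The following is a minimal sketch; `strip_chr` and `diagnose_contigs` are hypothetical helper names and not part of Conpair or GATK. It compares names only and says nothing about whether the underlying sequences are identical.

```python
def strip_chr(name):
    """Drop a leading 'chr' from a contig name, if present."""
    return name[3:] if name.startswith("chr") else name


def diagnose_contigs(reads_contigs, reference_contigs):
    """Classify how two contig name lists relate to each other.

    Returns "overlap" if at least one name is shared as-is,
    "prefix_mismatch" if names only match once 'chr' is stripped,
    and "different_assemblies" otherwise.
    """
    reads, ref = set(reads_contigs), set(reference_contigs)
    if reads & ref:
        return "overlap"
    if {strip_chr(c) for c in reads} & {strip_chr(c) for c in ref}:
        return "prefix_mismatch"
    return "different_assemblies"


if __name__ == "__main__":
    # Shortened versions of the lists printed in the GATK error above.
    reads = ["chrM", "chr1", "chr2", "chrX", "chrY"]
    reference = ["1", "2", "X", "Y", "MT", "GL000207.1"]
    print(diagnose_contigs(reads, reference))
```

Note that the mitochondrial contig is named "chrM" in the .bam file and "MT" in the reference, so it does not match even after the prefix is stripped.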

Is there a recommended solution for this? Or can I just rename the contigs in the reference file to match the ones in my .bam file?

ewabergmann commented 6 years ago

Hi stevekm,

If aligning your reads using the Ensembl reference genome is not an option, I would suggest the following:

  1. Add a 'chr' prefix to every line in Conpair/data/markers/GRCh37.autosomes.phase3_shapeit2_mvncall_integrated.20130502.SNV.genotype.sselect_v4_MAF_0.4_LD_0.8.bed and save the result as: Conpair/data/markers/GRCh37.autosomes.phase3_shapeit2_mvncall_integrated.20130502.SNV.genotype.sselect_v4_MAF_0.4_LD_0.8_chr.bed

For example: sed 's/\(.*\)/chr\1/' Conpair/data/markers/GRCh37.autosomes.phase3_shapeit2_mvncall_integrated.20130502.SNV.genotype.sselect_v4_MAF_0.4_LD_0.8.bed > Conpair/data/markers/GRCh37.autosomes.phase3_shapeit2_mvncall_integrated.20130502.SNV.genotype.sselect_v4_MAF_0.4_LD_0.8_chr.bed

  2. Make sure the reference genome you used to map the reads has its .dict and .fa.fai files in the same directory.

  3. Run: $ Conpair/scripts/run_gatk_pileup_for_sample.py -B tumor.bam -O TUMOR_pileup -M Conpair/data/markers/GRCh37.autosomes.phase3_shapeit2_mvncall_integrated.20130502.SNV.genotype.sselect_v4_MAF_0.4_LD_0.8_chr.bed --remove_chr_prefix -R
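If sed is not convenient, the first two steps can also be sketched in Python. This is only an illustration under some assumptions: `add_chr_prefix` and `missing_reference_indexes` are hypothetical helpers, not Conpair functions. The marker file is assumed to be a plain tab-separated BED file, and the .dict file is assumed to follow the usual GATK 3 convention of replacing the FASTA extension (genome.fa gives genome.dict), while the index is the FASTA name plus .fai.

```python
import os


def add_chr_prefix(src_bed, dst_bed):
    """Copy src_bed to dst_bed, prepending 'chr' to each record line.

    Blank lines, comment/track lines, and lines that already start with
    'chr' are copied unchanged. Returns the number of lines prefixed.
    """
    prefixed = 0
    with open(src_bed) as src, open(dst_bed, "w") as dst:
        for line in src:
            if line.strip() and not line.startswith(("chr", "#", "track", "browser")):
                line = "chr" + line
                prefixed += 1
            dst.write(line)
    return prefixed


def missing_reference_indexes(fasta_path):
    """Return the expected companion files of fasta_path that do not exist.

    Assumes the convention <name>.fa.fai for the index and <name>.dict
    for the sequence dictionary, both next to the FASTA file.
    """
    root, _ = os.path.splitext(fasta_path)
    candidates = [fasta_path + ".fai", root + ".dict"]
    return [path for path in candidates if not os.path.exists(path)]
```

For example, calling `missing_reference_indexes("genome.fa")` before running the pileup script would list whichever of genome.fa.fai and genome.dict still needs to be generated.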

Please let me know if it worked for you.

Best wishes, Ewa

ewabergmann commented 6 years ago

I'm closing this issue, as there have not been any further questions/comments.