issues with missing calls and accuracy

humanlongevity / HLA

xHLA: Fast and accurate HLA typing from short read sequence data

issues with missing calls and accuracy #27

Closed cjieming closed 6 years ago

cjieming commented 6 years ago

I ran xHLA on WES samples from the 1000 Genomes Project that are also in HapMap, and obtained a very large number of missing calls (~20% for MHC I and 30% for MHC II). The accuracy on the remaining calls is also low.

Sample command used: run.py --sample_id NA19137_SRR792560_output --input_bam_path NA19137_SRR792560.recal.bam --output NA19137_SRR792560_results

Any help or comment on how to use xHLA correctly is much appreciated! The paper mentions an accuracy of 99%, and our results seem far from that.

An explanation of the output files would also be helpful for troubleshooting.

Thanks!
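For readers who want to reproduce this kind of comparison, the missing-call rate and the accuracy on the remaining calls could be computed roughly as follows. This is a minimal sketch, not the method used in the issue or in the xHLA paper: the function names, the dictionary layout, the sample ID, and the choice to compare at two-field resolution are all assumptions made for illustration.

```python
from collections import Counter


def two_field(allele):
    """Trim an allele name such as 'A*02:01:01' to two fields ('A*02:01')."""
    return ":".join(allele.split(":")[:2])


def score_calls(truth, calls):
    """Return (missing_rate, accuracy_on_called) over all expected alleles.

    truth and calls map (sample, locus) -> list of allele names.
    Both names and this layout are hypothetical, chosen for the sketch.
    """
    expected = missing = called = correct = 0
    for key, true_alleles in truth.items():
        expected += len(true_alleles)
        predicted = [two_field(a) for a in calls.get(key, []) if a]
        # Alleles the caller did not report count as missing.
        missing += max(len(true_alleles) - len(predicted), 0)
        called += min(len(predicted), len(true_alleles))
        # Multiset intersection: allele order within a locus does not matter.
        overlap = Counter(two_field(a) for a in true_alleles) & Counter(predicted)
        correct += sum(overlap.values())
    missing_rate = missing / expected if expected else 0.0
    accuracy = correct / called if called else 0.0
    return missing_rate, accuracy


# Hypothetical example data: one sample, two loci, one locus without calls.
example_truth = {
    ("SAMPLE1", "A"): ["A*02:01", "A*03:01"],
    ("SAMPLE1", "B"): ["B*07:02", "B*08:01"],
}
example_calls = {
    ("SAMPLE1", "A"): ["A*03:01:01", "A*02:01"],
}
example_missing_rate, example_accuracy = score_calls(example_truth, example_calls)
```

Reporting the two numbers separately, as in the issue, keeps a caller that skips difficult loci from looking more accurate than one that attempts every call.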

tanghaibao commented 6 years ago

@cjieming Which reference were you using? Have you tried preprocessing the BAM with this script: https://github.com/humanlongevity/HLA/blob/master/bin/get-reads-alt-unmap.sh

Haibao

cjieming commented 6 years ago

Hi @tanghaibao, the blurb seems to say that the BAMs would be reprocessed using BWA-MEM in a non-alt-aware fashion. I can use the script to reprocess them, but I am not sure it will make any difference, because I am already using hg38, with BWA-MEM as my aligner with default parameters...

cjieming commented 6 years ago

Hi @tanghaibao, I am encountering this error after I ran the script you posted: [bwt_restore_sa] SA-BWT inconsistency: primary is not the same. Abort!

I have been trying to figure out how to make this work. I surmise that the get-reads-alt-unmap.sh script uses the files in /data/chr6, but that folder does not contain any FASTA file. Does BWA-MEM require one?

My command is: ./get-reads-alt-unmap.sh NA12892_ERR034529.recal.bam NA12892_ERR034529.recal.new.bam

Thanks for your help!

cjieming commented 6 years ago

Anyway, I solved the issue by obtaining the FASTA for just chr6 from GRCh38 and then re-indexing. I did this for one sample, and the accuracy seems good now. I will do the same for the rest of my samples and see. I will close this issue for now.
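The first half of that fix, pulling only the chr6 record out of a whole-genome FASTA, can be sketched in plain Python. This is an illustrative sketch rather than the commands actually used in the thread: the function name and file names are hypothetical, and it assumes the sequence is named `chr6` in the reference (some GRCh38 distributions name it `6` instead).

```python
def extract_contig(fasta_in, fasta_out, contig="chr6"):
    """Copy one sequence record from fasta_in to fasta_out.

    Returns True if the record was found. The function name and the
    default contig name are assumptions made for this sketch.
    """
    found = False
    writing = False
    with open(fasta_in) as src, open(fasta_out, "w") as dst:
        for line in src:
            if line.startswith(">"):
                if found:
                    # The wanted record has ended; stop reading.
                    break
                # The record name is the first token after '>'.
                fields = line[1:].split()
                writing = bool(fields) and fields[0] == contig
                found = writing
            if writing:
                dst.write(line)
    return found
```

The extracted file would then be indexed with `bwa index` using the same BWA version that performs the alignment, and the resulting index files placed where get-reads-alt-unmap.sh looks for its chr6 reference. If samtools is available, `samtools faidx` with a region argument is a common alternative way to extract a single sequence.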