secastel / phaser

phasing and Allele Specific Expression from RNA-seq
GNU General Public License v3.0
105 stars 36 forks source link

FATAL ERROR: No heterozygous sites that passed all filters were included in the analysis #64

Closed deb0612 closed 3 years ago

deb0612 commented 3 years ago

Dear sir, I encountered a fatal error when running the following commend:

$ python ~/phaser/phaser/phaser.py --vcf RNA-Control-007A.raw.snps.indels.vcf.gz --bam sample/Samtools_sort_on_RNA-Control-007A.bam --paired_end 1 --mapq 255 --baseq 10 --sample sample1 --o phaser_test_case --id_separator ' '

################################################## Welcome to phASER v1.1.1 Author: Stephane Castel (scastel@nygenome.org) Updated by: Bishwa K. Giri (bkgiri@uncg.edu) ##################################################

Completed the check of dependencies and input files availability...

STARTED "Read backed phasing and ASE/haplotype analyses" ... DATE, TIME : 2020-08-24, 11:25:18

1. Loading heterozygous variants into intervals...

Processing sample named sample1 using all the chromosomes ... processing VCF...

Memory efficient mode is deactivated...
If RAM is limited, activate memory efficient mode using the flag "--process_slow = 1"...

 creating variant mapping table...
      0 heterozygous sites being used for phasing (408333 filtered, 0 indels excluded, 0 unphased)

 FATAL ERROR: No heterozygous sites that passed all filters were included in the analysis, phASER cannot continue. Check blacklist and pass_only arguments.
everestial commented 3 years ago

It would help if you provided a small sample of your VCF and bam file. :)

deb0612 commented 3 years ago

Dear sir, Thanks for your reply. Please use the following link to get the test files you need. vcf: https://www.dropbox.com/s/qa7o284mlbnxqer/0825test.vcf bam&bai https://www.dropbox.com/s/jqcpq68narzg0fb/test_bam.rar

Thanks & Regards Debbie

everestial notifications@github.com 於 2020年8月25日 週二 上午12:06寫道:

It would help if you provided a small sample of your VCF and bam file. :)

— You are receiving this because you authored the thread. Reply to this email directly, view it on GitHub https://github.com/secastel/phaser/issues/64#issuecomment-679219104, or unsubscribe https://github.com/notifications/unsubscribe-auth/AG5RNPVKSIQROEWAA3STXUTSCKFYJANCNFSM4QJBIE4Q .

secastel commented 3 years ago

Hi Debbie, Thanks for providing the sample files (and thanks @everestial for suggesting they do so). From the sample VCF it seems that there are only homozygous variant calls. If this is the case for the full VCF, it would explain the error you are seeing. phASER is designed to phase and produce allelic counts for heterozygous variants. Can you confirm that the full VCF contains only homozygous variant calls?

Best, Stephane

deb0612 commented 3 years ago

Dear Stephane, Thanks for your reply. I had checked my full VCF. I found it contain both homologous(AC=2) and heterogeneous (AC=1) variants. Could I just simply select variants with INFO AC=1 from VCF and ran phASER again ?

Debbie

Stephane Castel notifications@github.com 於 2020年8月27日 週四 上午11:11寫道:

Hi Debbie, Thanks for providing the sample files (and thanks @everestial https://github.com/everestial for suggesting they do so). From the sample VCF it seems that there are only homozygous variant calls. If this is the case for the full VCF, it would explain the error you are seeing. phASER is designed to phase and produce allelic counts for heterozygous variants. Can you confirm that the full VCF contains only homozygous variant calls?

Best, Stephane

— You are receiving this because you authored the thread. Reply to this email directly, view it on GitHub https://github.com/secastel/phaser/issues/64#issuecomment-681317590, or unsubscribe https://github.com/notifications/unsubscribe-auth/AG5RNPWMCWEDTVJVOUJ2Y7DSCXFH5ANCNFSM4QJBIE4Q .

secastel commented 3 years ago

Hi Debbie, You are right, I'm sorry, even the sample VCF you sent contains a few heterozygous positions that I didn't see. Can you try running phASER removing the --id_separator ' ' argument that you added?

deb0612 commented 3 years ago

Dear sir, I tried again and find another error message: ################################################## Welcome to phASER v1.1.1 Author: Stephane Castel (scastel@nygenome.org) Updated by: Bishwa K. Giri (bkgiri@uncg.edu) ##################################################

Completed the check of dependencies and input files availability...

STARTED "Read backed phasing and ASE/haplotype analyses" ... DATE, TIME : 2020-08-28, 17:20:17

1. Loading heterozygous variants into intervals...

Processing sample named sample1 using all the chromosomes ... processing VCF... Memory efficient mode is deactivated... If RAM is limited, activate memory efficient mode using the flag "--processslow = 1"... creating variant mapping table... **FATAL ERROR: Character '' must not be present in contig name. Please change id separtor using --id_separator to a character not found in the contig names and try again.**

everestial commented 3 years ago

@deb0612 Please include the command you used.

From what I see, your contig name already contains "_". So, "_" (which I think gets used by default) cannot be used to when creating unique IDs. You will have to use some thing other than "_", ":". @secastel can confirm to this.

Please share your command to confirm this.

everestial commented 3 years ago

Hi Debbie, Thanks for providing the sample files (and thanks @everestial for suggesting they do so). From the sample VCF it seems that there are only homozygous variant calls. If this is the case for the full VCF, it would explain the error you are seeing. phASER is designed to phase and produce allelic counts for heterozygous variants. Can you confirm that the full VCF contains only homozygous variant calls?

Best, Stephane

yw

secastel commented 3 years ago

Yes @everestial is right. You need to use a different separator for generating IDs. Try running with the argument: --id_separator '!' Ana let us know how that works.

deb0612 commented 3 years ago

I got the error message like this:

FATAL ERROR: Character ':' must not be present in contig name. Please change id separtor using --id_separator to a character not found in the contig names and try again.

secastel commented 3 years ago

I'm assuming you used '!' as the id_separator as I suggested above. If so, then try another character, like '#', or really any character that is not in the contig (chromosome names). The issue is that phaser generates a unique ID key for each variant, like this "chr5:2452352:C:T". If the separator used when generating the key is found in the contig name then you get an issue because the key looks like the following: "chr:5: 2452352:C:T". So as I said, please use an id_separator that is not found in any of the contig names.

deb0612 commented 3 years ago

Dear sir, I had tried almost every character in the keyboard, and still show the same error. Image 016

secastel commented 3 years ago

I see you've closed the thread, but I think I identified the cause of the issue you are having. It seems that phASER doesn't allow any contigs to have ":" in them, even if you specify another character using --id_separator. I'm not sure if this is a bug, or was done for a specific purpose. If you're still interested in trying to fix it, can you run the following command on your input VCF and let me know what the output is?

zcat input.vcf.gz | grep -v "^#" | cut -f 1 | sort -u

You can also try running using the attached updated phaser.py file, which will allow phaser to run when you have provided another id_seperator, so you could also try that with something other than ":" specific (as you did above) and see if that works.

phaser.py.zip

deb0612 commented 3 years ago

test.txt I had tried 'zcat input.vcf.gz | grep -v "^#" | cut -f 1 | sort -u', and got the output in the attached file.

secastel commented 3 years ago

Well, I don't see any ":" in the list of contigs names here, which is very confusing. Also, looking at the VCF you previously uploaded to dropbox, it seems the contigs were named e.g. "chr1" in that file, but from the output above they are named "1", so that is inconsistent. At this point I'm really at a loss. The only thing I could try is if you provide the exact VCF you used as input, I can try to run everything myself. I already have the BAM downloaded that you provided before. It's up to you, but without that I don't think there is much more we can do at this point.

deb0612 commented 3 years ago

Dear Stephane, control.007A.selected2.vcf https://drive.google.com/file/d/1SxcY0oTthcSjRT4eOgg7glA4kQvyeXQe/view?usp=drive_web I would like to provide my vcf file. Please just use it privately. The command I used: python ~/tools/phaser/phaser/phaser.py --vcf control.007A.selected2.vcf.gz --bam sample/Samtools_sort_on_RNA-Control-007A.bam --paired_end 1 --mapq 255 --baseq 10 --sample sample1 --o phaser_test_case --id_separator "@"

Thanks

Debbie

Stephane Castel notifications@github.com 於 2020年9月4日 週五 上午11:37寫道:

Well, I don't see any ":" in the list of contigs names here, which is very confusing. Also, looking at the VCF you previously uploaded to dropbox, it seems the contigs were named e.g. "chr1" in that file, but from the output above they are named "1", so that is inconsistent. At this point I'm really at a loss. The only thing I could try is if you provide the exact VCF you used as input, I can try to run everything myself. I already have the BAM downloaded that you provided before. It's up to you, but without that I don't think there is much more we can do at this point.

— You are receiving this because you modified the open/close state. Reply to this email directly, view it on GitHub https://github.com/secastel/phaser/issues/64#issuecomment-686882939, or unsubscribe https://github.com/notifications/unsubscribe-auth/AG5RNPS3VS4GSVFO2FW7EDTSEBOFZANCNFSM4QJBIE4Q .

deb0612 commented 3 years ago

I use the new phaser.py.zip script you provided, and it runs well! I found that I also need to revise the INFO column in vcf file to "PASS", or I will encounter the same heterozygous error. However, when go to step3, there is another problem like this

2. Retrieving reads that overlap heterozygous sites...

 file: sample/Samtools_sort_on_RNA-Control-007A.bam
      minimum mapq: 255
      mapping reads to variants...
           completed chromosome chr10...
           completed chromosome chr11...
           completed chromosome chr12...
           completed chromosome chr13...
           completed chromosome chr14...
           completed chromosome chr15...
           completed chromosome chr16...
           completed chromosome chr17...
           completed chromosome chr18...
           completed chromosome chr19...
           completed chromosome chr1...
           completed chromosome chr20...
           completed chromosome chr21...
           completed chromosome chr22...
           completed chromosome chr2...
           completed chromosome chr3...
           completed chromosome chr4...
           completed chromosome chr5...
           completed chromosome chr6...
           completed chromosome chr7...
           completed chromosome chr8...
           completed chromosome chr9...
           completed chromosome chrM...
           completed chromosome chrX...
           completed chromosome chrY...
      processing mapped reads...
      no alignment score value found in reads, cannot use cutoff
      retrieved 0 reads

3. Identifying connected variants...

 calculating sequencing noise level...
 **FATAL ERROR: No reads could be matched to variants. Please double check your settings and input files. Common reasons for this occurring include: 1) MAPQ or BASEQ set too conservatively 2) BAM and VCF have different chromosome names (IE 'chr1' vs '1').**

Even though I change chr1 to 1, error continued. However, when I lower the "--mapq" from 255 to 25, it worked fine. Will they be any problem, I change this value so low?

secastel commented 3 years ago

Sorry for the delayed response. Glad to hear that the modified script works for you. I'll look into pushing those changes into the production version. As for the --mapq setting, that refers to mapping quality cutoff to use for aligned reads. This depends on the alignment software you used. If you used STAR, then a mapq of 255 should be used, which corresponds to uniquely aligning reads. If you used another alignment tool, whatever mapq that corresponds to uniquely (or highly uniquely) mapping reads should be used.

deb0612 commented 3 years ago

Dear Stephane, Thanks for your reply. I had lower my mapq value to 50, and it looks fine. However, there is other question appeared in the output file when annotated by phaser_gene_ae.py. Please refer to https://github.com/secastel/phaser/issues/65

Debbie

Stephane Castel notifications@github.com 於 2020年9月10日 週四 下午12:27寫道:

Sorry for the delayed response. Glad to hear that the modified script works for you. I'll look into pushing those changes into the production version. As for the --mapq setting, that refers to mapping quality cutoff to use for aligned reads. This depends on the alignment software you used. If you used STAR, then a mapq of 255 should be used, which corresponds to uniquely aligning reads. If you used another alignment tool, whatever mapq that corresponds to uniquely (or highly uniquely) mapping reads should be used.

— You are receiving this because you modified the open/close state. Reply to this email directly, view it on GitHub https://github.com/secastel/phaser/issues/64#issuecomment-689972138, or unsubscribe https://github.com/notifications/unsubscribe-auth/AG5RNPXKANIMHOJLD4ECAZLSFBIURANCNFSM4QJBIE4Q .