secastel / phaser

phasing and Allele Specific Expression from RNA-seq
GNU General Public License v3.0
111 stars 36 forks source link

problem when running phaser #37

Closed jfertaj closed 7 years ago

jfertaj commented 7 years ago

Dear Stephane.

I am trying to run phaser on my server (they made the installation), but unfortunately I always get this error: /bin/sh: python2.7: command not found /bin/sh: python2.7: command not found /bin/sh: python2.7: command not found /bin/sh: python2.7: command not found [main_samview] failed to write the SAM header samtools view: error closing standard output: -1 [main_samview] failed to write the SAM header samtools view: error closing standard output: -1 [main_samview] failed to write the SAM header samtools view: error closing standard output: -1samtools view: writing to standard output failed : Broken pipe

I have tried setting-up an alias for python2.7 but it doesn't work, any hints?

Thanks

secastel commented 7 years ago

Sorry to hear about the problem. This error is occurring because the main phASER script is trying to call python and it doesn't seem to be accessible using "python2.7".

You should provide the "--python_string" argument when running phaser, and set it to whatever you use to run python on your server.

jfertaj commented 7 years ago

Thanks a lot @secastel, I now have been able to run it, however I got another error about "not reads could be matched to variants". Both BAM and vcf have chr as chromosome nomenclature. The vcf file is generated from Michigan imputation server, phased with ShapeIT2 and the genotypes are dosages, and contains several samples. The BAM file is generated after merging different bams from same individual. Also says that I don't have alignment score in my reads and states "retrieved 0 reads", any help would be greatly appreciated

secastel commented 7 years ago

Can you post the command you are using to run phASER please?

jfertaj commented 7 years ago

here it is:

python2.7 ~/phaser/phaser/phaser.py \ --vcf ~/Ceres/HRC_files/ceres.HRC_imputed_MAF0.01_missing0.05_INFO0.3.reheader_addchr.vcf.gz \ --bam /home/jfertaj/Ceres_rnaseq_151017/Juan/merged/bam/INTP_1031.bam \ --paired_end 1 \ --mapq 255 \ --baseq 10 \ --sample eQTL_103 \ --blacklist hg19_hla.chr.bed \ --haplo_count_blacklist hg19_haplo_count_blacklist.chr.bed \ --threads 14 \ --o phaser_test_case

secastel commented 7 years ago

Was STAR used to align the RNA-seq reads?

jfertaj commented 7 years ago

Yes, I have used STAR as aligner

secastel commented 7 years ago

Just to be sure, in the VCF chromosomes are listed as e.g. "chr1"? If possible, it would be helpful if you could upload just the first e.g. 100 lines of both the VCF and the BAM (converted to SAM), including their respective headers.

jfertaj commented 7 years ago

Yes, I just re-check and in the vcf, chromosomes are listed with chr prefix. I have uploaded a vcf for one sample with no header and also a sam file with no header

eQTL_344.100lines.vcf.zip INTP_1452.100lines.sam.zip

jfertaj commented 7 years ago

sorry, I will update the headers now

secastel commented 7 years ago

Hmm, yeah it all looks good from what you uploaded. It's concerning that it is saying there is no alignment score, because the SAM you uploaded clearly has an alignment score field ("AS"). Can you please post the full console output from running phaser?

jfertaj commented 7 years ago

Here are the headers in case you need them and also the full console output header.vcf.zip header.sam.zip

And here is the full console output:

jfertaj@dhcp210:~/Ceres/HRC_files$ python2.7 ~/phaser/phaser/phaser.py \
> --vcf ~/Ceres/HRC_files/ceres.HRC_imputed_MAF0.01_missing0.05_INFO0.3.reheader_addchr.vcf.gz \
> --bam /home/jfertaj/Ceres_rnaseq_151017/Juan/merged/bam/INTP_1031.bam  \
> --paired_end 1 \
> --mapq 255 \
> --baseq 10 \
> --sample eQTL_103 \
> --blacklist hg19_hla.chr.bed \
> --haplo_count_blacklist hg19_haplo_count_blacklist.chr.bed \
> --threads 14 \
> --o phaser_test_case

##################################################
              Welcome to phASER v1.0.0
  Author: Stephane Castel (scastel@nygenome.org)
##################################################

#1. Loading heterozygous variants into intervals...
     removing blacklisted variants and processing VCF...
#1b. Loading haplotypic count blacklist intervals...
     creating variant mapping table...
          1861433 heterozygous sites being used for phasing (0 filtered, 0 indels excluded, 1861433 unphased)
#2. Retrieving reads that overlap heterozygous sites...
     file: /home/jfertaj/Ceres_rnaseq_151017/Juan/merged/bam/INTP_1031.bam
          minimum mapq: 255
          mapping reads to variants...
               completed chromosome chr21...
               completed chromosome chr18...
               completed chromosome chr13...
               completed chromosome chr20...
               completed chromosome chr22...
               completed chromosome chr10...
               completed chromosome chr15...
               completed chromosome chr14...
               completed chromosome chr16...
               completed chromosome chr7...
               completed chromosome chr4...
               completed chromosome chr5...
               completed chromosome chr6...
               completed chromosome chr9...
               completed chromosome chr12...
               completed chromosome chr19...
               completed chromosome chr11...
               completed chromosome chr3...
               completed chromosome chr17...
               completed chromosome chr8...
               completed chromosome chr2...
               completed chromosome chr1...
          processing mapped reads...
          no alignment score value found in reads, cannot use cutoff
          retrieved 0 reads
#3. Identifying connected variants...
     calculating sequencing noise level...
     FATAL ERROR: No reads could be matched to variants. Please double check your settings and input files. Common reasons for this occurring include: 1) MAPQ or BASEQ set too conservatively 2) BAM and VCF have different chromosome names (IE 'chr1' vs '1').
jfertaj commented 7 years ago

I also tried to change sample name in bam file because was not identical to the one in the vcf file, but this also doesn't work

jfertaj commented 7 years ago

Sorry for bothering you @secastel, but have you had the chance to look to the files I sent? If you are busy could you point me out to another software you "trust" to obtain ASE results from my files? Thanks a lot in advance

secastel commented 7 years ago

Hey there, sorry I got distracted with other things. I was able to look at the files and didn't see anything that should cause a problem. A few final questions:

1) What version of samtools are you using?

2) Would it be at all possible for you to provide me with a VCF and BAM from a single chromosome (e.g. 22), so that I can try to reproduce the problem?

jfertaj commented 7 years ago

Thanks @secastel, here are the files and the version of samtools I am using is 1.3

1.bam file: Dropbox link to bam 2.vcf file: Dropbox link to vcf

Please let me know if you can't access them and many thanks

jfertaj commented 7 years ago

Were you able to reproduce the error @secastel?

secastel commented 7 years ago

I was able to reproduce the issue, and have identified the problem.

When you ran phASER you specified a minimum MAPQ of 255, which means uniquely aligned reads when STAR is used as the mapper. Inspecting your BAM the maximum MAPQ for aligned reads is 50.

This means that either you did not use STAR to align the reads, or some sort of downstream processing to adjust MAPQs was done.

You need to determine what the appropriate MAPQ threshold to use is based on your alignment strategy. You should set it to whatever signals uniquely mapped reads.

jfertaj commented 7 years ago

Thanks a lot @secastel, I would look to the code to see if I set up the -SAMoutput flag to 50 for downstream analysis