raphael-group / chisel

CHISEL -- Copy-number Haplotype Inference in Single-cell by Evolutionary Links
BSD 3-Clause "New" or "Revised" License

Failed to read from standard input in Counting phased SNPs in matched normal #19

Closed Rongtingting closed 3 years ago

Rongtingting commented 3 years ago

Hi Chisel developers,

Thank you for developing this tool! After I tried the demo successfully, I'm very excited to try it out on a public dataset. The version of CHISEL is 0.0.4 installed via conda (I have tried this version on my own dataset and it worked well).

As described in the detailed tutorial, there are 4 input files: (1) a single-cell barcoded BAM, (2) the reference human genome, (3) a matched-normal BAM (generated via chisel-pseudonormal), and (4) a VCF file with phased germline SNPs.

Yet CHISEL didn't complete its procedure. The first error message displayed in the log of the BAF step is:

[2021-Jan-26 11:30:53]Counting phased SNPs in matched normal
  File "/home/rthuang/anaconda3/envs/chisel/lib/python2.7/site-packages/bin/../src/BAFEstimator.py", line 272, in <module>
    main()
  File "/home/rthuang/anaconda3/envs/chisel/lib/python2.7/site-packages/bin/../src/BAFEstimator.py", line 86, in main
    snps = selecting(args, phased)
  File "/home/rthuang/anaconda3/envs/chisel/lib/python2.7/site-packages/bin/../src/BAFEstimator.py", line 142, in selecting
    refalt = {c : uniq(l) for c, l in pool.imap_unordered(counting_germinal, jobs) if len(l) > 0}
  File "/home/rthuang/anaconda3/envs/chisel/lib/python2.7/site-packages/bin/../src/BAFEstimator.py", line 142, in <dictcomp>
    refalt = {c : uniq(l) for c, l in pool.imap_unordered(counting_germinal, jobs) if len(l) > 0}
  File "/home/rthuang/anaconda3/envs/chisel/lib/python2.7/multiprocessing/pool.py", line 673, in next
ValueError: ERROR: Failed to read from standard input: unknown file type

I have checked that the VCF format is right, so does this mean that pseudonormal.bam has an unknown format? But it was generated by chisel-pseudonormal. Also, in the directory there are tmp files (hetSNPs-chr1.tmp to hetSNPs-chr22.tmp).

Here I attach the log files or other related info: chisel_mkn45-fulldepth_allchr_hg19_20210126.log possorted_bam_head10.txt pseudonormal_head10.txt phased_vcf_head10.txt

hetSNPs-chr1_head10.tmp.txt

Could you give me some instructions on how to figure it out? Thanks a lot for your time!!!

Rongting

simozacca commented 3 years ago

Thank you for the interest in CHISEL! I would be glad to help you with this issue.

The error is actually raised by one of the two BCFtools commands that are used at this stage, so could you please perform the following checks:

Also, a few further very important comments and notes:

Rongtingting commented 3 years ago

Thank you for your help!

I noticed that the error is actually caused by inconsistent chromosome notation between the BAM files (1-22) and the VCF file (chr1-chr22), which led to no results when running bcftools mpileup.

I changed the VCF chromosome notation to 1-22 and now the BAF information is available. But it seems that combining the RDR and BAF needs a lot of computing resources? I am stuck at this step now. So I tried another small dataset with the original BAM files (~22G) and the whole pipeline ran smoothly.
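For anyone hitting the same mismatch, the renaming step can be sketched roughly as below. This is only an illustration that assumes an uncompressed, tab-separated VCF read as text lines; the function name `strip_chr_prefix` is hypothetical and is not part of CHISEL or bcftools, and special cases such as chrM versus MT are not handled.

```python
def strip_chr_prefix(vcf_lines):
    """Yield VCF lines with a leading 'chr' removed from contig names.

    Hypothetical helper for illustration only (not part of CHISEL).
    """
    for line in vcf_lines:
        if line.startswith("##contig=<ID=chr"):
            # Header contig declaration, e.g. ##contig=<ID=chr1,length=...>
            yield line.replace("##contig=<ID=chr", "##contig=<ID=", 1)
        elif line.startswith("#"):
            # Any other header line is passed through unchanged
            yield line
        elif line.startswith("chr"):
            # Data record: the first column is the chromosome name
            yield line[3:]
        else:
            # Record already uses the plain notation
            yield line


example = [
    "##fileformat=VCFv4.2",
    "##contig=<ID=chr1,length=249250621>",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tSAMPLE",
    "chr1\t100\t.\tA\tG\t.\tPASS\t.\tGT\t0|1",
    "2\t200\t.\tC\tT\t.\tPASS\t.\tGT\t1|0",
]
converted = list(strip_chr_prefix(example))
```

The same idea applies in the other direction (adding chr) if the BAM files and reference use the chr notation instead.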

But it is strange that when I use a different strategy to generate the VCF files for the same dataset, there is a Key Error: chisel0.0.4_mkn45-50k_fby_allchr_hg19_20210129.log

I also tried CHISEL 0.0.5 using the chromosome notation (1-22) in both the BAM files and the VCF files. It seems that this notation cannot be recognized, right? chisel0.0.5.-x-chisel_mkn45-50k_fby_allchr_hg19_20210129.log

And your comments are helpful for me! Thanks a lot!! I just tried the MKN45 dataset and ignored the nature of the data. However, as long as there is a small amount of normal cells in the dataset, CHISEL can be helpful, right? Or should the dataset contain more than a certain percentage of normal cells before chisel-pseudonormal can be used?

And CHISEL can be applied to just one or several chromosomes; if nothing is diploid in the chromosome regions it is applied to, how does it work?

simozacca commented 3 years ago

CHISEL requires some minimum amount of system resources, which are detailed here, and we highly recommend running it on multi-CPU machines.

CHISEL works with chromosome names with or without the chr notation, but it requires (as do samtools and bcftools) that the reference genome, BAM files, and phased VCF all have the exact same notation. Could your errors be due to some discrepancy among these? Your log indicates that no phased SNP has been provided, so either (1) the SNPs have the wrong chromosome notation, or (2) there are no SNPs with either 0|1 or 1|0 in their records. If you believe that neither of these is the reason, could you please provide a small sample of your data where the error is occurring?
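As a rough way to run both checks by hand, something like the following sketch could be used. It assumes an uncompressed VCF read as text lines with the genotype in the first sample column, and that the BAM chromosome names have already been collected separately (for example from the BAM header). The function names are hypothetical and this is not how CHISEL itself reads the VCF.

```python
def count_phased_hets(vcf_lines):
    """Count records per chromosome whose first-sample genotype is 0|1 or 1|0.

    Hypothetical helper for illustration only (not part of CHISEL).
    """
    counts = {}
    for line in vcf_lines:
        if line.startswith("#") or not line.strip():
            continue  # skip header and empty lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 10:
            continue  # no FORMAT/sample columns, so no genotype to inspect
        format_keys = fields[8].split(":")
        if "GT" not in format_keys:
            continue
        genotype = fields[9].split(":")[format_keys.index("GT")]
        if genotype in ("0|1", "1|0"):
            counts[fields[0]] = counts.get(fields[0], 0) + 1
    return counts


def shared_chromosomes(bam_chroms, vcf_chroms):
    """Return the sorted chromosome names that appear in both collections."""
    return sorted(set(bam_chroms) & set(vcf_chroms))


records = [
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tSAMPLE",
    "chr1\t100\t.\tA\tG\t.\tPASS\t.\tGT\t0|1",
    "chr1\t200\t.\tC\tT\t.\tPASS\t.\tGT:DP\t1|0:12",
    "chr1\t300\t.\tG\tA\t.\tPASS\t.\tGT\t0/1",
    "chr2\t400\t.\tT\tC\t.\tPASS\t.\tGT\t1|1",
]
phased = count_phased_hets(records)
overlap_plain = shared_chromosomes(["1", "2"], phased)
overlap_chr = shared_chromosomes(["chr1", "chr2"], phased)
```

An empty result from `count_phased_hets` would point to case (2), while an empty overlap with the BAM chromosome names would point to case (1).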

CHISEL only requires a relatively small number of normal diploid cells, but the accurate use of such a pseudo matched-normal sample for SNP calling requires a moderate sequencing coverage (>7x). In any case, a cell line is generally expected not to include any normal diploid cells, so the pseudonormal approach cannot work in your context.

Unfortunately, I do not understand your last question: CHISEL can be applied to any subset of chromosomes, and the genome of normal diploid cells is assumed to be sequenced uniformly.

simozacca commented 3 years ago

We assume that the issue was fixed. Please feel free to re-open it in case of related problems.