Closed Rongtingting closed 3 years ago
Thank you for the interest in CHISEL! I would be glad to help you with this issue.
The error is actually raised by one of the two BCFtools commands that are used at this stage, so could you please perform the following checks:
1. Does the phased VCF contain SNPs phased as 0|1 or 1|0 for every chromosome from 1 to 22?
2. Are the files hetSNPs-chrXXX.tmp present in the running directory /groups/cgsd/rthuang/Results/mkn-45-fulldepth/chisel for every chromosome XXX from 1 to 22?
3. Do the following two commands run without errors, and what counts do they report?
bcftools mpileup /groups/cgsd/rthuang/Results/mkn-45-fulldepth/chisel/data/pseudonormal.bam -T /groups/cgsd/rthuang/Results/mkn-45-fulldepth/chisel/hetSNPs-chr1.tmp -f /groups/cgsd/rthuang/Results/mkn-45-fulldepth/chisel/data/genome.fa --skip-indels -a INFO/AD -Ou | grep -v '#' | wc -l
bcftools mpileup /groups/cgsd/rthuang/Results/mkn-45-fulldepth/chisel/data/pseudonormal.bam -T /groups/cgsd/rthuang/Results/mkn-45-fulldepth/chisel/hetSNPs-chr1.tmp -f /groups/cgsd/rthuang/Results/mkn-45-fulldepth/chisel/data/genome.fa --skip-indels -a INFO/AD -Ou | bcftools query -f '%CHROM\t%POS\t%REF\t%ALT{0}\t%AD{0}\t%AD{1}\n' -i 'SUM(AD)<=10000' | grep -v '#' | wc -l
4. Replace hetSNPs-chr1.tmp with that of the other chromosomes and check that they also work.
5. Are you running chisel from within the folder /groups/cgsd/rthuang/Results/mkn-45-fulldepth/chisel?
Also, a few further very important comments and notes:
The chisel_pseudonormal algorithm extracts diploid cells from the barcoded BAM file to form a pseudo matched-normal sample; however, a cancer cell line with SCNAs is not supposed to contain normal diploid cells, thus chisel_pseudonormal cannot be used.
N=0). The development of a version of chisel working without a matched-normal sample (and thus without the need of extracting diploid cells) is in progress but it is not yet available.
Thank you for your help!
/groups/cgsd/rthuang/Results/mkn-45-fulldepth/chisel
I noticed that the error is actually raised by the inconsistent chromosome notation in the BAM files (1-22) and the VCF file (chr1-chr22), which caused bcftools mpileup to return no results.
I changed the VCF chromosome notation to 1-22 and the BAF information is now available. But it seems that combining the RDR and BAF needs a lot of compute resources? I am stuck at this step now. So I tried another, smaller dataset with the original BAM files (~22G) and the whole pipeline ran smoothly.
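The notation change described above (chr1-chr22 to 1-22 in the VCF) can be sketched in a few lines of Python. This is only an illustration and not part of CHISEL: the function name `strip_chr_prefix` is hypothetical, and it assumes an uncompressed text VCF whose chromosome names differ from those in the BAM only by a leading `chr`.

```python
def strip_chr_prefix(vcf_lines):
    """Yield VCF lines with a leading 'chr' removed from chromosome names.

    Assumption: plain-text VCF lines, each ending with a newline.
    Caveat: 'chrM' becomes 'M', which may still differ from a reference
    that uses 'MT'.
    """
    contig_tag = "##contig=<ID=chr"
    for line in vcf_lines:
        if line.startswith(contig_tag):
            # Contig declarations in the header carry the name as well.
            yield line.replace(contig_tag, "##contig=<ID=", 1)
        elif line.startswith("#"):
            # Every other header line is passed through unchanged.
            yield line
        elif line.startswith("chr"):
            # Data lines begin with the CHROM column.
            yield line[3:]
        else:
            yield line
```

One possible use is to read the phased VCF with `open(...)`, pass the file object to `strip_chr_prefix`, and write the result to a new file with `writelines`, leaving the original untouched.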
But it is strange that when I use a different strategy to generate VCF files for the same dataset, there is a Key Error: chisel0.0.4_mkn45-50k_fby_allchr_hg19_20210129.log
I also tried chisel 0.0.5 using the chromosome notation (1-22) in both the BAM files and the VCF files; it seems that this notation cannot be recognized, right? chisel0.0.5.-x-chisel_mkn45-50k_fby_allchr_hg19_20210129.log
And your comments are helpful for me! Thanks a lot!! I just tried the MKN45 dataset and ignored the nature of the data.
However, as long as there is a small amount of normal cells in the dataset, chisel can be helpful, right? Or should the dataset contain more than a certain percentage of normal cells before chisel-pseudonormal can be used?
And chisel can be applied to just one or several chromosomes; if there is no diploid in the applied chromosome regions, then how does it work?
CHISEL requires some minimum amount of system resources, which are detailed here, and we highly recommend its execution on multi-CPU machines.
CHISEL works with chromosome names with or without chr notation, but it requires (as samtools and bcftools do) that the reference genome, the BAM files, and the phased VCF all have the exact same notation. Might your errors be due to some discrepancy among these? Your log indicates that no phased SNP has been provided, so either: (1) the SNPs have the wrong chromosome notation, or (2) there are no SNPs with either 0|1 or 1|0 in their records. If you believe that neither of these is the reason, could you please provide a small sample of your data where the error is occurring?
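The two possible causes listed above can be checked with a small script before running CHISEL. The following is a minimal sketch and not CHISEL code: the names `summarize_vcf`, `fai_chromosomes`, and `diagnose` are hypothetical, and it assumes an uncompressed, single-sample VCF and a reference index in the `.fai` format written by `samtools faidx`.

```python
from collections import Counter

# Genotypes that count as phased heterozygous SNPs.
PHASED_HET = {"0|1", "1|0"}


def summarize_vcf(vcf_lines):
    """Return (chromosome names seen, phased-het count per chromosome)."""
    seen, counts = set(), Counter()
    for line in vcf_lines:
        if line.startswith("#") or not line.strip():
            continue  # skip header and blank lines
        fields = line.rstrip("\n").split("\t")
        seen.add(fields[0])
        if len(fields) < 10:
            continue  # no FORMAT/sample columns, so no genotype
        keys = fields[8].split(":")
        if "GT" not in keys:
            continue
        # Assumption: the first sample column holds the genotype of interest.
        genotype = fields[9].split(":")[keys.index("GT")]
        if genotype in PHASED_HET:
            counts[fields[0]] += 1
    return seen, counts


def fai_chromosomes(fai_lines):
    """Chromosome names from a .fai index (first tab-separated column)."""
    return {line.split("\t")[0] for line in fai_lines if line.strip()}


def diagnose(vcf_lines, reference_names):
    """Report VCF chromosomes that are unknown to the reference or lack phased hets."""
    seen, counts = summarize_vcf(vcf_lines)
    return {
        "not_in_reference": sorted(seen - set(reference_names)),
        "no_phased_hets": sorted(c for c in seen if counts[c] == 0),
    }
```

A non-empty `not_in_reference` list points to cause (1), and chromosomes in `no_phased_hets` point to cause (2). The same set comparison could be repeated against the sequence names in the BAM header, for example those printed by `samtools view -H`.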
CHISEL only requires a relatively small number of normal diploid cells, but the accurate use of such a pseudo matched-normal sample for SNP calling requires a moderate sequencing coverage (>7x). In any case, a cell line is generally expected not to include any normal diploid cells, so the pseudonormal approach cannot work in your context.
Unfortunately, I do not understand your last question: CHISEL can be applied to any subset of chromosomes, and the genome of normal diploid cells is assumed to be sequenced uniformly.
We assume that the issue was fixed; please feel free to re-open it in case of related problems.
Hi Chisel developers,
Thank you for developing this tool! After I tried the demo successfully, I'm very excited to try it out on a public dataset. The version of CHISEL is 0.0.4 installed via conda (I have tried this version on my own dataset and it worked well).
As described in the detailed tutorial, there are 4 input files:
1. A single-cell barcoded BAM
2. A reference human genome
3. A matched-normal BAM (GENERATED via chisel-pseudonormal)
4. A VCF file with phased germline SNPs
Yet, chisel didn't complete its procedure. The first error message is displayed in the log of the BAF step.
I have checked that the VCF format is right, so does it mean that pseudonormal.bam is in an unknown format? But it was generated by chisel-pseudonormal. And in the directory, there are tmp files (hetSNPs-chr1.tmp~hetSNPs-chr22.tmp).
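Whether all of these per-chromosome tmp files are really present can be checked programmatically. The sketch below is an illustration only: the function name `missing_or_empty_snp_files` is hypothetical, the file-name pattern is the one quoted in this thread, and treating an empty file as a problem is an assumption rather than documented CHISEL behaviour.

```python
from pathlib import Path


def missing_or_empty_snp_files(run_dir, chromosomes=range(1, 23)):
    """Return names of hetSNPs-chr<N>.tmp files that are absent or empty.

    Assumption: the files follow the hetSNPs-chr<N>.tmp pattern and sit
    directly in run_dir, the directory chisel is run from.
    """
    problems = []
    for chrom in chromosomes:
        path = Path(run_dir) / f"hetSNPs-chr{chrom}.tmp"
        # Flag files that do not exist and files with zero bytes.
        if not path.is_file() or path.stat().st_size == 0:
            problems.append(path.name)
    return problems
```

Calling `missing_or_empty_snp_files(".")` from the running directory returns an empty list when all 22 files exist and contain data.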
Here I attach the log files and other related info: chisel_mkn45-fulldepth_allchr_hg19_20210126.log possorted_bam_head10.txt pseudonormal_head10.txt phased_vcf_head10.txt hetSNPs-chr1_head10.tmp.txt
Could you give me some instructions on how to figure it out? Thanks a lot for your time!!!
Rongting