Closed markutz556 closed 4 years ago
Thank you for your interest in CHISEL! I would be very happy to help with your issue.
First, could you please provide some details about the data that you are using (e.g. barcoded BAM file, how many cells, sequencing coverage per cell, etc.) and which version of CHISEL you applied?
Second, could you please provide the full log message of CHISEL, i.e. all the log messages that have been printed out by CHISEL during the execution?
Last, could you please provide the first 30 lines of the file bb/combo.tsv
(head -n 30 bb/combo.tsv > log
) that has been produced by CHISEL in the running directory?
Thank you for your cooperation!
Thanks for your reply. We have around 800 cells in the barcoded bam file but they are not sequenced in 10x platform and I did the processing step you mentioned in the other issue. The version of CHISEL is 0.0.4 installed via conda.
Here is the log file for the run: log.txt.
The first 30 lines of the file bb/combo.tsv: log_combo.txt
We had a problem when submitting the Y chromosome mpileup result to the Michigan Imputation Server and thus we left out the Y chromosome. Does this matter?
Thanks for your help!
Thanks for this information!
Unfortunately there seems to be some issues with the cell names/barcodes:
[2020-Sep-16 15:21:50]Number of selected cells: 1
T
(fourth column in combo.tsv
), which is an issue.These observations may be due to some errors in the barcoded BAM file, however I would be very happy to look into this too and to help fixing any potential issue in the generation of such barcoded BAM file.
To start to do this, could you please provide some details on how you generated the barcoded BAM file and could you please extract the first 3 sequencing reads (SAM records) from the barcoded BAM file ${BAM}
with the following command:
samtools view ${BAM} | head -n 3
Thanks for the information.
I used this shell script to append the barcodes to the bams:
cust_func(){
samtools addreplacerg -r 'ID:CB:Z:'"${i}"'\tSM:CELL_BARCODE' ${i}.bam -o ${i}_barcoded.bam -O BAM -@ 8
}
for i in $(cat barcodes.txt); do
echo $i
cust_func ${i} &
done
where barcodes.txt contains a list of barcodes and each of barcodes has a corresponding bam file with the name ${barcode}.bam.
Then I merge all the barcoded.bam fils using samtools merge
The head 3 reads for the barcoded bam file: barcoded_bam_head3.txt
Thanks for your help.
Thanks for the provided information!
The problem is that CHISEL requires each cell barcode to be a string only composed of the letters A,C,G,T
preceded by CB:Z:
(see details in required data). Thus, the cell barcode RG:Z:CB:Z:TFRIPAIR4_FL-A98167-R09-C26
in your example is not a valid barcode, while RG:Z:CB:Z:TATTGGCA
would be.
You can generate an arbitrary list of 800 cell barcodes of 8 letters with the following script:
for BARCODE in {A,C,G,T}{A,C,G,T}{A,C,G,T}{A,C,G,T}{A,C,G,T}{A,C,G,T}{A,C,G,T}{A,C,G,T}; do echo "CB:Z:"${BARCODE}; done | shuf -n 800
and you can use these barcodes for your cells. Using these barcodes instead of the current names in your barcodes.txt
everything should work correctly. Also, if barcodes.txt
contains file names you can append the barcodes as a second column of the file and in the command you can simply use the first field of each line as the file name and the second field as the barcode to add as an RG flag.
Just one additional note, I think that you may need ${1}
instead of ${i}
in your function cust_func()
to access the first argument.
Please let me know if this procedure would work for you and if you have any further question.
Thanks for your help. I changed the barcode names as you suggested and CHISEL can identify the correct number of cells now. But I still got into the an error on the extracting chromosome step: ValueError: invalid literal for int() with base 10: ''
.
I've been trying to find the problem but with no success. I wonder if you can help with this error. I included the error log for this: log.txt
Just for reference, these are the first 5 lines for our phased snp file:
1 54676 1|0
1 91536 1|0
1 546697 1|0
1 705882 1|0
1 757394 1|0
Thank you!
The error seems due to the presence of a non autosome (chr1
, ..., chr22
) in the list of phased SNPs. Could you please try to create a new list of phased SNPs by grepping only the lines that start with an automosome (i.e. chr1
, ..., chr22
) and run CHISEL on this list?
I am sorry for this inconvenience, in the next release we are adding the filtering of auotosomes as an automatic feature.
We assume that the issue has been solved; please feel free to re-open the issue if this is not the case or there is any other related issue.
Hi Team,
Thanks for the CHISEL tools! We've been trying to apply this tool to our data and we got this error in the Selecting ploidies step: AssertionError: There is a bin with a BAF shift > 0.5, likely BAF was not mirrored between 0 and 0.5. And then the rest steps all got into errors. I wonder if you have any ideas what might cause this and how we can fix this. Any ideas will be appreciated. Thanks!
Best,