Closed dwkarjosukarso closed 4 years ago
Hi Dyah,
Sorry for the bug, please try the most updated v1.0.7.
Thanks for pointing it out! Jon
Hi Jon, thanks for your quick reply!
I have tried v1.0.7 and now have different error:
Traceback (most recent call last):
File "/home/karjosukarso/scSplit-1.0.7/scSplit", line 695, in
Do you think this is also a bug or something wrong from my input files?
Thanks, Dyah
Hi Dyah,
This is not a bug.
For a situation of empty matrixes, please check: 1) is your barcode tag in the BAM files "CB" - if not, you need to specify it using -t/--tag; 2) are you working on a mixed sample VCF rather than a simple merge of individual genotypes? 3) is the correct whitelist barcode file being used?
Cheers,
Jon
Hi Jon, thanks for the help.
I think it is also good to mention that I am working with single cell data generated from Celseq2 (plate-based format). This is what was performed before scSplit
I checked the BAM file I used for scSplit count and it does not really have any barcode tag, I think because of the demultiplexing already performed by Celseq2 pipeline.
NS500173:399:HJ2MCBGX5:1:11106:7631:4324:UMI:GTGGAAAC: 0 chr11 65211863 23 50M * 0 0 ATGTCTTAGTGATTTTAAAGGGGACTCTTCAGGGACTTGTGTACTGGTTA AAAAAEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEE AS:i:-15 XN:i:0 XM:i:3 XO:i:0 XG:i:0 NM:i:3 MD:Z:0T1A1A45 YT:Z:UU NS500173:399:HJ2MCBGX5:1:11106:4610:8169:UMI:TCCTCAGG: 0 chr9 6428994 24 50M * 0 0 GACTCTGCTAAAAATATAAAAGGTTAGACACTATGGTGGTGGATACCTGT AAAAAEEEA/EE/EEEE/EEEEEEEEEEE/EEEEEEAEEEEEEEE/EEEE AS:i:-10 XN:i:0 XM:i:2 XO:i:0 XG:i:0 NM:i:2 MD:Z:0A0T48 YT:Z:UU
I am aware that scSplit is created generally for droplet-based sequencing, but any chance it can be used for a plate based data? Maybe with a different processing before scSplit?
Thanks a lot!
Cheers, Dyah
Hi Dyah,
If you have one SAM per cell, when you merge them into one BAM, you need to add the cell barcode in your script (for example with XC tag, which you need to specify later in scSplit count).
I’ve tested that and it should work.
Cheers,
Jon
Hi Jon, sorry for asking again. I managed to have CB tag in the BAM file, but it is still giving the same error.
My current BAM:
NS500173:399:HJ2MCBGX5:1:11312:3277:11864 0 chr1 629268 12 63M * 0 0 ACTTCTGGCCTCCCTGTTCTTATGAATTCGAACAGCATACCCCCGATTCCGCTACGACCAACT AAAAAEE/EEAEAEEEEEEEEEAEE/EEEEEEEEEAEEAA/A<EEE<EAEAAAAEEEAA//E< CB:Z:GCAAGAAG UB:Z:TTGCAACC MD:Z:7A55 XG:i:0 NM:i:1 XM:i:1 XN:i:0 XO:i:0 AS:i:-3 XS:i:-8 YT:Z:UU NS500173:399:HJ2MCBGX5:4:11508:19261:10847 0 chr1 629268 30 63M * 0 0 ACTTCTGACCTCCCTGTTCTTATGAATTCGAACAGCATACCCCCGATTCCGCTACGACCAACT AAAAAEEEEAEAE<EE<EA6EEAE/E/EEEAEEEEEEAEE/A/EEAAEEEEEEEEEAAEEAEE CB:Z:TCGGTTGA UB:Z:GACGCAAT MD:Z:63 XG:i:0 NM:i:0 XM:i:0 XN:i:0 XO:i:0 AS:i:0 XS:i:-5 YT:Z:UU NS500173:399:HJ2MCBGX5:4:11607:24910:14366 0 chr1 629268 30 63M * 0 0 ACTTCTGACCTCCCTGTTCTTATGAATTCGAACAGCATACCCCCGATTCCGCTACGACCAACT AAAAAEEAEEEEEEEEEEEEEEEEEEEEEEAEE<EEAEEE<</AE/AEAE/EEEEEEA//<E/ CB:Z:GAGGGTAG UB:Z:GAGAGAGA MD:Z:63 XG:i:0 NM:i:0 XM:i:0 XN:i:0 XO:i:0 AS:i:0 XS:i:-5 YT:Z:UU
The vcf file:
cell barcode whitelist CGTCTAAT AGACTCGT GCACGTCA TCAACGAC ATTTAGCG ATACAGAC TGCGTAGG TGGAGCTC TGAATACC TCTCACAC TACTGGTA
command: /home/karjosukarso/scSplit-1.0.7/scSplit count -v CTRL1_merged.vcf -i INV_JPGIII195-CTRL1_tag_sorted_dedup.bam -b whitelist.txt -c /home/karjosukarso/genome_ref/common_snvs_hg38 -r ref_filetered.csv -a alt_filtered.csv -o results
Error:
Traceback (most recent call last):
File "/home/karjosukarso/scSplit-1.0.7/scSplit", line 695, in
Anything I could change to make it work? Thank you again.
Cheers, Dyah
Could you please send me your vcf file?
I zipped it for sending purposes. CTRL1_merged.zip
Hi Dyah,
I tested on the data you sent me, it ran well. Although I manipulated some data to match reads and vcf, the file format (CB tag and vcf format) should be fine.
So here's some possible points you might want to further check: 1) are the barcode white list correctly covering the BAM file? Can you find some reads for any barcode in your list? 2) it's very unusual but do you have most GL(RA)<-0.004, which is GP(RA)<0.99 in your vcf rows? It should be the second item in the GL field, because if such it means your SNVs called from the mixed samples are not heterozygous, and thus we cannot use any to distinguish between different samples;
Please let me know. Thanks, Jon
Hi Jon, thanks for testing it.
Thanks again, Dyah
ok, so basically "scSplit count" does below steps:
We have checked 3, 6 while 1, 4, 7, 8 was tested by nature. Although I still don't know why but you can further check on 2 and 5:
for 2: whether the common SNVs have intersection with the SNVs in your vcf for 5: how many of your read having read flag < 256?
If you still cannot find out the reason after that, I would suggest you send me all the input files for check, and if the BAM is too big for transfer, you can send me a proportion of it. And for data security, you can send me the link to download at jun.xu@uq.edu.au directly.
Thanks, Jon
Hi Dyah,
as we discussed in mail, the reason of your getting empty matrices is due to the inconsistent format of common SNV list and that in your data.
I have added a reminder into the software instructions - thanks for your help to locating the issue.
This issue is now closed. Jon
Hi, I am trying out the tool to use in my research. I am using the following command: python /home/karjosukarso/scSplit/scSplit scSplit count -v CTRL1_merged_filtered.recode.vcf -i CTRL1_merged_deduplicated.bam -b whitelist.txt -c /home/karjosukarso/genome_ref/common_snvs_hg38 -r ref_filetered.csv -a alt_filtered.csv -o results
The following error appear: File "/home/karjosukarso/scSplit/scSplit", line 459 base_calls_mtx[1].to_csv(os.path.join(args.out, args.alt) ^ SyntaxError: invalid syntax
I checked the version of python in the conda env I created for scSplit (python --version) and it is python 3.6.
Do you have any idea what I could do to resolve this error? Thank you, Dyah