Open panapapa14 opened 2 months ago
https://github.com/singleron-RD/scrna/blob/master/assets/protocols.json
Two deprecated barcode protocols:
- GEXSCOPE-MicroBead: 12bp barcode with no whitelist
- GEXSCOPE-V1: 8bp * 3 barcode
Currently in use:
- GEXSCOPE-V2: 9bp * 3 barcode
Using the default protocol parameter "auto", the program can automatically identify which of the three protocols the R1 read belongs to.
Thank you very much! So as far as I understand, someone that needs the very specific whitelist file that was used in the sequencing protocol by Singleron team has to ask you about this information. Sorry for any inconvenience, but our lab team invested in sequencing quite a few samples and we aspire to make the most out of this process!
The barcode whitelists are here: https://github.com/singleron-RD/scrna/tree/master/assets/whitelist Is there anything else that I can help with?
You can use the following Python script to generate all possible barcode combinations of GEXSCOPE-V2.
# GEXSCOPE-V2: 96*96*192 ~1769k possible barcodes
bc_segments = [open(f"GEXSCOPE-V2/bc{i}.txt", 'r').read().splitlines() for i in (1,2,3)]
bcs = ['_'.join([bc1,bc2,bc3]) for bc1 in bc_segments[0] for bc2 in bc_segments[1] for bc3 in bc_segments[2]]
with open('1769k-GEXSCOPE-V2.txt','wt') as f:
    for bc in bcs:
        f.write(bc + '\n')
Thanks a lot, your guidance is of great importance. I used your Python script to generate a whitelist file that looks like this:
AACGGACCT_TTGGTGACC_TTCGGTCAA
AACGGACCT_TTGGTGACC_GTACGGACT
AACGGACCT_TTGGTGACC_GGCCATGTT
AACGGACCT_TTGGTGACC_TATGACACC
AACGGACCT_TTGGTGACC_GTGTCTGAA
AACGGACCT_TTGGTGACC_TAAGCTTGG
AACGGACCT_TTGGTGACC_AAGATCTGC
AACGGACCT_TTGGTGACC_CTTGTGCCA
AACGGACCT_TTGGTGACC_ACTGGTTCC
It contains around 1770k barcodes. So far so good, but when I run a "bustools correct" command to correct the BUS file that I got from kallisto, the whitelist is rejected:
Error: barcode length and on-list length differ, barcodes = 16, on-list = 29
The barcodes in the GEXSCOPE-V2 whitelist are 29 characters long: 9 * 3 bp plus 2 underscores. 16bp is the length of 10X Genomics barcodes.
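The length arithmetic above can be checked directly. This is a minimal sketch, not part of bustools or the Singleron tooling: the helper name `barcode_length_counts` is made up for illustration, and the two barcodes are copied from the whitelist excerpt posted earlier in this thread.

```python
from collections import Counter

def barcode_length_counts(barcodes):
    """Count how many non-empty barcodes there are of each length."""
    return Counter(len(bc.strip()) for bc in barcodes if bc.strip())

# Two entries copied from the generated whitelist shown in this thread.
underscored = ["AACGGACCT_TTGGTGACC_TTCGGTCAA", "AACGGACCT_TTGGTGACC_GTACGGACT"]

# With underscores: 9 * 3 + 2 = 29 characters per entry.
print(barcode_length_counts(underscored))

# Without underscores: 9 * 3 = 27 characters per entry.
joined = [bc.replace("_", "") for bc in underscored]
print(barcode_length_counts(joined))
```

Running the same count over a whole whitelist file is a quick way to confirm that every entry has one length before handing the file to another tool.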
Hello again and thank you. I am aware of what you describe. The problem is how to describe Singleron's barcode (CB) and UMI (UB) structure on the command line. I used the parameter -x "CB:0-9,UB:19-28" in a kallisto bus command, but it was not recognised. I would appreciate any suggestions for how I should describe the technology in use.
Positions in R1 (0-based, half-open intervals):
bc1 [0, 9)
linker1 [9, 25)
bc2 [25, 34)
linker2 [34, 50)
bc3 [50, 59)
C [59, 60)
umi [60, 72)
So the parameter for kb should be
-x 0,0,9,0,25,34,0,50,59:0,60,72:1,0,0
which means taking three 9-bp barcode segments (9+9+9) from file 0 (aka the R1 file) and a 12-bp UMI from that same file, and then using the entire sequence in file 1 (aka the R2 file) as your biological reads file.
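The -x string can be derived mechanically from the segment lengths in the layout above. The following is only a sketch under that assumption: the names `SEGMENTS`, `segment_offsets`, and `build_x_string` are illustrative and do not come from kb or kallisto.

```python
# Segment names and lengths in R1, in read order (taken from the layout above).
SEGMENTS = [("bc1", 9), ("linker1", 16), ("bc2", 9), ("linker2", 16),
            ("bc3", 9), ("C", 1), ("umi", 12)]

def segment_offsets(segments):
    """Return {name: (start, stop)} as 0-based half-open intervals."""
    offsets, pos = {}, 0
    for name, length in segments:
        offsets[name] = (pos, pos + length)
        pos += length
    return offsets

def build_x_string(offsets, barcode_names=("bc1", "bc2", "bc3"), umi_name="umi"):
    """Assemble 'barcodes:umi:reads' from file,start,stop triplets."""
    # Barcode segments and the UMI all come from file 0 (R1).
    bc = ",".join(f"0,{s},{e}" for s, e in (offsets[n] for n in barcode_names))
    umi = "0,{},{}".format(*offsets[umi_name])
    # File 1 (R2) with 0,0 means the entire sequence, as explained above.
    reads = "1,0,0"
    return f"{bc}:{umi}:{reads}"

offsets = segment_offsets(SEGMENTS)
x_string = build_x_string(offsets)
print(x_string)
```

Deriving the offsets from lengths rather than typing them by hand makes it easier to adapt the string if a different protocol has other segment sizes.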
Also, remove the underscores from the whitelist file when using kb.
# GEXSCOPE-V2: 96*96*192 ~1769k possible barcodes
bc_segments = [open(f"GEXSCOPE-V2/bc{i}.txt", 'r').read().splitlines() for i in (1,2,3)]
# remove underscore
bcs = [''.join([bc1,bc2,bc3]) for bc1 in bc_segments[0] for bc2 in bc_segments[1] for bc3 in bc_segments[2]]
with open('1769k-GEXSCOPE-V2.txt','wt') as f:
    for bc in bcs:
        f.write(bc + '\n')
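To see why the underscore-free whitelist is the one that matches, here is a hedged sketch that slices a barcode and UMI out of an R1 sequence using the intervals listed earlier. The read is synthetic: the three barcode segments are copied from the whitelist excerpt in this thread, while the linkers (`N` placeholders), the `C` base, and the UMI are assumptions made up for the example. `split_r1` is a hypothetical helper, not a kallisto or kb function.

```python
# Half-open intervals for the barcode segments and the UMI in R1.
BARCODE_SPANS = [(0, 9), (25, 34), (50, 59)]
UMI_SPAN = (60, 72)

def split_r1(read):
    """Return (barcode, umi) sliced from an R1 sequence."""
    if len(read) < UMI_SPAN[1]:
        raise ValueError(f"R1 read shorter than {UMI_SPAN[1]} bases")
    # Concatenate the three segments with no separator.
    barcode = "".join(read[s:e] for s, e in BARCODE_SPANS)
    umi = read[UMI_SPAN[0]:UMI_SPAN[1]]
    return barcode, umi

# Synthetic R1: real linker sequences are not given in this thread,
# so 16 'N' characters stand in for each linker.
linker = "N" * 16
r1 = "AACGGACCT" + linker + "TTGGTGACC" + linker + "TTCGGTCAA" + "C" + "ACGTACGTACGT"

barcode, umi = split_r1(r1)
print(barcode, len(barcode))
print(umi, len(umi))
```

The extracted barcode is a plain 27-character string, so a whitelist entry can only match it if it is written without underscores.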
This -x parameter works fine! Thank you very much for your time and effort and have a nice weekend ahead!
Description of the bug
Hello there! I am trying to run a STARsolo analysis so that I can use its output for a scvelo run later on. I am using scRNA data from Singleron's sequencing, but I am having difficulties with the barcode file used by the algorithm. The usual barcode length on other sequencing platforms differs from the 24 nucleotides that Singleron uses, and as a result an error occurs. I tried to edit the barcode file manually, with no success. Any suggestions?
Command used and terminal output
No response
Relevant files
No response
System information
No response