STAR solo analysis of scRNA

panapapa14 commented 2 months ago

Description of the bug

Hello there! I am trying to run a STARsolo analysis so that I can use its output for a scVelo run later on. I am using scRNA-seq data from Singleron's sequencing platform, but I am having difficulties with the barcode file used by the algorithm. Barcodes from other sequencing platforms usually have a different length from the 24 nucleotides that Singleron uses, and as a result an error occurs. I tried to edit the barcode file manually, with no success. Any suggestions?

Command used and terminal output

No response

Relevant files

No response

System information

No response

zhouyiqi91 commented 2 months ago

https://github.com/singleron-RD/scrna/blob/master/assets/protocols.json

Two deprecated barcode protocols:
GEXSCOPE-MicroBead: 12bp barcode with no whitelist
GEXSCOPE-V1: 8bp * 3 barcode

Currently in use:
GEXSCOPE-V2: 9bp * 3 barcode

Using the default protocol parameter auto, the program can automatically identify which of the three protocols the R1 read belongs to.

panapapa14 commented 2 months ago

Thank you very much! So, as far as I understand, anyone who needs the specific whitelist file used in Singleron's sequencing protocol has to ask you for it. Sorry for any inconvenience, but our lab has invested in sequencing quite a few samples and we want to make the most of them!

zhouyiqi91 commented 2 months ago

The barcode whitelists are here: https://github.com/singleron-RD/scrna/tree/master/assets/whitelist
Is there anything else I can help with?

zhouyiqi91 commented 2 months ago

You can use the following Python script to generate all possible barcode combinations of GEXSCOPE-V2.

# GEXSCOPE-V2: 96*96*192 ~1769k possible barcodes
bc_segments = [open(f"GEXSCOPE-V2/bc{i}.txt", 'r').read().splitlines() for i in (1,2,3)]
bcs = ['_'.join([bc1,bc2,bc3]) for bc1 in bc_segments[0] for bc2 in bc_segments[1] for bc3 in bc_segments[2]]

with open('1769k-GEXSCOPE-V2.txt','wt') as f:
    for bc in bcs:
        f.write(bc + '\n')

panapapa14 commented 2 months ago

Thanks a lot, your guidance is a great help. I used your Python script to generate a whitelist file that looks like this:

AACGGACCT_TTGGTGACC_TTCGGTCAA
AACGGACCT_TTGGTGACC_GTACGGACT
AACGGACCT_TTGGTGACC_GGCCATGTT
AACGGACCT_TTGGTGACC_TATGACACC
AACGGACCT_TTGGTGACC_GTGTCTGAA
AACGGACCT_TTGGTGACC_TAAGCTTGG
AACGGACCT_TTGGTGACC_AAGATCTGC
AACGGACCT_TTGGTGACC_CTTGTGCCA
AACGGACCT_TTGGTGACC_ACTGGTTCC

It contains around 1770k barcodes. So far so good, but when I run a "bustools correct" command to correct the BUS file that I got from kallisto, the whitelist is rejected as invalid:

Error: barcode length and on-list length differ, barcodes = 16, on-list = 29

zhouyiqi91 commented 2 months ago

The barcodes in the GEXSCOPE-V2 whitelist are 29 characters long: 9 * 3 bases plus 2 underscores.
16bp is the length of 10x Genomics barcodes.
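
A quick way to confirm the length of the entries in a whitelist before passing it to a tool is to count the line lengths. This is only a minimal sketch; the function name `barcode_lengths` is hypothetical and not part of any tool mentioned here.

```python
from collections import Counter

def barcode_lengths(lines):
    """Count how many barcodes there are of each length, ignoring blank lines."""
    return Counter(len(line.strip()) for line in lines if line.strip())

# Two barcodes in the underscore-joined format shown earlier in this thread.
sample = [
    "AACGGACCT_TTGGTGACC_TTCGGTCAA",
    "AACGGACCT_TTGGTGACC_GTACGGACT",
]
print(barcode_lengths(sample))  # both entries are 29 characters long
```

For a real whitelist, the same function can be given an open file object instead of a list.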

panapapa14 commented 2 months ago

Hello again and thank you. I am aware of what you describe. My problem is how to describe the barcode (CB) and UMI (UB) structure of Singleron's technology on the command line. I used the parameter -x "CB:0-9,UB:19-28" in a kallisto bus command, but it was not recognised. I would appreciate any suggestions on how to describe the technology in use.

zhouyiqi91 commented 2 months ago

GEXSCOPE-V2 pattern

position 0-based:
bc1 [0, 9)
linker1 [9, 25)
bc2 [25, 34)
linker2 [34, 50)
bc3 [50, 59)
C [59, 60)
umi [60, 72)

So the parameter for kb should be -x 0,0,9,0,25,34,0,50,59:0,60,72:1,0,0

which means taking a 9+9+9 bp barcode from file 0 (the R1 file) and a 12-bp UMI from the same file, and then using the entire sequence in file 1 (the R2 file) as the biological read.
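
To make the layout concrete, here is a rough sketch of how the barcode and UMI would be cut from a single R1 sequence using the positions listed above. The function name `split_r1` and the synthetic read are assumptions made for illustration; this is not how kb or kallisto implement it internally.

```python
# Segment positions taken from the 0-based, half-open layout listed above.
BARCODE_SEGMENTS = [(0, 9), (25, 34), (50, 59)]
UMI_SEGMENT = (60, 72)

def split_r1(r1_seq):
    """Return (barcode, umi) cut from a GEXSCOPE-V2 R1 sequence."""
    if len(r1_seq) < UMI_SEGMENT[1]:
        raise ValueError("R1 read is shorter than 72 bases")
    # Join the three 9-bp segments without separators, skipping the linkers.
    barcode = "".join(r1_seq[start:end] for start, end in BARCODE_SEGMENTS)
    umi = r1_seq[UMI_SEGMENT[0]:UMI_SEGMENT[1]]
    return barcode, umi

# Synthetic R1 read built only for illustration (not real data).
r1 = "A" * 9 + "N" * 16 + "C" * 9 + "N" * 16 + "G" * 9 + "C" + "T" * 12
barcode, umi = split_r1(r1)
print(barcode)  # 27 bases: bc1 + bc2 + bc3
print(umi)      # 12 bases
```

The 27-base barcode produced this way has no underscores, so the whitelist it is compared against should not have them either.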

Also, remove the underscores from the whitelist file when using kb.

# GEXSCOPE-V2: 96*96*192 ~1769k possible barcodes
bc_segments = [open(f"GEXSCOPE-V2/bc{i}.txt", 'r').read().splitlines() for i in (1,2,3)]
# remove underscore
bcs = [''.join([bc1,bc2,bc3]) for bc1 in bc_segments[0] for bc2 in bc_segments[1] for bc3 in bc_segments[2]]

with open('1769k-GEXSCOPE-V2.txt','wt') as f:
    for bc in bcs:
        f.write(bc + '\n')
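
The two whitelist scripts in this thread differ only in the separator. If memory is a concern, a variant that writes each combination as it is generated, instead of building the full list first, might look like the sketch below. The function name `write_whitelist` and the toy segment lists are hypothetical; the real segments come from the bc1.txt, bc2.txt and bc3.txt files.

```python
import io
from itertools import product

def write_whitelist(segments, out, sep=""):
    """Write every combination of the barcode segments, one per line.

    Returns the number of barcodes written.
    """
    count = 0
    for combo in product(*segments):
        out.write(sep.join(combo) + "\n")
        count += 1
    return count

# Toy segment lists for illustration only (not real Singleron barcodes).
toy_segments = [["AAA", "CCC"], ["GGG"], ["TTT", "ACG"]]
buffer = io.StringIO()
n_written = write_whitelist(toy_segments, buffer, sep="")
print(n_written)  # 2 * 1 * 2 combinations
```

Passing sep="_" gives the underscore-joined format, and sep="" gives the format needed for kb.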

panapapa14 commented 2 months ago

This -x parameter works fine! Thank you very much for your time and effort and have a nice weekend ahead!
