bettycatherine closed this 4 years ago
Hi Betty,
I don't know the exact details of how starcode was used in the split-seq paper. However, based on your description, the answer is yes. Starcode-umi expects each input sequence to contain the UMI first, followed by the sequence to be clustered.
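To picture that layout, here is a small illustrative sketch that splits a combined sequence at a given UMI length and compares the UMI part and the sequence part against separate thresholds. It uses Levenshtein (edit) distance, which is the distance starcode works with. The names levenshtein, split_combined and within_thresholds and their parameters are hypothetical; they only mirror the meaning of --umi-len, --umi-d and --seq-d. This is not how starcode-umi actually clusters, and the automatic length-dependent default for the sequence distance is not reproduced here, so seq_d must be given explicitly.

```python
from typing import Tuple


def levenshtein(a: str, b: str) -> int:
    """Edit distance (substitutions, insertions, deletions) via two-row dynamic programming."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,               # deletion
                            curr[j - 1] + 1,           # insertion
                            prev[j - 1] + (ca != cb)))  # substitution or match
        prev = curr
    return prev[-1]


def split_combined(seq: str, umi_len: int) -> Tuple[str, str]:
    """Hypothetical helper: the first umi_len bases are the UMI, the rest is the sequence."""
    return seq[:umi_len], seq[umi_len:]


def within_thresholds(a: str, b: str, umi_len: int, seq_d: int, umi_d: int = 0) -> bool:
    """Hypothetical check: True if both parts are within their own distance thresholds.

    umi_d defaults to 0 (exact UMI match), mirroring the default described in this thread.
    """
    umi_a, seq_a = split_combined(a, umi_len)
    umi_b, seq_b = split_combined(b, umi_len)
    return levenshtein(umi_a, umi_b) <= umi_d and levenshtein(seq_a, seq_b) <= seq_d


if __name__ == "__main__":
    a = "AAAAAAAAAA" + "ACGTACGTAC"  # 10 bp UMI + cDNA
    b = "AAAAAAAAAA" + "ACGTACGTTC"  # same UMI, one mismatch in the cDNA
    print(within_thresholds(a, b, umi_len=10, seq_d=1))
```

With the default UMI distance of 0, two sequences whose UMIs differ by even one base fall outside the UMI threshold, no matter how similar their cDNA parts are.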
So preprocess the reads into single sequences consisting of the UMI followed by the cDNA, then run starcode-umi with the option --umi-len 10 to tell starcode that your UMIs are 10 bp long. You can set the match distance for the UMI part and for the sequence part with the options --umi-d and --seq-d, respectively. If you don't set them, the defaults are distance 0 for the UMI (exact matches) and an automatic distance for the sequence that depends on its length.
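A minimal Python sketch of that preprocessing step, assuming uncompressed 4-line FASTQ files whose read 1 and read 2 records are in the same order, and that the UMI is the first 10 bp of read 2 (the split-seq layout described in the question). The helpers read_fastq, prepend_umi and format_fastq are hypothetical and not part of starcode. Writing the combined reads as FASTQ is also an assumption; check the starcode README for the input formats starcode-umi accepts. For gzipped input, the files could be opened with gzip.open(path, "rt") instead of open.

```python
import sys
from itertools import islice, zip_longest
from typing import Iterable, Iterator, Tuple

# A FASTQ record as (header, sequence, quality).
Record = Tuple[str, str, str]


def read_fastq(lines: Iterable[str]) -> Iterator[Record]:
    """Yield (header, sequence, quality) from 4-line FASTQ records."""
    it = iter(lines)  # a file object is already its own iterator
    while True:
        chunk = [line.rstrip("\n") for line in islice(it, 4)]
        if not chunk:
            return
        if len(chunk) < 4 or not chunk[0].startswith("@"):
            raise ValueError("truncated or malformed FASTQ record")
        yield chunk[0], chunk[1], chunk[3]


def prepend_umi(r1: Iterable[Record], r2: Iterable[Record], umi_len: int = 10) -> Iterator[Record]:
    """Hypothetical helper: put the first umi_len bases of read 2 in front of read 1."""
    for rec1, rec2 in zip_longest(r1, r2):
        if rec1 is None or rec2 is None:
            raise ValueError("read 1 and read 2 have different numbers of records")
        header, seq1, qual1 = rec1
        _, seq2, qual2 = rec2
        if len(seq2) < umi_len:
            raise ValueError(f"read 2 is shorter than the UMI in record {header}")
        # UMI first, then cDNA; qualities are concatenated the same way.
        yield header, seq2[:umi_len] + seq1, qual2[:umi_len] + qual1


def format_fastq(records: Iterable[Record]) -> Iterator[str]:
    """Turn records back into 4-line FASTQ text."""
    for header, seq, qual in records:
        yield f"{header}\n{seq}\n+\n{qual}\n"


if __name__ == "__main__":
    # Usage: python prepend_umi.py R1.fastq R2.fastq combined.fastq
    r1_path, r2_path, out_path = sys.argv[1:4]
    with open(r1_path) as f1, open(r2_path) as f2, open(out_path, "w") as out:
        out.writelines(format_fastq(prepend_umi(read_fastq(f1), read_fastq(f2), umi_len=10)))
```

The resulting file holds one sequence per read pair, with the 10 bp UMI in front of the cDNA, which is the layout that --umi-len 10 describes.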
Eduard
Thank you very much for your reply; everything is clear to me now!
Betty
Hello, I am working with split-seq data, and in the original split-seq paper they used starcode to collapse UMIs. The raw split-seq data are paired-end reads: read 1 contains the cDNA, and read 2 mainly contains the UMI, barcodes and other linkers, with the UMI being the first 10 bp of read 2. So my question is how I should pass these data and this information to starcode-umi, because, if I understand correctly, the sequence distance in starcode-umi refers to the distance between cDNA sequences. Starcode-umi clusters the cDNA first and then clusters the UMIs within similar cDNA to collapse them, so should I extract the 10 bp UMIs from read 2 and attach them to read 1? This has really been bothering me, and any answer would be highly appreciated. Thank you!
Betty