Open penglbio opened 6 years ago
The parameter -d specifies the clustering distance (the number of mismatched nucleotides you want to allow). So, in your example with distance 1, you'd run starcode as follows:
starcode -d1 input-file.fastq
Hope it helps.
I will try. Thank you very much.
How about FASTA input? I tested it as follows, but it didn't work:
$ starcode -d 1 test_file.fasta
running starcode with 1 thread
reading input files
FASTA format detected
sorting
progress: 100.00%
message passing clustering
AGGGCTTACAAGTATAGGCC 2
CCTCATTATTTGTCGCAATG 1
TGCGCCAAGTACGATTTCCG 1
TGGGCTTACAAGTATAGGCC 1
The last sequence has just 1 mismatch with the first, yet the two were not clustered together.
Note that you are using the message passing algorithm for clustering. Message passing has a parameter called --cluster-ratio, which is set to 5 by default. This parameter restricts which sequences can be clustered together based on the ratio of their counts. In other words, by default two sequences will only be clustered together if the count of one is at least 5 times larger than the count of the other.
In your example, you are running starcode with just a few sequences and default parameters. Note that the last and the first sequence did not cluster together because their cluster ratio is 2, i.e. the first has 2 counts and the last has only 1.
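The rule described above can be sketched in a few lines of Python. This is only a simplified illustration of the explanation in this thread, not starcode's actual implementation: the function names `hamming` and `would_merge` are made up for this example, the distance is a plain mismatch count over equal-length sequences, and the exact threshold comparison starcode performs internally may differ.

```python
def hamming(a: str, b: str) -> int:
    """Count mismatched positions between two equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must have the same length")
    return sum(x != y for x, y in zip(a, b))


def would_merge(seq_a, count_a, seq_b, count_b, max_dist=1, cluster_ratio=5.0):
    """Simplified (assumed) merge rule: the sequences must be within
    max_dist mismatches AND the larger count must be at least
    cluster_ratio times the smaller count."""
    close_enough = hamming(seq_a, seq_b) <= max_dist
    big, small = max(count_a, count_b), min(count_a, count_b)
    return close_enough and big >= cluster_ratio * small


# The first and last sequences from the output above, with their counts.
first = ("AGGGCTTACAAGTATAGGCC", 2)
last = ("TGGGCTTACAAGTATAGGCC", 1)

print(hamming(first[0], last[0]))                      # mismatch count
print(would_merge(*first, *last))                      # default ratio of 5
print(would_merge(*first, *last, cluster_ratio=2.0))   # relaxed ratio
```

Under this simplified rule the pair is close enough (one mismatch) but is rejected at the default ratio of 5, because the counts are 2 and 1; it is only accepted once the ratio is relaxed.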
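So, to solve this, do one of the following:
- Lower the required count ratio with --cluster-ratio.
- Use a different clustering algorithm, such as sphere clustering (-s).
Hope it helps.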
I am really confused. Starcode was used in that paper for UMI collapse, so I think they used starcode-umi rather than starcode. Am I correct? I am also wondering whether there is any advice on how to set the sequence distance when using starcode-umi. Thank you very much!
But the UMI (10 bp) is in the R2.fq file. The paper says the cDNA reads (Read 1) were mapped to the genome, and that Starcode (45) was then used to collapse UMIs of aligned reads that were within 1 nt mismatch of another UMI, assuming the two aligned reads were also from the same UBC. I don't know whether I should combine the UMI with Read 1, because the combined read then cannot be mapped to the genome. I don't know what the correct method is.
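One possible reading of that description is that UMIs are collapsed after alignment, separately within each cell barcode (UBC), so the UMI never has to be mapped itself. The sketch below illustrates that idea in Python under loud assumptions: it uses a simple greedy collapse rather than starcode's algorithm, the name `collapse_umis` and the toy read tuples are invented for this example, and grouping by gene in addition to barcode is an assumption that is not stated in the quoted text.

```python
from collections import Counter, defaultdict


def hamming(a, b):
    """Count mismatched positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def collapse_umis(reads, max_dist=1):
    """Greedy UMI collapse within each (barcode, gene) group.

    reads: iterable of (cell_barcode, gene, umi) tuples, one per aligned read.
    Returns {(cell_barcode, gene): {representative_umi: read_count}}.
    """
    groups = defaultdict(Counter)
    for barcode, gene, umi in reads:
        groups[(barcode, gene)][umi] += 1

    collapsed = {}
    for key, umi_counts in groups.items():
        reps = {}
        # Visit the most abundant UMIs first so they become representatives.
        for umi, n in sorted(umi_counts.items(), key=lambda kv: (-kv[1], kv[0])):
            for rep in reps:
                if len(rep) == len(umi) and hamming(rep, umi) <= max_dist:
                    reps[rep] += n  # fold this UMI into an existing one
                    break
            else:
                reps[umi] = n  # no close representative: start a new one
        collapsed[key] = reps
    return collapsed


# Toy data (hypothetical): UMIs taken from Read 2, gene from the Read 1 alignment.
reads = [
    ("BC1", "GeneA", "AAAAAAAAAA"),
    ("BC1", "GeneA", "AAAAAAAAAA"),
    ("BC1", "GeneA", "AAAAAAAAAT"),
    ("BC1", "GeneA", "CCCCCCCCCC"),
    ("BC2", "GeneA", "AAAAAAAAAT"),
]
result = collapse_umis(reads)
print(result)
```

In this toy input the UMI AAAAAAAAAT is folded into AAAAAAAAAA for barcode BC1, while the same UMI under barcode BC2 stays separate because it comes from a different cell. Whether the paper's authors grouped reads this way is something only they can confirm.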
Hi @wangjianing-web. I can't tell which is the correct method they used. You should contact the authors for more details on how they used starcode in their work. What I understand from your description is that they followed these steps:
Sorry to trouble you. In a paper, I saw someone use your software (starcode) to cluster sequences within 1 nt mismatch. The paper title and the relevant description are as follows:
Title: Single-cell profiling of the developing mouse brain and spinal cord with split-pool barcoding
Description: We then used Starcode (45) to collapse UMIs of aligned reads that were within 1 nt mismatch of another UMI
I am confused, because I couldn't find a parameter in your software to set this. Can you tell me whether there is a way to do it?