soedinglab / plass

sensitive and precise assembly of short sequencing reads
https://plass.mmseqs.com
GNU General Public License v3.0

map of input sequences to assembled sequence #25

Open galicae opened 4 years ago

galicae commented 4 years ago

I was wondering if PLASS kept track of which sequences it used for each assembled sequence, and @milot-mirdita told me I would have to search with the assembled sequences against the input sequences to get that information.
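For anyone landing here with the same question, here is a minimal sketch of how the result of such a search could be turned into a map. It assumes the search was run separately, with the assembled sequences as queries and the input reads as targets, and that the hits were written as a 12-column BLAST-style tab-separated file (query, target, identity, alignment length, mismatches, gap opens, query start, query end, target start, target end, e-value, bit score). The function name, the e-value cutoff, and the sequence IDs in the example are made up for illustration; none of this is part of PLASS itself.

```python
import csv
import io
from collections import defaultdict


def build_assembly_to_reads_map(handle, max_evalue=1e-5):
    """Group read IDs by the assembled sequence they hit.

    `handle` is an open text file in 12-column BLAST-style tabular format,
    with assembled sequences as queries (column 1) and reads as targets
    (column 2). The e-value is read from column 11. The cutoff is an
    arbitrary illustrative default, not a recommended value.
    """
    mapping = defaultdict(set)
    for row in csv.reader(handle, delimiter="\t"):
        # Skip blank lines, comment lines, and rows that are too short.
        if len(row) < 12 or row[0].startswith("#"):
            continue
        assembled_id, read_id = row[0], row[1]
        evalue = float(row[10])
        if evalue <= max_evalue:
            mapping[assembled_id].add(read_id)
    # Sort read IDs so the result is deterministic.
    return {seq_id: sorted(reads) for seq_id, reads in mapping.items()}


# Tiny in-memory example with hypothetical IDs instead of a real hits file.
example_hits = io.StringIO(
    "asm_1\tread_a\t0.98\t50\t1\t0\t1\t50\t1\t50\t1e-20\t100\n"
    "asm_1\tread_b\t0.95\t48\t2\t0\t3\t50\t1\t48\t1e-15\t90\n"
    "asm_2\tread_c\t0.60\t20\t8\t0\t1\t20\t1\t20\t0.5\t20\n"
    "asm_2\tread_a\t0.97\t45\t1\t0\t1\t45\t5\t49\t1e-12\t85\n"
)
assembly_to_reads = build_assembly_to_reads_map(example_hits)
for seq_id, reads in assembly_to_reads.items():
    print(seq_id, ",".join(reads))
```

Note that a read can appear under more than one assembled sequence here, since nothing in this sketch picks a single best hit per read.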

Why this is relevant to me: we study a non-model organism using scRNA-seq. We have no high-quality genome for it or any closely related species, so we map our reads against a de-novo transcriptome. Owing to the absurd polymorphism levels present in the genome, the usual Trinity pipeline produces close to 1 million "genes", making all downstream analysis very complicated. I thought that going to the amino acid level with a tool like PLASS would improve things.

Using scRNA-seq and de-novo transcriptomes is a great way to study non-model organisms without known/well-annotated genomes (recent examples are the Morpho-Seq paper, or this cell type study in Spongilla). It seems like PLASS could be very useful in this niche. I promise to write the tutorial when this feature is added!