paoloshasta / shasta

De novo assembly from Oxford Nanopore reads.
https://paoloshasta.github.io/shasta/

Mixed R9/R10 assemblies #7

Closed paoloshasta closed 1 year ago

paoloshasta commented 1 year ago

@Aline-git posted the following in #1

Hello,

Thanks for this very useful tool!

We would like to use Shasta to assemble ONT sequences from the HEK293 cell line. We will have ~60 Gbp of sequence from the 10.4 chemistry (N50 ~20) and ~2 Gbp of UL sequence from the 9.4 chemistry. I was thinking of using the configuration file 'Nanopore-Phased-R10-Slow-Nov2022.conf'. I have two questions:

Should we tune the configuration file to take the 9.4 UL sequences into account? If so, some advice would be very welcome.

Since HEK cells are in between diploid and triploid, will there be a problem with the chromosomes that are present in three copies (other than being chimeric)?

Thanks for your help!

Aline

paoloshasta commented 1 year ago

Regarding mixing R10 reads with a small amount of R9 UL coverage: Shasta is designed and written under the assumption that only one type of read is present, and it does not do well in hybrid situations with mixed read types. For your situation, a possible way to proceed is to do a phased assembly using only the R10 reads. That would give fairly small phased bubbles, and you could then try some other tool (for example https://github.com/rlorigro/GFAse) to do additional phasing using the UL reads. But I know this suggestion is untested and a bit vague.

For the Shasta phased assembly you would use --config Nanopore-Phased-R10-Slow-Nov2022.conf if your reads were generated at 260 bases per second ("slow") or --config Nanopore-Phased-R10-Fast-Nov2022.conf if your reads were generated at 400 bases per second ("fast").
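As a rough illustration of the choice described above, here is a minimal Python sketch that picks the configuration from the sequencing speed and builds the corresponding command line. The helper name `build_shasta_command`, the read file name `r10_reads.fastq`, and the executable name `shasta` are hypothetical (release binaries are usually named differently), and the use of `--input` and `--assemblyDirectory` alongside `--config` is an assumption to check against the Shasta documentation for your version.

```python
import shlex

# Configuration names quoted in the comment above, keyed by sequencing
# speed in bases per second (260 = "slow", 400 = "fast").
CONFIG_BY_SPEED = {
    260: "Nanopore-Phased-R10-Slow-Nov2022.conf",
    400: "Nanopore-Phased-R10-Fast-Nov2022.conf",
}


def build_shasta_command(read_files, bases_per_second, assembly_dir="ShastaRun"):
    """Return an argv list for a phased R10 assembly (hypothetical helper).

    Assumptions: the executable is called "shasta", and it accepts
    --input, --config and --assemblyDirectory.
    """
    if not read_files:
        raise ValueError("at least one read file is required")
    if bases_per_second not in CONFIG_BY_SPEED:
        raise ValueError(f"no R10 phased config for {bases_per_second} bases/second")
    return [
        "shasta",
        "--input", *read_files,
        "--config", CONFIG_BY_SPEED[bases_per_second],
        "--assemblyDirectory", assembly_dir,
    ]


if __name__ == "__main__":
    # Hypothetical file name; pass only the R10 reads, per the suggestion above.
    command = build_shasta_command(["r10_reads.fastq"], 260)
    print(shlex.join(command))
```

The argv list can be passed to `subprocess.run` once the executable name and options have been confirmed for the installed release.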

You could still try a Shasta assembly feeding it all the reads and using those same assembly configurations. But I doubt that it would do much better than the assembly that uses only the R10 reads. Without UL reads, the portion of the sequence assembled as diploid will probably be small, unless this cell line is highly heterozygous.

Regarding triploid sequence: Shasta phased assembly works under a diploid assumption, and so it will not be able to phase those regions. The assembly graph can still contain those regions, but they will not be phased and the assembly graph in those regions could be messy.

paoloshasta commented 1 year ago

I am closing this due to lack of discussion. Feel free to reopen it or create a new issue if additional topics emerge.