milkschen / leviosam2

Fast and accurate coordinate conversion between assemblies
MIT License

leviosam2 for nanopore? #4

Closed: bluenote-1577 closed this issue 2 years ago

bluenote-1577 commented 2 years ago

Hi there,

Thanks for the cool software/preprint.

I'm wondering if there are good parameters for remapping nanopore reads, say with accuracy between 90% and 95% and lengths between 10 and 100 kb. I see that there are workflows for HiFi reads, but I'm curious whether parameters for nanopore reads have been tested. I tried the HiFi pipeline on nanopore reads and the results were not great. The default parameters for HiFi were too stringent, so I tried setting -H to 1000 and -G to 5000, but then there were issues with the remappings, especially around indels.

I see no mention of ONT reads in the preprint, so maybe this isn't a priority, but I would be very interested in such a solution.

Thanks!

milkschen commented 2 years ago

We haven't tested on ONT reads, since ultra-long ONT reads require moving the CIGAR information to the CG tag. I believe the workflow should work for typical-length ONT reads, though. Around indels, we have the re-alignment module (aln.cpp) for localized re-alignment. A re-aligned read will have an LR:i tag, which shows its re-alignment score. Alternatively, tweaking the re-alignment parameters may help (see the presets in the configs directory).
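For context on the CIGAR point above: BAM stores the number of CIGAR operations per read in a 16-bit field, so a read with more than 65535 operations cannot keep its CIGAR in the regular slot. The following is a minimal sketch, not part of leviosam2, for checking whether a given CIGAR string would hit that limit. The function names `count_cigar_ops` and `needs_cg_tag` are made up for illustration.

```python
import re

# BAM keeps the per-read CIGAR operation count in a 16-bit field,
# so at most 65535 operations fit in the regular CIGAR slot.
BAM_MAX_CIGAR_OPS = 65535

# One CIGAR operation: a run length followed by an operation code.
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def count_cigar_ops(cigar: str) -> int:
    """Return the number of operations in a SAM CIGAR string."""
    if cigar == "*":  # SAM uses "*" when no CIGAR is available
        return 0
    ops = CIGAR_OP.findall(cigar)
    # Reject strings that are not fully covered by valid operations.
    if sum(len(length) + 1 for length, _ in ops) != len(cigar):
        raise ValueError(f"malformed CIGAR: {cigar!r}")
    return len(ops)


def needs_cg_tag(cigar: str) -> bool:
    """True if the CIGAR has too many operations for the regular BAM field."""
    return count_cigar_ops(cigar) > BAM_MAX_CIGAR_OPS
```

A typical 10-100 kb read at 90-95% accuracy will usually stay well under the limit, which is consistent with the expectation that only ultra-long reads are affected.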

Further, if your data is sharable, I'm happy to take a look.
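To see how many reads actually went through re-alignment, one option is to count records carrying the LR:i tag mentioned above. This is a rough sketch that works on SAM text lines (for example, a BAM converted to SAM first). Only the tag name comes from this thread; the helper names and the example records are made up for illustration.

```python
from typing import Iterable, Optional, Tuple


def realigned_score(sam_line: str) -> Optional[int]:
    """Return the LR:i value of a SAM record, or None if the tag is absent."""
    fields = sam_line.rstrip("\n").split("\t")
    # The first 11 columns are mandatory; optional TAG:TYPE:VALUE fields follow.
    for field in fields[11:]:
        if field.startswith("LR:i:"):
            return int(field[len("LR:i:"):])
    return None


def count_realigned(sam_lines: Iterable[str]) -> Tuple[int, int]:
    """Return (records with an LR:i tag, total records), skipping header lines."""
    total = realigned = 0
    for line in sam_lines:
        if line.startswith("@") or not line.strip():
            continue
        total += 1
        if realigned_score(line) is not None:
            realigned += 1
    return realigned, total


# Synthetic records, for illustration only.
example = [
    "@HD\tVN:1.6",
    "read1\t0\tchr1\t100\t60\t10M\t*\t0\t0\tACGTACGTAC\t*\tNM:i:0",
    "read2\t0\tchr1\t200\t60\t4M1I5M\t*\t0\t0\tACGTACGTAC\t*\tNM:i:1\tLR:i:42",
]
print(count_realigned(example))
```

If the count of tagged records is zero on real output, that would suggest no read reached the re-alignment step at all, rather than re-alignment producing poor results.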

bluenote-1577 commented 2 years ago

Thanks for the quick response.

The ONT reads I'm using have the CIGAR information in the usual field. I'm having trouble with the realignment workflow: it always puts all reads into the -unliftable.bam file. I tried using -g 500000 in the leviosam2.sh script, but it still doesn't work. Funnily enough, if I just use the leviosam2 lift command (which I assume doesn't do realignment) with -G set to the same large number, it works and not all reads are unliftable, but there are errors around indels. I think my original comment was wrong; the -H 1000 -G 5000 options didn't actually work for the realignment script.

I'm just using nanopore reads from the HPRC data, see https://github.com/human-pangenomics/HPP_Year1_Data_Freeze_v1.0. I'm just mapping onto some chromosome for some individual (say, HGxxx) and lifting over to CHM13.

The data should be fairly generic nanopore reads. Let me know if you can get it to work.

milkschen commented 2 years ago

We now have an ONT mode (see #9 and #15). I will do more tests in the near future for ONT data. Just FYI.

milkschen commented 2 years ago

Hi @bluenote-1577, please check out our latest release (v0.2.2). That should work for typical ONT reads.