jguhlin / minimap2-rs

Rust bindings to minimap2 library

question on interleaved reads mapping #63

Open jianshu93 opened 1 month ago

jianshu93 commented 1 month ago

hi @jguhlin,

In this wrapper, is there a way to use interleaved reads (or forward and reverse reads) as input? I only saw a single FASTA file as input.

Thanks, Jianshu

esteinig commented 1 month ago

@jianshu93 I'm not sure what the most parsimonious solution is, but in a synchronous context you could use crossbeam::queue with an Arc<Mutex<Sender>> and then spawn a couple of threads with rayon that read the forward and reverse files using needletail. Each would send the record or sequence into the queue, whose receiver can be concurrently iterated over for mapping in the main function.
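A minimal sketch of the producer/consumer layout described above, under some loud assumptions: it uses the standard library's `std::sync::mpsc` channel and `std::thread` in place of crossbeam and rayon, it parses in-memory FASTQ text with a hypothetical `parse_fastq` helper in place of needletail, and the `map_read` closure is a stand-in for whatever mapping call the tool makes. None of these names come from minimap2-rs.

```rust
use std::sync::mpsc;
use std::thread;

/// One sequencing read; `mate` is 1 for forward (R1) and 2 for reverse (R2).
#[derive(Debug, Clone, PartialEq)]
pub struct Read {
    pub name: String,
    pub seq: Vec<u8>,
    pub mate: u8,
}

/// Hypothetical four-line FASTQ parser, for illustration only.
/// A real tool would use needletail or another robust parser.
pub fn parse_fastq(text: &str, mate: u8) -> Vec<Read> {
    let lines: Vec<&str> = text.lines().collect();
    lines
        .chunks(4)
        .filter(|rec| rec.len() == 4 && rec[0].starts_with('@'))
        .map(|rec| Read {
            name: rec[0][1..].to_string(),
            seq: rec[1].as_bytes().to_vec(),
            mate,
        })
        .collect()
}

/// Spawn one producer thread per input and pass every read to `map_read`
/// on the calling thread. Returns the number of reads consumed.
pub fn stream_reads<F>(forward: String, reverse: String, mut map_read: F) -> usize
where
    F: FnMut(&Read),
{
    let (tx, rx) = mpsc::channel::<Read>();
    let mut handles = Vec::new();

    for (text, mate) in vec![(forward, 1u8), (reverse, 2u8)] {
        // Senders can be cloned, one per producer thread.
        let tx = tx.clone();
        handles.push(thread::spawn(move || {
            for read in parse_fastq(&text, mate) {
                // `send` fails only if the receiver has been dropped.
                if tx.send(read).is_err() {
                    break;
                }
            }
        }));
    }
    // Drop the original sender so the receiver loop ends once both
    // producers have finished.
    drop(tx);

    let mut consumed = 0;
    for read in rx {
        map_read(&read); // stand-in for the actual mapping call
        consumed += 1;
    }
    for handle in handles {
        handle.join().expect("producer thread panicked");
    }
    consumed
}
```

Reads from the two inputs arrive in no particular order, so this layout only suits cases where each read is mapped independently; keeping mates together would need a different design.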

@jguhlin would love to hear your thoughts on this - and thanks for creating such excellent bindings, minimap2-rs has been so useful already :)

jguhlin commented 1 month ago

Hey! Apologies for the silence; grant writing time, trying to justify my continued existence...

@jianshu93 @esteinig

For multithreading, I like how I've done it here, using crossbeam queues: https://github.com/jguhlin/minimap2-rs/blob/main/fakeminimap2/src/main.rs

Reading the file is left up to you, via Needletail or other libraries, and I try to keep this library agnostic to file parsing. I'm developing a different file format https://github.com/jguhlin/sfasta/tree/tokio so need it to be agnostic, but I believe some others were loading sequences direct from databases as well.

As for paired reads, minimap2 is really meant for single long reads, but the Python implementation does have some support, as long as both reads are in the FR orientation. I could possibly port that over if you need it?

@esteinig Thanks for the kind words! Let me know what you are using it for (if public) and I'll add it to this page. Definitely need the ego boost this week, so it's much appreciated!

https://github.com/lh3/minimap2/tree/master/python

(Search for the paragraph that begins "This method aligns seq against the index....")

esteinig commented 1 month ago

@jguhlin

I really like your crossbeam implementation - tried something similar with some channels and rayon but haven't quite managed to replicate it. Learning a bunch of things at least :)

Understand the grant writing insanity, sorry to hear you are in the midst of it - and do wish the reviewers would appreciate efforts like this crate a lot more! If there's anything that helps with exposure let me know. Perhaps a Zenodo link for minimap2-rs to collect citations may help a tiny bit?

I'm currently using it in a host depletion tool, scrubby (https://github.com/esteinig/scrubby), on the dev branch for the 1.0.0 release. There are a couple of other projects that use it but are not public yet - will ping you when they are!

Re paired reads: I know people use minimap2 for short reads quite a bit, although it's not really meant for them. In my benchmarks for human read identification, plugging in R1 and R2 sequentially with the sr preset doesn't seem to make a difference compared with minimap2. I'd say it'd be a "nice to have" feature but also would not want to add more work to your (probably extensive) list of things to implement :)
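For the interleaved case from the original question, here is a rough sketch of the "R1 then R2 sequentially" approach. It assumes the file is strictly interleaved four-line FASTQ with mate names ending in `/1` and `/2`. The helpers `split_interleaved` and `pairs_with_hit` are hypothetical, and the `maps` closure stands in for a call into the aligner that reports whether a sequence produced any mapping.

```rust
/// A read pair recovered from an interleaved file.
#[derive(Debug, PartialEq)]
pub struct Pair {
    pub name: String,
    pub r1: Vec<u8>,
    pub r2: Vec<u8>,
}

/// Strip the leading '@', any description after whitespace, and a trailing
/// "/1" or "/2" mate suffix from a FASTQ header line.
fn base_name(header: &str) -> &str {
    let id = header
        .trim_start_matches('@')
        .split_whitespace()
        .next()
        .unwrap_or("");
    id.strip_suffix("/1")
        .or_else(|| id.strip_suffix("/2"))
        .unwrap_or(id)
}

/// Split interleaved FASTQ text into pairs, assuming records 2k and 2k+1
/// are mates (an assumption about the input, checked via the read names).
pub fn split_interleaved(text: &str) -> Result<Vec<Pair>, String> {
    let lines: Vec<&str> = text.lines().collect();
    if lines.len() % 8 != 0 {
        return Err(format!(
            "expected a multiple of 8 lines, found {}",
            lines.len()
        ));
    }
    let mut pairs = Vec::new();
    for rec in lines.chunks(8) {
        let (n1, n2) = (base_name(rec[0]), base_name(rec[4]));
        if n1 != n2 {
            return Err(format!("mate names differ: {} vs {}", n1, n2));
        }
        pairs.push(Pair {
            name: n1.to_string(),
            r1: rec[1].as_bytes().to_vec(),
            r2: rec[5].as_bytes().to_vec(),
        });
    }
    Ok(pairs)
}

/// Feed R1 and then R2 of each pair to `maps` as two independent
/// single-end queries, and return the names of pairs where either mate hit.
pub fn pairs_with_hit<F>(pairs: &[Pair], mut maps: F) -> Vec<String>
where
    F: FnMut(&[u8]) -> bool,
{
    pairs
        .iter()
        .filter(|p| {
            let hit1 = maps(p.r1.as_slice());
            let hit2 = maps(p.r2.as_slice());
            hit1 || hit2
        })
        .map(|p| p.name.clone())
        .collect()
}
```

Treating a pair as a hit when either mate maps is one possible policy for something like host depletion; requiring both mates to map would be an equally simple variant. This sketch does not use any pairing information during alignment.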

jguhlin commented 1 month ago

@esteinig Thanks. Crossbeam is my go-to for multithreading; I came from Clojure, so channels/queues are kind of what I was 'raised' on. I'm trying the flume crate, which is supposed to be faster, but have realized that the channels are not my bottleneck. https://github.com/jguhlin/sfasta/blob/3efd730d3ba22cd8ab21fc8306695ce096c82818/compression/src/lib.rs#L456 (and many other places)

Thanks for the well wishes. :) I've thought about a DOI, but as Heng Li (@lh3) has done all the work and I've just added some glue and used it as an excuse to learn FFI, I'm not totally on board yet. But I do list the project and the number of other projects using it on my CV.

I'll get scrubby added to this repo's readme, if that's alright with you? As for the others, no rush, whenever they are ready. It does help me keep motivation up to maintain though!

Regarding the paired reads, if it is being used I'll get it added. Let's consider it on my todo list.

Cheers

wdecoster commented 1 month ago

I don't think it is true that you "just added some glue". This must have been a lot of work, and so is maintaining it. It is also clear that developers from companies have started using your crate. You are having an impact. A DOI, or even a minimal publication, wouldn't be inappropriate. Unfortunately, it is the 'academic currency'.

esteinig commented 1 month ago

Absolutely agree with the effort / maintenance argument, this is a lot of work and people are finding it really useful by the looks of it.

Feel free to add scrubby to the list of course @jguhlin, I'll merge the feature in the next few days. So nice to be able to ship the long read version as a single binary :)

jguhlin commented 1 month ago

@wdecoster Thanks! I really appreciate it. I'll look into getting the DOI set up once I get a little time.

lh3 commented 1 month ago

@jguhlin Thank you so much for your effort. Let me know if you need a letter of support.