jguhlin / minimap2-rs

Rust bindings to minimap2 library
Other
60 stars 13 forks source link

Issues with cloning of aligner #71

Open rob-p opened 3 weeks ago

rob-p commented 3 weeks ago

Hi @jguhlin,

First; thanks again for the great work on this. I'm already finding minimap2-rs super useful, and we're looking to use it to include a "raw read" mode in our oarfish tool. I noticed two specific issues with cloning an aligner and wondered if you had any thoughts or insight into them.

So I'm following the basic strategy employed in fake minimap2 / minimap2-rs-test, where I have several threads that take reads (in my case over a channel) and then send the reads to another consumer. Each thread needs it's own aligner object and I am calling clone() to get it.

I'm having two problems. First, for context, since I need references to local variables, I am using a std::thread::scope and scoped threads which may be why I see this clearly in my example but the other examples I'm building off of don't.

1) Since each clone gets it's own pointer to the index (the idx field), they all try to drop the index (which calls free in the FFI) when the thread is joined and the scope ends. However, this obviously leads to a double-free and, in my case, a segfault. I can get around this by explicitly setting the idx field of each cloned Aligner to None before the thread returns, but this feels error prone. I wonder if it would make sense to have either (a) some properly resource-managed pointer type for the index (e.g. an Arc or some such), or (b) some specific idx "view" type that essentially constitutes a weakptr to the original index. Either way, it'd be great to have some sort of way to do the clone in a carefree manner but not worry about double-free.

2) It's not clear to me the "depth" of the clone. Specifically, while in the original aligner I call the with_cigar method on the builder, the clones aren't seeming to provide this information by default and instead I'm having to call aligner.clone().with_cigar() to get a clone that also provides CIGAR strings. I guess the expected cloning behavior is for the clone to have all of the same properties set as the original, and the real fear I have here is that I don't know what other fields might be affected apart from the CIGAR string generation (perhaps it's simply the flags that aren't deep cloned)?

Anyway, I'm happy to provide further detail about the above, or to discuss potential solutions or design decisions about addressing these. Let me know your thoughts, and thanks again for minimap2-rs!

--Rob

jguhlin commented 3 weeks ago

It's my first FFI project, so has been a learning experience. I like Arc as a solution.

Surprised about the second point, I'll dig into it this week! Clone should produce full clones.

Sam-Sims commented 3 weeks ago

Hi @rob-p hope you don't mind me jumping in,

I am having the same issue you outline in your first point (I am using crossbeam channels in a similar way to the "fake minimap2") - but after cloning the Aligner and joining the threads I get a double free segfault.

I tried doing something like aligner.idx = None; in the thread after calling aligner.map but I still get a segfault - do you have an example of how you implemented the workaround/multithreading?

Cheers

rob-p commented 3 weeks ago

Hi @Sam-Sims,

You can find the relevant line (and closure) in my code here https://github.com/COMBINE-lab/oarfish/blob/5b09b4345b609559a1aa7d76df75bf3aebac6ba9/src/bulk.rs#L334. Basically, you only let the original aligner retain the pointer and make sure you set it to None before the point of dropping within each thread. Let me know if this helps.

--Rob

Sam-Sims commented 3 weeks ago

Ah thanks that's super useful! I needed to set it to None in the thread closure. Agree it would be nice to be able to clone anAligner without having to worry about a segfault.