Closed Adoni5 closed 1 year ago
Merging #40 (47bd630) into main (8e4b127) will increase coverage by
0.61%
. The diff coverage is50.00%
.:exclamation: Current head 47bd630 differs from pull request most recent head c0cb0f0. Consider uploading reports for the commit c0cb0f0 to get more accurate results
@@ Coverage Diff @@
## main #40 +/- ##
==========================================
+ Coverage 49.25% 49.86% +0.61%
==========================================
Files 3 3
Lines 735 742 +7
==========================================
+ Hits 362 370 +8
+ Misses 373 372 -1
Impacted Files | Coverage Ξ | |
---|---|---|
src/lib.rs | 75.20% <50.00%> (+0.74%) |
:arrow_up: |
Hey, awesome work! I'm pretty good at Rust, but my C/C++ and FFI skills are lacking, and this is my first project touching both. I've long suspected there will be one (or more) leaks somewhere.
Thanks for mappy-rs as well, I'm using medaka on a very large genome, and it dies after a few weeks with no output. I'm planning to switch out mappy with yours to get multithreading (to speed up the process and hopefully get to the point of crashing a bit quicker).
I'll plan to get this tested and push out in the next day or two. Cheers :)
but my C/C++ and FFI skills are lacking, and this is my first project touching both
Me too! We never noticed before because I was developing on machines with a couple of hundred Gigabytes of spare RAM. The minimap2-rs
and minimap2-sys
crates are awesome work, but with not having Rust take care of memory management, it's more complicated. I would love to just directly port minimap2
to Rust but ain't nobody got time for that π
My first (and worst) attempt was to use a transpiler to convert it to rust, but I couldn't follow it all. My current new dream is to decouple the pthreads library so that it will compile on Windows and WASM (I have a pan-genome browser I'm working on; spare-time project, but it would be cool to have mm2 built-in).
Hey @jguhlin
Thanks again for all your work on this! We use it for https://github.com/Adoni5/mappy-rs/. The reason we wrote mappy-rs in the first place was get multi threaded performance in a long running application - namely readfish. With the difference in output on ONTs PromethION, we couldn't keep up just using mappy. Unfortunately I didn't know enough C/C++ to implement this threading as a extension to mappy, and I wanted the opportunity to practice my rust, rather than just multithread in python (perhaps a mistake π).
The problem
We noticed creeping memory usage when mapping repeatedly using the same
Aligner
instance. These graphs are made using memray. The test script I used was https://github.com/Adoni5/mappy-rs/blob/main/tests/memory_test.py.This was true of not running multithreaded as well - this was with a single thread.
After rooting around in PyO3, I became convinced that the problem lies in
minimap-rs
after running the same multi threaded file, but not actually calling out tominimap2::aligner::map
, but instead just returning a Mapping (as defined bymappy-rs
). This showed flat memory usage - .You also see this leakage when using
minimappers2
in the same way.I thought the problem might lie in the threadbuffer - as discussed in this minimap2 issue - https://github.com/lh3/minimap2/issues/855. This turned out to help, but we still leaked a lot of memory. The change I made to renew threadbuffer was effectively copied from https://github.com/nanoporetech/bonito/blob/b7074d7c2ae2d781db99c40ba911ee9a671206d7/bonito/aligner.py#L19.
However I eventually used valgrind to test and found leaks in the
regs
,regs.p
andmm_gen_cs
. An example (after fixing the first two)By adding calls to free here, as they do in https://github.com/lh3/minimap2/blob/ace990c381c647d6cf8fae7a4941a7b56fb67ae7/python/mappy.pyx#L212-L216, I now no longer see memory leaks!
I'll include a bit more detail after tidying up
mappy-rs
( I no longer pass my own CI π ), and will make any changes to this if it fails yours.Fake minimap2
I changed this script a fair bit to test this without any messing around in
PyO3
. I think it's all fairly well commented, but if it's broken anything I'm happy to drop these changes, they were purely for testing purposes.NB I have only tested these changes on
Linux 5.19.0-45-generic #46~22.04.1-Ubuntu SMP PREEMPT_DYNAMIC Wed Jun 7 15:06:04 UTC 20 x86_64 x86_64 x86_64 GNU/Linux
. I have no Idea how/if this will run onaarch64
, or how this will play withsimd
etc.