reinterpretcat / vrp

A Vehicle Routing Problem solver
https://reinterpretcat.github.io/vrp/
Apache License 2.0

Possible memory leak #39

Closed: ibatanov closed this issue 2 years ago

ibatanov commented 2 years ago

Hi! I am very glad that a solution like this exists in Rust, it is really cool! I recently started learning Rust, and I wanted to build an example on top of your solver using the stack actix-web -> vrp-pragmatic. For example:

    let environment = Arc::new(Environment::default());
    let core_problem = (request.problem.clone(), vec![request.matrix.clone()]).read_pragmatic();
    let core_problem = Arc::new(core_problem.unwrap_or_else(|errors| {
        panic!("cannot read pragmatic problem: {}", FormatError::format_many(errors.as_slice(), "\t\n"))
    }));
    let solver = Builder::new(core_problem.clone(), environment.clone())
        .with_max_time(Some(60))
        .with_max_generations(Some(100))
        .build()
        .unwrap();
    let (solution, _cost, _) = solver.solve().unwrap();

With this approach, I may have (not sure) detected a memory leak: memory is not released between load-test runs with the ab utility. I am using Instruments (macOS) to find the problem.

[Screenshot 2021-08-22 at 16 28 21]

As far as I understand, one of the problems is https://github.com/reinterpretcat/vrp/blob/75dbb8f4e55461066cd7771654381f9195bc3cbb/vrp-core/src/solver/population/elitism.rs#L134

I would be grateful for any hint on where to look next.

reinterpretcat commented 2 years ago

Thanks for the feedback!

To be honest, I don't see anything suspicious in the sort function. Could it be that there was a spike in usage and a chunk of memory was allocated but not actually used by the app afterwards?

ibatanov commented 2 years ago

I also didn't see anything suspicious in this function, but when load testing with, for example, ab -p test.json -T application/json -c 12 -n 1000 http://myurl, memory grows constantly and is not released. In my observations I was able to push memory consumption to 1.5 GB, and after letting the application "cool down" the memory was still not released. I will keep watching.

I had already suspected actix itself, but with simple examples it does not consume more than 5 MB of memory during ab stress testing.

ibatanov commented 2 years ago

I created a test repo to reproduce the error: https://github.com/ibatanov/vrp_test. At the moment, I am lost in my search for the leak. According to the test results, memory grows until the OOM killer terminates the process. The same happens in Java if you load the solver as a native library. I am ready to continue my research, but I am new to Rust. Is there any additional information I could provide to help find the problem?

reinterpretcat commented 2 years ago

Hm, it seems I need to look into the issue, but I don't really have time for it at the moment. Otherwise my approach would be:

ibatanov commented 2 years ago

Yes, I have already removed the unsafe code from the test; as far as I can tell, it has no effect on the problem. All profiling (I use valgrind) still gets stuck on multithreading, and profiling multithreaded code is still difficult for me. I will keep looking for the problem. Please let us know if there are any changes in this direction. Thanks!

reinterpretcat commented 2 years ago

There is a "hard switch" that disables multithreading in wasm32 builds:

https://github.com/reinterpretcat/vrp/blob/master/vrp-core/src/utils/parallel.rs

You can comment out these lines: https://github.com/reinterpretcat/vrp/blob/master/vrp-core/src/utils/parallel.rs#L6-L66, delete this line: https://github.com/reinterpretcat/vrp/blob/master/vrp-core/src/utils/parallel.rs#L68, and it should then run on a single thread.

This trick is useful for debugging sometimes.
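For readers unfamiliar with this kind of switch, here is a minimal sketch of the general pattern, assuming a helper that is compiled in a threaded or a sequential variant depending on the target. The name map_all and its signature are hypothetical and are not taken from parallel.rs; the sketch uses only the standard library.

```rust
/// Hypothetical helper (not the vrp-core API): applies `f` to every item.
/// Threaded variant, compiled for every target except wasm32.
#[cfg(not(target_arch = "wasm32"))]
pub fn map_all<T, R, F>(items: &[T], f: F) -> Vec<R>
where
    T: Sync,
    R: Send,
    F: Fn(&T) -> R + Sync,
{
    // Share the closure by reference so that each thread can call it.
    let f = &f;
    // One scoped thread per item keeps the sketch short;
    // a real implementation would rather use a thread pool.
    std::thread::scope(|scope| {
        let handles: Vec<_> = items
            .iter()
            .map(|item| scope.spawn(move || f(item)))
            .collect();
        // Joining in spawn order preserves the input order.
        handles
            .into_iter()
            .map(|handle| handle.join().expect("worker thread panicked"))
            .collect()
    })
}

/// Sequential variant with the same signature, compiled only for wasm32.
/// Forcing this variant on a native target gives a single-threaded build.
#[cfg(target_arch = "wasm32")]
pub fn map_all<T, R, F>(items: &[T], f: F) -> Vec<R>
where
    T: Sync,
    R: Send,
    F: Fn(&T) -> R + Sync,
{
    items.iter().map(f).collect()
}
```

Because both variants share one signature, callers do not change when the switch is flipped, which is what makes the trick convenient for profiling.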

reinterpretcat commented 2 years ago

At the moment, the profiler shows that there are some problems and points to internal problem creation:

[image]

However, I cannot find anything suspicious here. Maybe there is a circular reference in the core problem model usage (CoordIndex is highlighted by the profiler too).
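As background on why a circular reference would leak in Rust, here is a minimal sketch assuming reference-counted nodes that point at each other. The Node type and both functions are hypothetical illustrations; they are not code from vrp-core and do not claim to reproduce the actual bug.

```rust
use std::cell::RefCell;
use std::rc::{Rc, Weak};

/// Hypothetical node type, not taken from vrp-core.
#[allow(dead_code)]
struct Node {
    /// Strong link: keeps the neighbour alive.
    strong_peer: RefCell<Option<Rc<Node>>>,
    /// Weak link: does not keep the neighbour alive.
    weak_peer: RefCell<Weak<Node>>,
}

impl Node {
    fn new() -> Rc<Node> {
        Rc::new(Node {
            strong_peer: RefCell::new(None),
            weak_peer: RefCell::new(Weak::new()),
        })
    }
}

/// Links two nodes with strong references in both directions and
/// returns a weak handle to the first one for inspection.
fn leaky_cycle() -> Weak<Node> {
    let a = Node::new();
    let b = Node::new();
    *a.strong_peer.borrow_mut() = Some(Rc::clone(&b));
    *b.strong_peer.borrow_mut() = Some(Rc::clone(&a));
    // `a` and `b` go out of scope here, but each node is still
    // owned by the other, so neither is ever freed.
    Rc::downgrade(&a)
}

/// Same shape, but the back link is weak, so dropping the local
/// handles frees both nodes.
fn weak_cycle() -> Weak<Node> {
    let a = Node::new();
    let b = Node::new();
    *a.strong_peer.borrow_mut() = Some(Rc::clone(&b));
    *b.weak_peer.borrow_mut() = Rc::downgrade(&a);
    Rc::downgrade(&a)
}
```

Calling upgrade() on the handle returned by leaky_cycle still yields a live node after the function returns, while the handle from weak_cycle yields None. The same reasoning applies to Arc in multithreaded code.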

reinterpretcat commented 2 years ago

Localized it a bit: it seems the memory leak happens in the rosomaxa population algorithm: https://github.com/reinterpretcat/vrp/blob/master/vrp-core/src/solver/population/rosomaxa.rs#L458

reinterpretcat commented 2 years ago

I think I found it: it is in the gsom network implementation.

reinterpretcat commented 2 years ago

Fixed by https://github.com/reinterpretcat/vrp/commit/dd6179cce1e0bdab23bfd95dee4255c3d6728d09

ibatanov commented 2 years ago

Hi! Sorry, I missed the activity here; I was out for health reasons. Great news, I will definitely test it in the near future!