Closed paoloczi closed 3 years ago
I just noticed that there are also vectors in the Node and Edge objects. Because of that, just adding a Graph::clear() might not be sufficient to eliminate the fragmentation issue.
Hi Paolo, I understand your concerns, but I would have to refactor the whole graph to enable proper reuse of allocated memory, and at this moment I am not sure what is the best way to do so. If you have any ideas, please let me know. I'll mark this as an enhancement until I handle it.
Best regards, Robert
I was just thinking you could simply add a Graph::clear() function that calls std::vector::clear() on each of the vectors directly owned by the Graph.
This way a Graph could be reused for multiple computations, as long as the clear function is called in between. Because vector::clear() only changes the vector's size and not its capacity, this would result in automatic reuse of the memory directly owned by the Graph. The memory owned by the Nodes would still be reallocated from scratch each time, but that consists of smaller chunks that contribute less to fragmentation, so this would certainly be a step in the right direction. In any event, users would not be required to use Graph::clear() and could instead continue to use a different object for each computation, if they so choose.
I assumed that the fragmentation caused by nodes and edges would make the effect of a basic Graph::clear() function negligible. I'll add the function then.
You may well be right; I don't know for sure how much the nodes and edges contribute to fragmentation, and it is possible that their contribution is dominant. On the other hand, memory allocators tend to treat small chunks specially, so they are less likely to cause fragmentation.
How much this change helps will also depend on usage patterns (e.g. the lengths of the alignments being computed). But eliminating the contribution of the Graph vectors will not hurt, and might help. Thank you for considering this change.
Implemented in latest commit on master (v2.0.2).
I find that doing a large number of alignments in the same process leads to memory fragmentation and eventual memory allocation failure. This problem could be eliminated easily by providing a Graph::clear() function that clears all of the vectors stored in the Graph, allowing a Graph object to be reused. vector::clear() does not free the memory; it just sets the size to zero. That way, the vectors' capacities would grow to the capacity needed by the largest alignment, but without causing memory fragmentation, because of the small number of vectors involved.
This is easy to implement and I could just create a fork and do it, but I would prefer if this is done in the main repository. I am using the spoa library in the Shasta assembler.
Without this functionality, a large run has to be split among many small separate processes, and multithreaded parallelism results in intolerable memory fragmentation (with each thread using a separate Graph, of course).