lamyj / odil

Odil is a C++11 library for the DICOM standard

Keeping of an extra buffer #66

Closed ThibautSchmitt closed 5 years ago

ThibautSchmitt commented 6 years ago

When sending a DICOM file, I noticed a large amount of memory (RAM) being used. I assumed that a buffer holding the DICOM data is kept when receiving, and that one copy is made for sending. However, the memory used exceeds two copies of the buffer.

I investigated, and it seems that an extra buffer is kept alive for the entire duration of Association::send_message and Association::receive_message. Everything is released afterwards, so no problem there (although there might be a small leak, see the prints below).

I printed the memory usage using the functions from https://stackoverflow.com/questions/63166/how-to-determine-cpu-and-memory-consumption-from-inside-a-process.

I used an 85.7 MB DICOM file and tried to send it. Here are some prints (see Association.cpp from line 528):

Physical Memory currently used:
Beginning of send_message                                         = 114756Kb (association + loading of the dicom)
After 'command_writer.write_data_set(message.get_command_set());' = 114756Kb
After 'data_writer.write_data_set(message.get_data_set());'       = 200588Kb
After 'auto const data_buffer = data_stream.str();'               = 284192Kb
After send_message                                                = 115012Kb

By releasing the streams, it ends up like:

Physical Memory currently used:
Beginning of send_message                                         = 114760Kb (association + loading of the dicom)
After 'command_writer.write_data_set(message.get_command_set());' = 114760Kb
After 'data_writer.write_data_set(message.get_data_set());'       = 199568Kb
After 'auto const data_buffer = data_stream.str();'               = 283172Kb
After 'std::ostringstream().swap(data_stream);'                   = 198732Kb
After send_message                                                = 115016Kb

I can open a PR adding the line std::ostringstream().swap(stream) where it is needed. Or please tell me if I am wrong :)

lamyj commented 6 years ago

Good diagnosis. We get three "copies" of the data set:

  1. The data set in the Message object
  2. data_writer.write_data_set will allocate memory for the serialized version of the data set
  3. auto const data_buffer = data_stream.str() creates a copy of this serialized version, since Odil calls the return by value version of str

The same is true for the command set, but since the command set is small, the effect is not visible there.

We might also have extra copies when calling set_pdv_items since no move semantics are involved.

lamyj commented 6 years ago

The std::stringstream API does not provide access to the underlying sequence of characters, and testing shows that (at least on macOS with clang++ 9.0.0) it allocates too much memory. boost::iostreams provides a solution with back_insert_device.

A quick test yields the following for a 500 MB string (numbers represent the change in memory since the previous message):

Starting (sstream)
Data set serialized: 1006735360
Serialized representation extracted: 500015104
Done: -1305309184
Starting (boost)
Data set serialized: 500011008
Serialized representation extracted: 12288
Done: -500002816

The version using sstream somehow allocates twice the memory when inserting into the stream, and requires an extra allocation to get the string out of the stream. The Boost version only requires the memory to store the serialized version, since the stream is just a thin wrapper around the destination string.

@ThibautSchmitt : using the provided test code (and adapting the memory monitor to whatever system you are running on based on https://stackoverflow.com/questions/63166), what output do you get?

Since back_insert_device has been in Boost for ages, this could be an interesting replacement for stringstream-related operations (data set I/O and sending messages).

ThibautSchmitt commented 6 years ago

I tried your test code on my personal laptop (macOS High Sierra 10.13.3), but the results vary quite a bit. They do, however, tend to match yours, for example:

Starting (sstream)
Data set serialized: 1006665728
Serialized representation extracted: 74604544
Done: -934772736
Starting (boost)
Data set serialized: 521736192
Serialized representation extracted: 12288
Done: -500002816

I will try it at work this coming week (Linux Mint 18.1).

lamyj commented 6 years ago

On Debian Stretch:

Starting (sstream)
Data set serialized: 500170752
Serialized representation extracted: 500121600
Done: -999964672
Starting (boost)
Data set serialized: 499855360
Serialized representation extracted: 0
Done: -499847168

So unless you find something very weird happening on Mint, I'll switch to boost::iostreams in Association and possibly in Reader.

ThibautSchmitt commented 6 years ago

I also have about the same results as you. I guess you can switch to boost::iostreams.

Starting (sstream)
Data set serialized: 496254976
Serialized representation extracted: 498126848
Done: -1001787392
Starting (boost)
Data set serialized: 507813888
Serialized representation extracted: 0
Done: -501993472

lamyj commented 6 years ago

Can you check out the boost-stringstream branch and see if it goes in the right direction? For now only Association::send_message has been modified, but you should already see less pressure on memory.

ThibautSchmitt commented 6 years ago

I made a short video of the memory consumption of 3 containers using the boost-stringstream branch. You'll see 3 windows:

It's not perfect, but it gives a rough idea of the memory consumption in 3 different cases.

lamyj commented 6 years ago

For future reference, the Y axis of the top-right graph is cut: the ticks which read 5 GB are actually 0.5 GB marks :)

The sender is more or less in line with what I expected: we need 670 MB to store the data set object, and 670 MB to store its serialized representation that is sent. At that point, the serialized version is deleted and the memory drops to 670 MB plus whatever else is stored at that point (around 130 MB).

The sink (bottom-right) shows the same expected behavior: 670 MB to store the received, serialized data set and 670 MB to store the object.

The router (top-right) is problematic, however: there is the same plateau around 1.5 GB that we see on the other two; reaching ~2.6 GB indicates a bad design but is still explainable (we have two copies of the data set object and two copies of the serialized version); the peak at 3.8 GB, though, should not be there. I'll check the reference and move semantics on the C++ side, and the call and return policies on the Python side.

lamyj commented 6 years ago

By the way, was the video done with or without the changes in the boost-stringstream branch?

ThibautSchmitt commented 6 years ago

With the changes of the branch.

> the memory consumption of 3 containers using the branch boost-stringstream

By the way,

> the Y axis of the top-right graph is cut: the ticks which read 5 GB are actually 0.5 GB marks :)

You're right, but I cannot manage the axis in the graphs of Portainer :(

lamyj commented 6 years ago

In addition to the copies performed by both istringstream and ostringstream (both removed from Association, cf. b9f2941, but still used in Reader and Writer), an extra copy of the data set is performed by SCUs and SCPs when using the by-reference overloads (e.g. StoreSCU::store(DataSet const &, ...) instead of StoreSCU::store(DataSet && dataset, ...)).

It would make sense to switch to a shared_ptr implementation for the following reasons:

  1. Message could be modified to store a reference to a DataSet, but since there is no way to check whether the reference is still valid, this is not safe.
  2. The current version does not allow shared ownership of a data set: it is either copied or moved.
  3. Move semantics and Python don't play well together. Since Python already uses shared ownership of objects, using std::shared_ptr should not wreak havoc on the current wrappers.

Any downsides to this solution (obviously besides the API changes, which could be mitigated through gradual deprecation)?

ThibautSchmitt commented 6 years ago

I don't see any downsides right now. The use of shared_ptr seems good to me!

lamyj commented 6 years ago

The part of Message which handles the command- and data-sets would change to:

class Message
{
public:
    Message(
        std::shared_ptr<DataSet> command_set=std::make_shared<DataSet>(),
        std::shared_ptr<DataSet> data_set={});

    std::shared_ptr<DataSet const> get_command_set() const;

    bool has_data_set() const;
    std::shared_ptr<DataSet const> get_data_set() const;
    std::shared_ptr<DataSet> get_data_set();

    void set_data_set(std::shared_ptr<DataSet> data_set);
};

Similar changes would happen to derived classes (Request, Response, and the concrete service messages). This way we would keep const-correctness and avoid copying the data set. The deprecation of by-reference constructors and setters is easily done, and with a bit of template trickery, we can deprecate the by-reference getters.

lamyj commented 6 years ago

Can you re-run your test with the latest commit on the boost-stringstream branch (c2cdc68)? The instances of DataSet are now handled through shared_ptr, and the unit tests pass on the C++ and Python side.

lamyj commented 5 years ago

The new pybind11 wrappers are finally done (0cd6580)! The current head of the boost-stringstream branch (whose name is now meaningless) passes the tests on macOS, and the Travis build should be there soon. In the meantime, I've created backports of pybind11 2.2 for Ubuntu (bionic, xenial and trusty) and for Debian (stretch and jessie): see https://github.com/lamyj/packages for install instructions.

lamyj commented 5 years ago

Closed by 375601d. CI checked on Linux and macOS.