irods / irods_client_globus_connector

The iRODS Globus Connector

Probable memory leak #68

Closed peterverraedt closed 1 year ago

peterverraedt commented 1 year ago

(image attached)

We use the latest available version of Globus Connect Server and the 4-3-0-stable branch of this repository.

korydraughn commented 1 year ago

Thanks for the report. We'll look into it.

It looks like this is over ~24 hrs. During this period, are the transfers roughly the same size or are they totally different?

Do you see memory go up on every transfer?

peterverraedt commented 1 year ago

Our impression is that the memory footprint of each transfer's process grows steadily over time and is freed again once the transfer completes. An individual transfer that finishes before memory is exhausted runs successfully, but when multiple transfers run concurrently, less memory is available per transfer and it becomes more likely that one of them is killed.

We have ongoing transfers that are constantly syncing files, so the transfers are all roughly the same and consist of large files.

We'll run some tests to get detailed memory footprints for specific transfer scenarios.

sevenantsfilip commented 1 year ago

I ran some tests of the iRODS Globus connector with two data sets. The first is 1.7 TB, consisting of 7226 files; the second is 170 GB, consisting of 58 files. The second data set triggers the memory error more easily, since its files are all large (1.7 GB to 6 GB), so gridftp runs out of memory rapidly. The easiest way to trigger errors is to set "network use" to "aggressive" on an endpoint; this quickly fills all available memory (16 GB) on the server: dataset_2

A pmap dump of a process using up all of the memory: 3409651.log

JustinKyleJames commented 1 year ago

I'm running some tests now and looking into it.

JustinKyleJames commented 1 year ago

I found a couple of memory leaks. It doesn't have anything to do with the buffers used for transfer. Those are all freed as expected.

There are a couple of data structures created in the iRODS calls that were never freed. Once I freed those, I verified that memory no longer grows after multiple transfers.

We'll get a fix out ASAP.

trel commented 1 year ago

Excellent.

alanking commented 1 year ago

@JustinKyleJames - I added checkboxes and checked 4-3-0-stable. Do we plan on cherry-picking this to 4-2-stable and making another release? If not, let me know and I will remove the checkbox for 4-2-stable. In any case, please cherry-pick to main. Thanks!

alanking commented 1 year ago

@JustinKyleJames - Please close if complete

JustinKyleJames commented 1 year ago

Closing