irods / irods_rule_engine_plugin_python

BSD 3-Clause "New" or "Revised" License
10 stars 14 forks source link

Execution of Genquery leads to increased memory allocation #216

Open tsmeele opened 1 month ago

tsmeele commented 1 month ago

Environment: iRODS 4.2.12 Genquery appears to significantly leak memory when the query result includes many rows.

Code to reproduce the issue:

#!/usr/bin/irule -r irods_rule_engine_plugin-python-instance -F

import genquery
import os
import psutil

def report_usage(process, callback):
    stat = "Memory usage: " + str(process.memory_info().rss / 1024**2)
    callback.writeLine("stdout", stat)

def coll_query(callback):
    coll_iter = genquery.row_iterator(
    "COLL_ID",
    "",
    genquery.AS_LIST,
    callback)

    collcount = 0
    for row in coll_iter:
        collcount = collcount + 1

    return collcount

def main(rule_args, callback, rei):
    process = psutil.Process(os.getpid())
    report_usage(process, callback)
    coll_query(callback)
    report_usage(process, callback)
    coll_query(callback)
    report_usage(process, callback)
    count = coll_query(callback)
    report_usage(process, callback)
    callback.writeLine("stdout","count of collections = {}".format(count))

INPUT *inp="null"
OUTPUT ruleExecOut

Example output:

$ irule -r irods_rule_engine_plugin-python-instance -F python_rule.r
Memory usage: 79
Memory usage: 126
Memory usage: 166
Memory usage: 200
count of collections = 348737
$
alanking commented 1 month ago

Marking as bug due to the possibility of the GenQuery iterator leaking memory. We will investigate this.

tsmeele commented 1 month ago

Further analysis (tested with iRODS 4.3.2 server) reveals that the issue stems from a deeper cause, unrelated to the iterator. An interaction from Python with certain msi suffices, as demonstrated by below code. In contrast, calls to msiSplitPath apparently do not leak.

While the impact at first sight may seem limited, our experience is that to process some 60,000 data objects using a combination of a few queries and msi's in sequence for each data object can make an iRODS agent process already allocate over 32 GB memory.

 irule -r irods_rule_engine_plugin-python-instance -F py.r
64172032  bytes memory allocated
run msiMakeQuery (50 runs)
65753088  bytes memory allocated
run msiExecGenQuery (50 runs)
67076096  bytes memory allocated
run msiSplitpath (500000 runs)
67076096  bytes memory allocated

Source code:

ton@irods:~$ cat py.r
#!/usr/bin/irule -r irods_rule_engine_plugin-python-instance -F

import os
import psutil
import irods_types

def mem(callback):
    process = psutil.Process(os.getpid())
    callback.writeLine("stdout","{}  bytes memory allocated".format(str(process.memory_info().rss)))

def repeat(callback, runs, function, parms):
    for i in range(runs):
        if parms == None:
            function(callback)
        else:
            function(callback, parms)

def test(callback, runs, title, function, parms=None):
    callback.writeLine("stdout", "run {} ({} runs)".format(title, str(runs)))
    repeat(callback, runs, function, parms)
    mem(callback)

def makeQuery(callback):
    ret = callback.msiMakeGenQuery("COUNT(USER_ID)","",irods_types.GenQueryInp())
    return ret['arguments'][2]

def execQuery(callback, query):
    # NB: Must use a query that is guaranteed to return less than 256 rows
    #     so that the query will close automatically
    ret = callback.msiExecGenQuery(query, irods_types.GenQueryOut())

def splitPath(callback):
    ret = callback.msiSplitPath("collection/dataobject",0,0)

def main(rule_args, callback, rei):
    mem(callback)
    test(callback, 50, "msiMakeQuery", makeQuery)
    query = makeQuery(callback)
    test(callback, 50, "msiExecGenQuery", execQuery, query)
    # show that splitPath does not leak memory
    test(callback, 500000, "msiSplitpath", splitPath)

INPUT *inp="null"
OUTPUT ruleExecOut
tsmeele commented 1 month ago

Turns out the issue also occurs, to a lesser degree, when we use the iRODS rule language engine. Using this rule engine we can demonstrate impact at 500 runs of msiMakeGenQuery. There is no visible impact when we limit the number of runs to 50.

ton@irods:~$ irule -r irods_rule_engine_plugin-irods_rule_language-instance -F rule.r
Memory usage: 64176128 bytes allocated (at start)
Memory usage: 64176128 bytes allocated (after 500,000x memusage)
Memory usage: 65998848 bytes allocated (after 500x makequery)

Source code:

ton@irods:~$ cat rule.r
#!/usr/bin/irule -r irods_rule_engine_plugin-irods_rule_language-instance -F

myRule() {
   *m = ""; 
   # memusage is a Python rule that returns process mem size 
   memusage(*m);
   writeLine("stdout", "*m bytes allocated (at start)");

   # show that call to memusage does not leak 
   for (*i = 0; *i < 500000; *i = *i + 1) {
      memusage(*m);
   }
   writeLine("stdout","*m bytes allocated (after 500,000x memusage)");

   # test if call to msiMakeGenQuery leaks
   for (*i = 0; *i < 500; *i = *i + 1) {
       msiMakeGenQuery("COUNT(USER_ID)", "", *genQueryInp);    
   }
   memusage(*m);
   writeLine("stdout","*m bytes allocated (after 500x makequery)");
}

INPUT *inp="aa"
OUTPUT ruleExecOut

Python rule that measures memory allocated by agent process:

import os
import psutil

def memusage(rule_args, callback, rei):
    process = psutil.Process(os.getpid())
    stat = "Memory usage: " + str(process.memory_info().rss)
    rule_args[0] = stat
trel commented 1 month ago

Very nice.

Okay, so we're eyeballing msiMakeGenQuery and msiExecGenQuery directly now.

Upon investigation, we may move this issue to irods/irods.

korydraughn commented 4 weeks ago

msiMakeGenQuery and msiMakeQuery will always leak unless the handles to the query objects are deallocated.

One thing I noticed while running the code at https://github.com/irods/irods_rule_engine_plugin_python/issues/216#issuecomment-2285982760 - If I set the upper limit for the msiMakeGenQuery loop to 1 and then gradually increase it to 465, the memory usage remains the same. However, once the loop counter exceeds 465, the memory usage increases.

Very weird. It's not clear to me why that happens.

@tsmeele Perhaps you can reproduce what I'm seeing on your computer.

I also tried replacing the GenQuery1 MSIs with GenQuery2 MSIs. Memory usage increases with those MSIs too, yet they do not contain memory leaks AFAIK.

*h = '';
msi_genquery2_execute(*h, 'select count(USER_ID)');
msi_genquery2_free(*h);

Asan and valgrind should help reveal what's happening.

Aside from that, looks like we don't provide dedicated MSIs for closing GenQueryInput/Output objects separately.

tsmeele commented 4 weeks ago

@korydraughn Indeed we get similar results. The choppy memory allocation is most likely explained by buffering done internally by the malloc() algorithm. AFAIK msiMakeGenQuery does not actually create a query handle or perform any rsXXX calls, it merely populates and returns a data structure?

tsmeele commented 4 weeks ago

Some further experimentation, now using a MyRods client. This client is able to issue rcXXX API requests directly, bypassing the rule engine msi code.

Reassuring result is that, even after 12000 rsGenQuery requests, the agent memory allocation has not increased.

NB: In below output, rcExecMyRule runs have been added as a contrasting alternative case, known to impact agent memory a little.

MyRods started
Native auth successful
Memory usage: 65093632
Executing rcGenQuery "SELECT COUNT(DATA_ID)" 12000 runs
104 seconds have elapsed
Memory usage: 65093632
Executing rcExecMyRule 12000 runs
128 seconds have elapsed
Memory usage: 67055616

The rsGenQuery result using rcXXX API calls contrasts with results obtained with requests issued via microservices within a rule engine environment. The rule engine based results had shown memory to increase aready after 50 runs (see earlier in this issue).

The new experiment narrows our search to the implementation of the Genquery related microservices in the native rule language engine. And possibly on top of that also parameter passing back and forth in related callback requests issued by the Python rule engine.

korydraughn commented 4 weeks ago

The choppy memory allocation is most likely explained by buffering done internally by the malloc() algorithm.

That's very possible.

AFAIK msiMakeGenQuery does not actually create a query handle or perform any rsXXX calls, it merely populates and returns a data structure?

Correct. It does not execute the query. It makes at least three memory allocations, one being for the genQueryInp_t.