sidorares / node-mysql2

:zap: fast mysqljs/mysql compatible mysql driver for node.js
https://sidorares.github.io/node-mysql2/
MIT License

Possible memory leak with connection.execute or I lack understanding of connection.execute #887

Open jspatel opened 5 years ago

jspatel commented 5 years ago

Hi

I wrote a Node.js script that generates 30M records and inserts them periodically using the mysql2 driver with promises.

Here is the cut-down code. It runs out of heap and crashes when I use connection.execute, but works fine when I use connection.query.

'use strict';
const mysql2 = require("mysql2/promise");

async function generateLargeRecordset(limit) {
    const insertPrefix = "INSERT INTO SOME_LARGE_TABLE (col1, col2, col3, col4) VALUES ";
    const connection = await mysql2.createConnection({
        host: 'localhost',
        user: 'root',
        password: 'password',
        database: 'database'
    });
    const insertArray = [];
    for (let i = 0; i < limit; i++) {
        // values must be quoted string literals for the generated SQL to be valid
        insertArray.push(`('col1${i}', 'col2${i}', 'col3${i}', 'col4${i}')`);
        if (insertArray.length >= 1000) {
            let query = insertPrefix + insertArray.join(',') + ';';
            // let result = await connection.execute(query); // <== heap memory issue: crashes when it runs out of heap
            // let result = await connection.query(query);   // <== no heap memory issue
            insertArray.length = 0;
        }
    }

    if (insertArray.length > 0) {
        let query = insertPrefix + insertArray.join(',') + ';';
        // let result = await connection.execute(query); // <== heap memory issue: crashes when it runs out of heap
        // let result = await connection.query(query);   // <== no heap memory issue
        insertArray.length = 0;
    }

    const used = process.memoryUsage().heapUsed / 1024 / 1024;
    // The memory usage keeps increasing with the execute function.
    console.log(`The script uses approximately ${Math.round(used * 100) / 100} MB`);

    await connection.end();
}

generateLargeRecordset(10000000).then(() => {
    process.exit(0);
});
sidorares commented 5 years ago

I'd like to investigate where the memory usage comes from, but .execute() is meant for server-side prepared statements, where the query text is the same for each call and only the parameters vary. Can you test this code to compare memory usage / performance:

    const insertArray = [];
    for (let p = 0; p < 1000; p++) {
        insertArray.push('(?,?,?,?)');
    }
    const query = insertPrefix + insertArray.join(',');

    const valuesArray = [];
    for (let i = 0; i < limit; i++) {
        [1, 2, 3, 4].forEach(pn => valuesArray.push(`col${pn}${i}`));
        if (valuesArray.length >= 4000) { // 1000 rows x 4 placeholders
            let result = await connection.execute(query, valuesArray);
            valuesArray.length = 0;
        }
    }

There is much more overhead in preparing a query (sending the prepare command, handling the prepare response, caching and reusing the prepared result), so if used incorrectly .execute() can be less performant than .query().
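
If the concern is the cache itself rather than the prepare round-trip: mysql2 keeps prepared statements in a per-connection LRU cache. Below is a minimal sketch of bounding and clearing that cache, assuming the maxPreparedStatements connection option and connection.unprepare() behave as the docs describe (treat both as my reading, not a guarantee):

    const mysql = require('mysql2');
    const conn = mysql.createConnection({
        host: 'localhost',
        user: 'root',
        password: 'password',
        database: 'database',
        maxPreparedStatements: 256 // cap the per-connection LRU statement cache
    });

    const sql = 'SELECT * FROM SOME_LARGE_TABLE WHERE col1 = ?';
    conn.execute(sql, ['a'], (err, rows) => {
        // the first call prepares and caches; identical SQL strings reuse the handle
        conn.execute(sql, ['b'], (err2, rows2) => {
            conn.unprepare(sql); // evict the cached statement when done with it
            conn.end();
        });
    });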

jspatel commented 5 years ago

It makes sense that execute could be expensive. But it doesn't explain why the application is not releasing any memory when there are no bind variables (?) for the parameters, and hence no need to prepare a statement. I will test it out and post my results.

sidorares commented 5 years ago

when there are no binding variables (?) for the parameters, hence there is no need to prepare a statement

It's still prepared. It's like compiling a C++ command-line utility: even when the intent is to run it without command-line arguments, it's worth compiling upfront so it starts faster.
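
To make that lifecycle concrete, .execute() is roughly shorthand for prepare, then execute, then cache the handle for reuse. A minimal sketch of doing the steps explicitly, assuming the callback API's connection.prepare() and statement.close() as shown in the mysql2 docs:

    const mysql = require('mysql2');
    const conn = mysql.createConnection({ host: 'localhost', user: 'root', database: 'test' });

    // Prepare once: the server parses and compiles the statement.
    conn.prepare('SELECT 1 + ? AS answer', (err, statement) => {
        if (err) throw err;
        // Then execute as many times as needed with different bindings.
        statement.execute([41], (err2, rows) => {
            if (err2) throw err2;
            console.log(rows); // [ { answer: 42 } ]
            statement.close(); // free the server-side handle
            conn.end();
        });
    });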

enobrev commented 5 years ago

I ran into this same issue due to a similar misunderstanding. I simply assumed execute would be most useful when I didn't care about the response (for an insert / update / delete). I hadn't realized it was explicitly for prepared statements, and more importantly that those prepared statements would stick around long after the query was run.

Once I switched all my rather large queries to use query instead of execute, my script sat comfortably below 32 MB. Before that change, it would skyrocket to 1 GB of memory usage and then die spectacularly.
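
Switching to .query() is the idiomatic fix for one-off dynamic writes. For bulk inserts specifically, .query() can also expand an array of rows through a single ? placeholder with client-side escaping (a mysqljs/mysql feature that mysql2 inherits), so nothing is prepared or cached server-side. A minimal sketch; the table and column names are illustrative:

    const mysql2 = require('mysql2/promise');

    async function bulkInsert(rows) {
        const connection = await mysql2.createConnection({
            host: 'localhost',
            user: 'root',
            password: 'password',
            database: 'database'
        });
        // rows is an array of [col1, col2, col3, col4] arrays;
        // VALUES ? expands it into ('a','b','c','d'),('e','f','g','h'),...
        await connection.query(
            'INSERT INTO SOME_LARGE_TABLE (col1, col2, col3, col4) VALUES ?',
            [rows]
        );
        await connection.end();
    }

    bulkInsert([['a', 'b', 'c', 'd'], ['e', 'f', 'g', 'h']]);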

Galienna commented 4 years ago

Same problem here. I really need prepared statements, as I have to run the same complex queries for millions of users, so I truly need to call execute with a different userId as the bind param. Using execute, the memory consumption of my script keeps growing, linearly.

Why is that? Are the prepared statements stored within the mysql2 client?

sidorares commented 4 years ago

@Galienna are you sure it's the same PS (i.e. you are not building it dynamically for each user)? Could you show it?

Galienna commented 4 years ago

I've just obfuscated field and db names, but here it is:

const mysqlDb = await mysql.createConnection(conf.mysql.connectionOptions);

const [productList] = await mysqlDb.execute(
    'SELECT ' +
    '  p.field_1, ' +
    '  p.field_2, ' +
    '  p.field_3, ' +
    '  p.field_4, ' +
    '  p.field_5, ' +
    '  p.field_6, ' +
    '  p.field_7, ' +
    '  p.field_8 ' +
    'FROM  db_name_1.PRODUCTS p ' +
    'WHERE p.customerId = ? ' +
    'ORDER BY p.some_date ASC ' +
    'LIMIT ? ' +
    'OFFSET ?;',
    [customerId, sqlLimit, sqlOffset]
);

I have several queries just like this one, just to retrieve user data and migrate it to another db system. Those queries are also duplicated as the script walks through the shards, so another statement will be prepared for db_name_2, for instance.

I'm working with LIMIT and OFFSET since the lib doesn't work with cursors, but I don't think that's a problem for PS, am I wrong?

Also, my migration script is quite complicated, so maybe I'm doing something wrong myself. I'm planning on running the script with query instead of execute to see if anything changes, since other people had no memory problems using query.

By the way, a big thank you for the lib; working with PS and promises is really helpful.
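
On the LIMIT/OFFSET question: since the limit and offset are bound as parameters, the SQL string is identical for every page, so a connection should prepare this statement once and reuse it. A minimal sketch of that loop inside an async function, reusing the names from the snippet above (pageSize is illustrative):

    const pageSize = 1000;
    let offset = 0;
    for (;;) {
        // Identical SQL text on every iteration, so the prepared statement
        // is created once and then served from the per-connection cache.
        const [rows] = await mysqlDb.execute(
            'SELECT p.field_1, p.field_2 FROM db_name_1.PRODUCTS p ' +
            'WHERE p.customerId = ? ORDER BY p.some_date ASC LIMIT ? OFFSET ?;',
            [customerId, pageSize, offset]
        );
        if (rows.length === 0) break;
        // ... migrate this page of rows ...
        offset += pageSize;
    }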

sidorares commented 4 years ago

How many field_1 ... field_8 combinations do you have? It looks like you are actually creating a lot of PSs, and it's not a memory leak. Note that prepared statements are not shared between connections.

Galienna commented 4 years ago

Between 15 and 30 fields are retrieved per query, and I have 164 PS. Those are real PS such as the one I showed above. So if everything was OK, I'd expect the memory consumption to grow until all 164 PS have been prepared once, then stabilize and only vary slightly. In my case, it continues to grow even though all the PS have been executed at least once.

But once again, I think I need to try with query to check whether it comes from my script, from the mysql client, or from one of its libs. I'll come back to you with the result.

EDIT: I lied. It's split into 3 processes, so one container will have one Node.js process handling 164 PS, not 492.

devilleweppenaar commented 4 years ago

Not sure if this is related to the issue (#1071) we're having, but I thought I'd mention it here.

Galienna commented 4 years ago

@sidorares I tested both execute and query and the result is the same: the memory keeps growing. So it's not related to the prepared statements, right? I'm currently isolating other parts of my script to check if it comes from another lib or from the script itself.

If I find something involving mysql2 I'll tell you. Anyway, thanks for answering my questions :)
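
Since the growth shows up with both query and execute, one way to separate driver state from script state is to snapshot the heap between batches after forcing a GC pass (run node with --expose-gc). A minimal, driver-agnostic sketch:

    function logHeap(label) {
        if (global.gc) global.gc(); // only defined when run with node --expose-gc
        const usedMb = process.memoryUsage().heapUsed / 1024 / 1024;
        console.log(`${label}: ${usedMb.toFixed(2)} MB`);
    }

    logHeap('before batch');
    // ... run one batch of queries / executes ...
    logHeap('after batch');

If the after-GC number plateaus once every distinct statement has run at least once, the cache is behaving; if it keeps climbing across batches, something else is retaining references.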

aikar commented 4 years ago

For anyone doing what the original report's code does: the behavior is expected, because the code is flawed. Never build dynamic query strings with .execute, as each distinct query is permanently cached and stored.

@Galienna's code looks correct, though, and I don't see how it could cause this issue.

We are using this library in production serving billions of queries a day without any issue.

stadskle commented 1 year ago

@Galienna did you ever find the reason for this? I know this thread is very old, but I just changed my code from query to execute on a daemon running 24x7 that runs 2 queries about 250K times per hour. I too have just 2 prepared queries and execute them with different inputs.

The performance is great, but I see the same linear increase in memory as you describe, and my 2 GB RAM containers are dying every 30 minutes due to memory:

error Command failed with exit code 137.

My code:

array.forEach(function (x) {
    dbConnection.execute(
        'UPDATE table SET deleted=? WHERE columnA=? AND columnA=?',
        [height, x.valueA, x.valueB],
        function (err, results, fields) {
            if (err) {
                console.log('Error');
            }
        }
    );
});
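
One thing worth ruling out in the snippet above: forEach fires every execute() immediately, so all ~250K commands and their callbacks sit queued in memory at once, regardless of statement caching. A minimal sketch that applies backpressure by awaiting each call; it assumes the promise wrapper (mysql2/promise) and keeps the original SQL as-is (the duplicated columnA in the WHERE clause looks like it may be a paste typo):

    async function flushUpdates(dbConnection, array, height) {
        for (const x of array) {
            // Each UPDATE finishes before the next is enqueued, so at most one
            // command (plus the single cached prepared statement) is in flight.
            await dbConnection.execute(
                'UPDATE table SET deleted=? WHERE columnA=? AND columnA=?',
                [height, x.valueA, x.valueB]
            );
        }
    }
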
Galienna commented 1 year ago

Sorry mate, it's been a long time and I don't recall how I got around this. As it was for a one-time data migration, maybe I just accepted some Out Of Memory kills for my containers ^^' But as I wrote in a previous comment: I had this mem issue with both query and execute, so maybe the problem is coming from something else. A circular ref preventing Node from freeing memory? Things like that, idk.