mridoni / gixsql

GixSQL is an ESQL preprocessor and a series of runtime libraries to enable GnuCOBOL to access PostgreSQL, ODBC, MySQL, Oracle and SQLite databases.
GNU General Public License v3.0
16 stars 8 forks source link

gixsql (pgsql) performance analysis / comparision #10

Open GitMensch opened 2 years ago

GitMensch commented 2 years ago

This issue will be a collection of information I've found (and find) when doing some performance comparisons and analysis. A quick pref record showed the current log implementation to be an issue (see mridoni/gixsql#15), so this was disabled in the runs specified below.

For the (later) comparison the same code was compiled and run against the same sources with the same PostgreSQL database. One time with GixSQL, the other with OCESQL 1.2 (heavily patched). The COBOL code uses full debugging information, full runtime checks and no "C" optimization so the COBOL part takes most of the "total execution time"; with "optimized" applications the COBOL part possibly will take less than the half execution time. The data was produced with full debugging enabled (-ggdb3), but optimizations were unchanged (so I assume -O2), run via valgrind --tool=callgrind --dump-before=cob_terminate_routiones (so the actual calls and times, both "inclusive" [in the tables below "from this function"] and "self" are visible).

Overview GixSQL:

The sample data is run with 47,548 "general db calls" (from an io-module which does multiple EXEC SQL), only reading data.

function times called % exec-time incl. % exec-time self
GIXSQLExecSelectIntoOne (libgixsql.so: gixsql.cpp) 47,544 40.5 0.8
GIXSQLCursorFetchOne (libgixsql.so: gixsql.cpp) 47,512 18.6 0.3
GIXSQLSetResultParams (libgixsql.so: gixsql.cpp) 522,752 9.4 0.5
GIXSQLEndSQL (libgixsql.so: gixsql.cpp) 95,059 6.5 0.03
GIXSQLSetSQLParams (libgixsql.so: gixsql.cpp) 47,574 0.9 0.1
GIXSQLStartSQL (libgixsql.so: gixsql.cpp) 95,059 nA nA

GIXSQLExecSelectIntoOne (total 13% of application time)

function times called % exec-time incl. % exec-time self
_gixsqlExecParams(Connection*, sqlca_t*, char*, unsigned int) (libgixsql.so: gixsql.cpp) 47,606 71.3 0.8
DbInterfacePGSQL::get_resultset_value (libgixsql-pgsql.so: DbInterfacePGSQL.cpp) 428,262 7 1.8
get_hostref_or_literal (libgixsql.so: gixsql.cpp) 47,618 4 0.2
SqlVarList::getMaxLength() (libgixsql.so: SqlVarList.cpp) 47,533 3.6 0.9

_gixsqlExecParams (total 9% of application time)

function times called % exec-time incl. % exec-time self
DbInterfacePGSQL::exec_params (libgixsql-pgsql.so: DbInterfacePGSQL.cpp) 47,608 56.6 0.6
is_dml_statement (libgixsql.so: utils.cpp) 47,608 8 0.5
is_commit_or_rollback_statement (libgixsql.so: utils.cpp) 47,608 8 0.2

Analysis: if possible, the check for commit/rollback/dml should be done at preprocessing time. Where not the statement type should be added on the first call and only be re-checked if the statement changes.

@mridoni is this possible (maybe later on)?

Calls to trim_copy() used in utils.cpp should be replaced by calling trim(), because that's what it does.

DbInterfacePGSQL::get_resultset_value (total 1.1% of application time)

function times called % exec-time incl. % exec-time self
__strcpy_sse3 (libc-2.17.so) 523,300 27.2 27.2
__strlen_sse3 (libc-2.17.so) 1046,600 18.8 18.8
PQgetvalue (libpq.so.5.12) 523,300 14.9 8.1
PQftype (libpq.so.5.12) 523,300 13.2 8.5

Analysis: Does some unnecessary C string handling, see mridoni/gix#85 for a minor PR to fix that. Additional it checks the type each time (would be reasonable to check for optimizations, but as the complete function has only 1.1% of the total execution time, this isn't worth it).

GIXSQLCursorFetchOne (total 6% of application time)

function times called % exec-time incl. % exec-time self
DbInterfacePGSQL::fetch_one (libgixsql-pgsql.so: DbInterfacePGSQL.cpp) 47,520 77 1
CursorManager::get (libgixsql.so: CursorManager.cpp) 47,520 8.6 0.3

DbInterfacePGSQL::fetch_one (total 5% of application time)

function times called % exec-time incl. % exec-time self
DbInterfacePGSQL::exec (libgixsql-pgsql.so: DbInterfacePGSQL.cpp) 47,520 74 0.9
sprintf (libc-2.17.so) 47,520 15 0.2

Those sprintf can be dropped (along with all the extra handling), see mridoni/gix#86 for a minor PR to fix that.

GIXSQLSetResultParams (total 3% of application time)

function times called % exec-time incl. % exec-time self
SqlVarList::AddVar() (libgixsql.so: SqlVarList.cpp) 523,432 95 6.7

SqlVarList::AddVar (total 3% of application time)

function times called % exec-time incl. % exec-time self
SqlVar::SqlVar() (libgixsql.so: SqlVar.cpp) 571,259 42.7 8.1
operator new (libstdc++-.so.6.0.24) 571,259 23.2 2.1

Analysis: may only be reduced if (not sure if that's possible):

@mridoni is it possible / reasonable to re-use variables instead of creating and deleting those? While that's only ~5% of total execution time, that's quite a lot objects being used.

GIXSQLEndSQL (total 2.3% of application time)

function times called % exec-time incl. % exec-time self
SqlVarList::clear() (libgixsql.so: SqlVarList.cpp) 190,302 99.5 14

SqlVarList::clear (total 2.3% of application time)

function times called % exec-time incl. % exec-time self
SqlVar::~SqlVar() (libgixsql.so: SqlVar.cpp) 571,259 24 3.5
operator delete (libstdc++-.so.6.0.24) 571,259 20.3 0.4

Analysis: only implicitly reducing possible, would be the case if we can reduce the amount of calls to SqlVarList::AddVar().

mridoni commented 2 years ago

Great work, thanks

GitMensch commented 2 years ago

When looking at fetch_one and cursor_open I wonder if there's a memory leak.

What happens with the memory now? Is true_query now the actual data (= internal buffer) or is that created from a copy of the data? If it is the first one: does the destructor of the string (because that object goes out of the scope when leaving the function) also free()s the initial string, here true_query? If not I guess there's a memory leak (will be solved by my PR because I'll drop that cstrings completely here).

I can confess that I'm not very practiced in C++ and think I can learn something on the way.

mridoni commented 2 years ago

When looking at fetch_one and cursor_open I wonder if there's a memory leak.

I agree that that part needs to be rewritten: it uses a mix of C and C++ and that is probably not a good thing to do. It started this way because in the beginning there was a lot of C code ported over (or modified, as in this case) from ocesql that, unfortunately, remained there.

GitMensch commented 2 years ago

I agree there are likely more parts and I assumed that history to be the reason (I'll send a PR for those two, nearly finished). If you know about my question and the potential memory leak I'd still like to learn about this.

mridoni commented 2 years ago

If you know about my question and the potential memory leak I'd still like to learn about this.

The std: stringconstructor that takes a char * as input makes a copy of the null-terminated data in its own buffer, the original buffer is neither referenced nor freed when the C++ string is destructed, so if it wasn’t freed (as unfortunately true_query wasn’t) that’s a memory leak.

GitMensch commented 2 years ago

Thanks for the information, I have no good C++ reference at hand (mostly class and method references), if you can suggest one, please do so.

Fortunately we have a fix for those memory issues with mridoni/gix#86. I need to do some other profiling work (and possibly adjustments to GnuCOBOL) but would be interested in re-doing the check with some other test scenarios (then also including writes to the DB, which wasn't the case yet).

The testing of GixSQL and its psql driver made quite good progress, mridoni/gixsql#3 is currently the single part that is missing to move the tests to "pre-production" state [when applying the known patches].

Later on I'd prefer the utils functions part be moved from _gixsqlExecParams to compile-time (SQLCA should have enough fields and the statements could also be adjusted to contain a trailing type character, maybe even with a low-value in between if this helps for backward-compatibility [I'd recompile everything in any case]); but I see that the next updates are likely to need some weeks, which is of course ok - and that's actually a bigger change, too.