taocpp / taopq

C++ client library for PostgreSQL
Boost Software License 1.0
265 stars 40 forks source link

Iterative bind (for an insert with dynamic number of columns) #71

Open emmenlau opened 9 months ago

emmenlau commented 9 months ago

I have a use case where I'd like to store tables with a dynamic number of columns. The use case is from analysis measurements, where users can configure the actually computed measurements with a simple on/off selection. I receive the measurements as a vector of vectors which represent the columns.

It seems not directly possible to use the parameter binding with such dynamic data, or is it? In my naiive understanding it seems I could loop over the columns and bind value by value, iteratively. But taopq seems not to support this.

Am I even on the right track? How to work with a dynamic set of columns?

d-frey commented 9 months ago

This is not supported currently, as the number of columns for each data type needs to be fixed.

That said, what would the statement itself look like? Or do you generate a new statement each time? Can you provide a more specific example?

emmenlau commented 9 months ago

Thanks a lot @d-frey for your consideration, its appreciated!!

A pseudo-code example would be something along those lines:

    std::vector<std::vector<std::variant<uint64_t, double, std::string>>> vDataTable;
    std::vector<std::string> vDataTableHeader;
    const size_t vDataTableRows = vDataTable.size();
    const size_t vDataTableCols = vDataTable.front().size();

    const std::string vInsertStatement =
        "INSERT INTO TestTable (" +
        mystring::join(vDataTableRows, ", ") +
        ") VALUES(" +
        mystring::generateIntegerSequence(vDataTableCols) +
        ");";

    const auto conn = tao::pq::connection::create( "dbname=template1" );
    const auto statement = tao::pq::prepare( conn, vInsertStatement );

    for(size_t vRowIdx = 0; vRowIdx < vDataTableRows; ++vRowIdx) {
        for(size_t vColIdx = 0; vColIdx < vDataTableCols; ++vColIdx) {
            // Iterative binding of the n-th parameter:
            statement.bind(vColIdx, vDataTable[vRowIdx][vColIdx]);
        }
        statement.execute();
        statement.clear_binding();
    }

I know that for example SQLiteCpp supports iterative binding in their Statement class, see for example https://github.com/SRombauts/SQLiteCpp/blob/master/include/SQLiteCpp/Statement.h#L128.

Admittedly I'm not sure about performance or safety considerations, though! However I think performance is not the critical aspect. It would be just awesome if runtime-sized statements could be supported.

PS: My pseudo-code is missing a number of aspects like setting the data in vDataTable and vDataTableHeader, or using std::visit for the std::variant, or the definition of mystring::generateIntegerSequence(), let me know if I should improve...

d-frey commented 9 months ago

The main issue I see is lifetime and temporaries, there's a much larger potential for errors. Or you need to copy everything which makes things inefficient, up to a point where I'm no longer comfortable with it. Also, the libpq API requires arrays, and I'd like to avoid dynamic allocations of those (or fixed limits).

Don't get me wrong, I'll look into it and see what I can do, but I see quite a few challenges...

d-frey commented 9 months ago

I added some experimental support for parameter<>. See src/test/pq/parameter.cpp for a small example. parameter<Max> can hold a maximum of Max parameters for a statement. Note that a single bind() can add multiple parameters. Also, a single argument to bind() can decay into multiple values for the database. If you call bind(), the values you bind must remain valid and unmodified until you no longer use the parameter<> object. Anyway, please have a look and let me know what you think.

emmenlau commented 9 months ago

Awesome work @d-frey , and thanks a lot for this quick consideration! Due to the holidays I may not have time to look into it this week, but I'll give it a try before the new year. Looking forward!

d-frey commented 9 months ago

I put some more work into it, temporaries are now moved into a holder if necessary and there are some optimizations to skip the temporaries if the underlying data does not reference the original value. Anyway, I'll continue to improve it and then add some documentation when I get to the end.

emmenlau commented 9 months ago

Awesome!