ibmdb / node-ibm_db

IBM DB2 and IBM Informix bindings for node

Fetch multiple rows #977

Open rodrigolive opened 5 months ago

rodrigolive commented 5 months ago

Using the ODBC (node-odbc) driver I'm able to fetch more than one row at a time with cursor.fetch() by setting fetchSize, against a DB2 iAccess Connect driver. How can I accomplish the same with node-ibm_db directly, without going through the ODBC driver?

This is the ODBC driver code I'm looking to reproduce directly using the ibm_db driver:

const cursor = await db.query(sql, params, {
     cursor: true,
     fetchSize: 1000
});

let rows = await cursor.fetch();  //  fetches 1000 rows
rows = await cursor.fetch();  //  fetches next 1000 rows
// ... until cursor.noData === true
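Spelled out as a complete drain loop (this is my reading of the node-odbc cursor API; the exact noData semantics may differ slightly):

let total = 0;
do {
    const rows = await cursor.fetch();   // up to fetchSize rows per round trip
    total += rows.length;
    // ...process the batch here...
} while (!cursor.noData);                // noData flips once the result set is exhausted
await cursor.close();                    // release the cursor
console.log(`fetched ${total} rows in batches of up to 1000`);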

The motivation is that fetchAll() is unusable for large tables, and fetching single rows with stmt.fetch() is excruciatingly slow for large tables (i.e. 20M rows). That makes the driver unusable for real-world large-table processing. The equivalent JDBC DB2 driver, on the other hand, does offer this feature.

bimalkjha commented 5 months ago

@rodrigolive fetchAll() is not unusable. If an application wants to fetch only 1000 rows, it needs to say so in the SQL using FETCH FIRST 1000 ROWS ONLY. That way fetchAll() will return only 1000 rows. Thanks.
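If the whole table is needed, the same idea can be applied in slices. A rough, untested sketch (the table name, the ordered unique key column id, and connStr are placeholders):

// Untested sketch: keyset pagination with FETCH FIRST n ROWS ONLY.
// Each query resumes after the last key seen, so no batch is re-read.
const ibmdb = require('ibm_db');
const conn = await ibmdb.open(connStr);
let lastId = 0;
for (;;) {
    const rows = await conn.query(
        "SELECT id, payload FROM mytable WHERE id > ? ORDER BY id FETCH FIRST 1000 ROWS ONLY",
        [lastId]);
    if (rows.length === 0) break;        // table fully drained
    // ...process the batch here...
    lastId = rows[rows.length - 1].ID;   // DB2 returns unquoted column names in upper case
}
await conn.close();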

rodrigolive commented 5 months ago

The application does not need to fetch 1000 rows. It needs 20M rows, in batches of 1000, to avoid OOM errors and to cut the network round trips of 20M individual fetch() calls. Right now, at a site with a very fast network and a production-grade, fine-tuned DB2 database, it takes 17 hours to download ~20M rows with ibm_db; that drops to under 10 minutes when using ODBC directly.

If one fetches 20M rows (or 100K, for that matter) with fetchAll(), the driver loads all of them into memory in a single Nan::Set(array, ...), which is unusable as far as RAM goes; and fetching rows one by one with fetch() takes exceedingly long because every fetch pays a full round-trip latency. The IBM Connect drivers support the ODBC fetchSize idiom, as does the JDBC DB2 driver.

To request more rows per batch, one needs to call stmt.setAttr(ibmdb.SQL_ATTR_ROW_ARRAY_SIZE, 1000). This works: the underlying ODBC layer fetches rows in batches of 1000 (a "rowset"), which is exactly what we need. Unfortunately the driver code ignores the remaining rows and returns only the first row of each rowset. Internally, it would need to hand the application every row of the rowset delivered by SQLFetch(), repeating SQLFetch() while more rowsets are available, or use the more sophisticated SQLFetchScroll().

const sql = "SELECT rownum FROM myhugetable";
const stmt = await db.prepare(sql);
await stmt.setAttr(ibmdb.SQL_ATTR_ROW_ARRAY_SIZE, 1000); // request rowsets of 1000
const cursor = await stmt.execute();
let row;
row = await cursor.fetch();
console.log(row.ROWNUM); // rownum == 1
row = await cursor.fetch();
console.log(row.ROWNUM); // rownum == 1000, not 2: the rows in between were dropped

If you run the code above, the remaining 999 rows of each batch are swallowed by the driver; only the first row is returned.
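For comparison, this is the loop I'd like to be able to write once rowsets are honoured. It assumes fetch() (or some new batch variant) would return the whole rowset as an array; that is my guess at an API shape, not anything the driver offers today:

// Hypothetical API sketch: fetch() returns the full rowset as an array.
await stmt.setAttr(ibmdb.SQL_ATTR_ROW_ARRAY_SIZE, 1000);
const cursor = await stmt.execute();
let batch;
while ((batch = await cursor.fetch()) && batch.length > 0) {
    for (const row of batch) {
        // one network round trip per 1000 rows instead of one per row
        process(row); // process() is a placeholder for application logic
    }
}
await cursor.close(); // close the cursor once drained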