wankdanker / node-odbc

ODBC bindings for node
MIT License

node-odbc 2.0: Some API changes, big performance gains, and using N-API #50

Closed. markdirish closed this issue 5 years ago.

markdirish commented 5 years ago

Changes from node-odbc 1.x (from the README in the new branch):

node-odbc has been upgraded from v1.x. The following list highlights the major improvements and potential code-breaking changes:

Performance improvements: The underlying ODBC function calls have been reworked to greatly improve performance. For ODBC aficionados: node-odbc used to retrieve results using SQLGetData, which works for small amounts of data but is slow for large datasets. node-odbc now uses SQLBindCol for binding result sets, which for large queries is orders of magnitude faster.
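
For anyone less familiar with the ODBC C API, here is a minimal sketch of the difference (an illustration only, not node-odbc's actual code; the column number, buffer size, and types are assumptions): with SQLBindCol a buffer is bound once and every SQLFetch fills it, instead of calling SQLGetData for each column of each row.

#include <sql.h>
#include <sqlext.h>

void fetchFirstColumn(SQLHSTMT hstmt) {
  SQLCHAR value[256];
  SQLLEN lengthOrIndicator;

  // Bind once: the driver copies column 1 into `value` on every fetch.
  SQLBindCol(hstmt, 1, SQL_C_CHAR, value, sizeof(value), &lengthOrIndicator);

  while (SQL_SUCCEEDED(SQLFetch(hstmt))) {
    // `value` already holds this row's data; no per-column SQLGetData round trip.
  }
}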

Rewritten with N-API: node-odbc was completely rewritten using node-addon-api, a C++ wrapper for N-API, creating an engine-agnostic and ABI-stable package. This means that if you upgrade your Node.js version, there is no need to recompile the package; it just works!

Database Class Removed: The exported odbc object now contains the functions for connecting to the database and returning an open connection. There is no longer a Database class, and instead calling odbc.open(...) or odbc.openSync(...) will return an open database connection to the requested source with any passed configuration options.

No JavaScript Layer: All of the logic running the package is now done in the C++ code for improved readability, maintainability, and performance. Previously, the JavaScript did not match the API exported by the package, as it was wrapped in a JavaScript implementation of the Database class. With the removal of the JavaScript layer, the exported API now maps directly to the underlying C++ implementation.

Minor API Changes: In addition to removing the Database class, minor API improvements have been made. These include the ability to pass configuration information to the connection through a config object, to specify the input/output type of bound parameters, and to choose whether results are returned as objects or arrays. The complete API documentation can be found in this file.

Tests were created with mocha. I am currently in the process of porting at least the sqlite tests to ensure that the package works on Linux with sqlite.

wankdanker commented 5 years ago

Found a really odd thing. Querying MSSQL via FreeTDS works pretty much fine, except when I try to select data from information_schema.columns (and maybe others). The values are all crazy...

const odbc = require('odbc');
const conn = odbc.connectSync(process.env.ODBC_CONNECTION_STRING);
console.log(conn.querySync("select top 10 * from information_schema.columns where table_name = 'customer'"));
[{ TABLE_CATALOG: '慰楶楬湯楧瑦损浯 \u0000\u0000\u0000!\u0000\u0000\u0000',
    TABLE_SCHEMA: '扤o\u0000',
    TABLE_NAME: '畣瑳浯牥�̒\u0000\u0000',
    COLUMN_NAME: '畣瑳浯牥楟ͤ\u0000\u0000 \u0000\u0000',
    ORDINAL_POSITION: 0,
    COLUMN_DEFAULT: null,
    IS_NULLABLE: 'NO',
    DATA_TYPE: '湩ʹ\u0000',
    CHARACTER_MAXIMUM_LENGTH: null,
    CHARACTER_OCTET_LENGTH: null,
    NUMERIC_PRECISION: '10',
    NUMERIC_PRECISION_RADIX: 10,
    NUMERIC_SCALE: 0,
    DATETIME_PRECISION: null,
    CHARACTER_SET_CATALOG: null,
    CHARACTER_SET_SCHEMA: null,
    CHARACTER_SET_NAME: null,
    COLLATION_CATALOG: null,
    COLLATION_SCHEMA: null,
    COLLATION_NAME: null,
    DOMAIN_CATALOG: null,
    DOMAIN_SCHEMA: null,
    DOMAIN_NAME: null },
  { TABLE_CATALOG: '慰楶楬湯楧瑦损浯 \u0000\u0000\u0000¡\u0000\u0000\u0000',
    TABLE_SCHEMA: '扤ͯ\u0000',
    TABLE_NAME: '畣瑳浯牥̒\u0000\u0000',
    COLUMN_NAME: '慭瑳牥损獵潴敭彲摩\u0000\u0000\u0000¡\u0000\u0000\u0000敮瑸',
    ORDINAL_POSITION: 0,
    COLUMN_DEFAULT: null,
    IS_NULLABLE: 'YES',
    DATA_TYPE: '湩t\u0000',
    CHARACTER_MAXIMUM_LENGTH: null,
    CHARACTER_OCTET_LENGTH: null,
    NUMERIC_PRECISION: '10',
    NUMERIC_PRECISION_RADIX: 10,
    NUMERIC_SCALE: 0,
    DATETIME_PRECISION: null,
    CHARACTER_SET_CATALOG: null,
    CHARACTER_SET_SCHEMA: null,
    CHARACTER_SET_NAME: null,
    COLLATION_CATALOG: null,
    COLLATION_SCHEMA: null,
    COLLATION_NAME: null,
    DOMAIN_CATALOG: null,
    DOMAIN_SCHEMA: null,
    DOMAIN_NAME: null }]

Looks like the fields that are weird are all nvarchars.

Any thoughts on this?

markdirish commented 5 years ago

Definitely curious. Do you have UNICODE defined? There might be some assumptions about character encoding that I'm not handling. Is that all the output of the result array? There should be a result.columns property that gives each column's name and data type; it would be interesting to know the data type of the columns that are producing Chinese characters.

Some of the other columns, like ORDINAL_POSITION, are SMALLINT and aren't handled correctly in this branch I believe. They will default to 0. I have fixed this in another branch (see below).

What is returned if you do the same query, but do console.log(conn.columnsSync(null, null, 'customers', null));? Just curious if that return set is also encoded strange.

I have also been working the last few days on a lot of the inner workings of this package (much more robust checking of types in parameters) and fixed a few errors like this. The latest (and greatest?) changes can be found at https://github.com/markdirish/node-odbc/tree/rewrite-bind.

To connect using the API on that branch, do:

const { Connection } = require('odbc');
async function test() {
  const conn = new Connection(process.env.ODBC_CONNECTION_STRING);
  const result = await conn.query("select top 10 * from information_schema.columns where table_name = 'customer'");
  console.log(result);
}

test();

This will require re-compiling and everything, but I had similar output when calling columns, mainly because the generated table was using datatypes that I hadn't robustly tested.

wankdanker commented 5 years ago

I do not have UNICODE defined. I did try setting that and recompiling, but a bunch of other issues popped up, including the column names being incorrect in the result records (they were only one character in length).

I left result.columns off of what I posted just for brevity. Here is that:

[ { name: 'TABLE_CATALOG', dataType: -9 },
    { name: 'TABLE_SCHEMA', dataType: -9 },
    { name: 'TABLE_NAME', dataType: -9 },
    { name: 'COLUMN_NAME', dataType: -9 },
    { name: 'ORDINAL_POSITION', dataType: 4 },
    { name: 'COLUMN_DEFAULT', dataType: -9 },
    { name: 'IS_NULLABLE', dataType: 12 },
    { name: 'DATA_TYPE', dataType: -9 },
    { name: 'CHARACTER_MAXIMUM_LENGTH', dataType: 4 },
    { name: 'CHARACTER_OCTET_LENGTH', dataType: 4 },
    { name: 'NUMERIC_PRECISION', dataType: -6 },
    { name: 'NUMERIC_PRECISION_RADIX', dataType: 5 },
    { name: 'NUMERIC_SCALE', dataType: 4 },
    { name: 'DATETIME_PRECISION', dataType: 5 },
    { name: 'CHARACTER_SET_CATALOG', dataType: -9 },
    { name: 'CHARACTER_SET_SCHEMA', dataType: -9 },
    { name: 'CHARACTER_SET_NAME', dataType: -9 },
    { name: 'COLLATION_CATALOG', dataType: -9 },
    { name: 'COLLATION_SCHEMA', dataType: -9 },
    { name: 'COLLATION_NAME', dataType: -9 },
    { name: 'DOMAIN_CATALOG', dataType: -9 },
    { name: 'DOMAIN_SCHEMA', dataType: -9 },
    { name: 'DOMAIN_NAME', dataType: -9 } ]

Running columnsSync() as suggested returns:

[ count: 0,
  columns: [ { name: 'TABLE_CAT', dataType: -9 },
    { name: 'TABLE_SCHEM', dataType: -9 },
    { name: 'TABLE_NAME', dataType: -9 },
    { name: 'COLUMN_NAME', dataType: -9 },
    { name: 'DATA_TYPE', dataType: 5 },
    { name: 'TYPE_NAME', dataType: -9 },
    { name: 'COLUMN_SIZE', dataType: 4 },
    { name: 'BUFFER_LENGTH', dataType: 4 },
    { name: 'DECIMAL_DIGITS', dataType: 5 },
    { name: 'NUM_PREC_RADIX', dataType: 5 },
    { name: 'NULLABLE', dataType: 5 },
    { name: 'REMARKS', dataType: 12 },
    { name: 'COLUMN_DEF', dataType: -9 },
    { name: 'SQL_DATA_TYPE', dataType: 5 },
    { name: 'SQL_DATETIME_SUB', dataType: 5 },
    { name: 'CHAR_OCTET_LENGTH', dataType: 4 },
    { name: 'ORDINAL_POSITION', dataType: 4 },
    { name: 'IS_NULLABLE', dataType: 12 },
    { name: 'SS_DATA_TYPE', dataType: -6 } ] ]

There might be some assumptions about character encoding that I'm not handling.

I know that I made some assumptions about character encoding that were incorrect. After looking at Python's ODBC library, I realized (if memory serves) that "UNICODE" processing should really be done conditionally, depending on the data type of the column. The way I was processing column values, that information wasn't readily available, because you have to fetch additional metadata about the column to determine its type.

I'll check out the other branch you mentioned and see how that works.

Thanks!

wankdanker commented 5 years ago

This is from the rewrite-bind branch:


console.log(await connection.query("select top 1 * from information_schema.columns where column_name = 'customer'"));

[ { TABLE_CATALOG: '慰楶楬湯楧瑦损浯 \u0000\u0000\u00005\u0000\u0000\u0000',
    TABLE_SCHEMA: '扤籯翄',
    TABLE_NAME:
     '䝐彃灏湥佟摲牥彳潂數彳慃慮慤\u0000\u0000\u0000\u0000\u0000\u0000%\u0000\u0000\u0000畃瑳浯牥',
    COLUMN_NAME: '畃瑳浯牥簀翄\u0000',
    ORDINAL_POSITION: 3,
    COLUMN_DEFAULT: null,
    IS_NULLABLE: 'YES',
    DATA_TYPE: '慶捲慨r簀翄',
    CHARACTER_MAXIMUM_LENGTH: 25,
    CHARACTER_OCTET_LENGTH: 25,
    NUMERIC_PRECISION: null,
    NUMERIC_PRECISION_RADIX: null,
    NUMERIC_SCALE: null,
    DATETIME_PRECISION: null,
    CHARACTER_SET_CATALOG: null,
    CHARACTER_SET_SCHEMA: null,
    CHARACTER_SET_NAME: '獩彯缱\u0000',
    COLLATION_CATALOG: null,
    COLLATION_SCHEMA: null,
    COLLATION_NAME:
     '兓彌慌楴ㅮ䝟湥牥污䍟ㅐ䍟彉十\u0000\u0000\u0000\u0000\u0000\u0000%\u0000\u0000\u0000⦆簀翄\u0000',
    DOMAIN_CATALOG: null,
    DOMAIN_SCHEMA: null,
    DOMAIN_NAME: null },
  count: -1,
  columns: [ { name: 'TABLE_CATALOG', dataType: -9 },
    { name: 'TABLE_SCHEMA', dataType: -9 },
    { name: 'TABLE_NAME', dataType: -9 },
    { name: 'COLUMN_NAME', dataType: -9 },
    { name: 'ORDINAL_POSITION', dataType: 4 },
    { name: 'COLUMN_DEFAULT', dataType: -9 },
    { name: 'IS_NULLABLE', dataType: 12 },
    { name: 'DATA_TYPE', dataType: -9 },
    { name: 'CHARACTER_MAXIMUM_LENGTH', dataType: 4 },
    { name: 'CHARACTER_OCTET_LENGTH', dataType: 4 },
    { name: 'NUMERIC_PRECISION', dataType: -6 },
    { name: 'NUMERIC_PRECISION_RADIX', dataType: 5 },
    { name: 'NUMERIC_SCALE', dataType: 4 },
    { name: 'DATETIME_PRECISION', dataType: 5 },
    { name: 'CHARACTER_SET_CATALOG', dataType: -9 },
    { name: 'CHARACTER_SET_SCHEMA', dataType: -9 },
    { name: 'CHARACTER_SET_NAME', dataType: -9 },
    { name: 'COLLATION_CATALOG', dataType: -9 },
    { name: 'COLLATION_SCHEMA', dataType: -9 },
    { name: 'COLLATION_NAME', dataType: -9 },
    { name: 'DOMAIN_CATALOG', dataType: -9 },
    { name: 'DOMAIN_SCHEMA', dataType: -9 },
    { name: 'DOMAIN_NAME', dataType: -9 } ] ]

console.log(await connection.columns(null, null, 'customers', null));

[ count: 0,
  columns: [ { name: 'TABLE_CAT', dataType: -9 },
    { name: 'TABLE_SCHEM', dataType: -9 },
    { name: 'TABLE_NAME', dataType: -9 },
    { name: 'COLUMN_NAME', dataType: -9 },
    { name: 'DATA_TYPE', dataType: 5 },
    { name: 'TYPE_NAME', dataType: -9 },
    { name: 'COLUMN_SIZE', dataType: 4 },
    { name: 'BUFFER_LENGTH', dataType: 4 },
    { name: 'DECIMAL_DIGITS', dataType: 5 },
    { name: 'NUM_PREC_RADIX', dataType: 5 },
    { name: 'NULLABLE', dataType: 5 },
    { name: 'REMARKS', dataType: 12 },
    { name: 'COLUMN_DEF', dataType: -9 },
    { name: 'SQL_DATA_TYPE', dataType: 5 },
    { name: 'SQL_DATETIME_SUB', dataType: 5 },
    { name: 'CHAR_OCTET_LENGTH', dataType: 4 },
    { name: 'ORDINAL_POSITION', dataType: 4 },
    { name: 'IS_NULLABLE', dataType: 12 },
    { name: 'SS_DATA_TYPE', dataType: -6 } ] ]

markdirish commented 5 years ago

Data Type -9 is SQL_WVARCHAR, as seen in sqlucode.h. So for some reason, ODBC thinks the table is trying to return UTF-16 strings and is (correctly) showing you the UTF-16 values. Are you sure there isn't some setting set for utf-16 on your MSSQL server? The -9 data type code is returned directly from SQLDescribeCol, so I'm not sure it's an issue with the Node.js package... does it work fine on the 1.x version?
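
For reference, the relevant wide-character type codes from sqlucode.h are:

#define SQL_WCHAR        (-8)
#define SQL_WVARCHAR     (-9)
#define SQL_WLONGVARCHAR (-10)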

wankdanker commented 5 years ago

Are you sure there isn't some setting set for utf-16 on your MSSQL server?

I will need to look into that. It does work correctly in v1.x, but that build defaults to having the UNICODE flag set, so it may be doing something slightly different when getting the value.

wankdanker commented 5 years ago

In ProcessDataForNapi, this fixes my issue:

          case SQL_WCHAR:
          case SQL_WVARCHAR:
          case SQL_WLONGVARCHAR:
            // treat the buffered bytes as narrow characters instead of UTF-16
            value = Napi::String::New(env, (const char*)storedRow[j].data, storedRow[j].size);
            break;

... that is just (const char*), not (const char16_t*).
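
That would explain the garbage above: the buffer actually holds narrow (single-byte) characters, and casting it to (const char16_t*) pairs adjacent ASCII bytes into CJK code points. A quick sketch of the effect (assuming a little-endian machine; this is an illustration, not node-odbc code):

#include <cstdio>
#include <cstring>

int main() {
  const char narrow[8] = {'c', 'u', 's', 't', 'o', 'm', 'e', 'r'};
  char16_t wide[4];
  std::memcpy(wide, narrow, sizeof(wide));    // reinterpret the same 8 bytes as UTF-16LE
  for (char16_t unit : wide) {
    std::printf("U+%04X ", (unsigned)unit);   // prints U+7563 U+7473 U+6D6F U+7265
  }                                           // i.e. 畣 瑳 浯 牥 -- the TABLE_NAME garbage above
  std::printf("\n");
}

With the (const char*) cast, the same bytes come through as "customer".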

wankdanker commented 5 years ago

If the code I mentioned previously is left with (const char16_t *) and BindColumns() is modified to set targetType to SQL_C_WCHAR for the SQL_WCHAR and SQL_WVARCHAR cases, then things are almost OK. It's just that there is a bunch of garbage after the value.

Using isql displays these values correctly without having to modify any settings. isql just uses SQL_C_CHAR for every call to GetData(), no matter the data type as far as I can tell.

https://github.com/lurcher/unixODBC/blob/master/exe/isql.c#L1351
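
Roughly what that linked isql code does, as a sketch (the function name and column number here are placeholders): it always asks the driver for narrow characters and lets the driver convert from the column's actual SQL type.

#include <sql.h>
#include <sqlext.h>

void getColumnAsChar(SQLHSTMT hstmt, SQLUSMALLINT columnNumber) {
  SQLCHAR buffer[512];
  SQLLEN lengthOrIndicator;

  // Retrieve the column as SQL_C_CHAR regardless of its declared data type.
  SQLGetData(hstmt, columnNumber, SQL_C_CHAR, buffer, sizeof(buffer), &lengthOrIndicator);
}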

markdirish commented 5 years ago

I think I know the issue: when I wrote the bindColumns function, I wasn't as versed in how the C types work, nor did I know all the fun ODBC defines I could utilize. I can take a look tomorrow, probably tear out some chunks and rewrite them, and push something new.

Thanks for your help in finding these issues!

wankdanker commented 5 years ago

Happy to help. The code looks a lot cleaner. Can't wait to see where it goes!

Thanks for all your help!

markdirish commented 5 years ago

Ok, I think I have this fixed on my rewrite-bind branch. The issue was twofold:

As you suggested, targetType should be bound to SQL_C_WCHAR for SQL_WCHAR, SQL_WVARCHAR, and SQL_WLONGVARCHAR. But as you noted, there is a bunch of garbage after the value.

When SQLFetch is called, it was storing StrLen_or_IndPtr (which is in bytes) as the storedRow's size. Then, when converting the stored data to a Napi::String, I would call the Napi::String::New overload that lets you specify the length, passing that byte value:

static String New(
  napi_env env,          ///< N-API environment
  const char16_t* value, ///< UTF-16 encoded C string (not necessarily null-terminated)
  size_t length          ///< Length of the string in 2-byte code units
);

But as it notes, size_t length is in 2-byte code units, so I was specifying a length twice as large as the string actually present, resulting in a bunch of garbage data at the end of the string.
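
Putting the two parts together, a sketch of the fixed wide-character path (variable names follow the earlier snippets in this thread; the actual code is on the rewrite-bind branch):

// In BindColumns: ask the driver for UTF-16 data for wide character columns.
case SQL_WCHAR:
case SQL_WVARCHAR:
case SQL_WLONGVARCHAR:
  targetType = SQL_C_WCHAR;
  break;

// In ProcessDataForNapi: the stored size is in bytes, but Napi::String::New
// for char16_t data expects the length in 2-byte code units.
case SQL_WCHAR:
case SQL_WVARCHAR:
case SQL_WLONGVARCHAR:
  value = Napi::String::New(env,
                            (const char16_t *)storedRow[j].data,
                            storedRow[j].size / sizeof(char16_t)); // bytes -> code units
  break;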

I'm hopeful it should work now, but I haven't tested it yet; I'm still figuring out how to set up a SQL_WVARCHAR column in Db2.