php / pecl-database-ibm_db2

Extension for IBM DB2 Universal Database, IBM Cloudscape, and Apache Derby
http://pecl.php.net/package/ibm_db2
Apache License 2.0
23 stars 27 forks source link

Bug #76322: Strings containing NULL-byte get truncated #14

Closed morozov closed 3 years ago

morozov commented 3 years ago

Tested locally with ibmcom/db2:11.5.6.0:

$ make test TESTS=tests/test_76322_fetch_null_byte.phpt

Build complete.
Don't forget to run 'make test'.

=====================================================================
PHP         : /usr/bin/php 
PHP_SAPI    : cli
PHP_VERSION : 8.0.9
ZEND_VERSION: 4.0.9
PHP_OS      : Linux - Linux viola 5.13.8-arch1-1 #1 SMP PREEMPT Wed, 04 Aug 2021 16:57:44 +0000 x86_64
INI actual  : /home/morozov/Projects/pecl-database-ibm_db2/tmp-php.ini
More .INIs  :   
CWD         : /home/morozov/Projects/pecl-database-ibm_db2
Extra dirs  : 
VALGRIND    : Not used
=====================================================================
Running selected tests.
PASS IBM-DB2: PECL bug 76322 -- strings containing null-byte get truncated [tests/test_76322_fetch_null_byte.phpt] 
=====================================================================
Number of tests :    1                 1
Tests skipped   :    0 (  0.0%) --------
Tests warned    :    0 (  0.0%) (  0.0%)
Tests failed    :    0 (  0.0%) (  0.0%)
Tests passed    :    1 (100.0%) (100.0%)
---------------------------------------------------------------------
Time taken      :    5 seconds
=====================================================================
NattyNarwhal commented 3 years ago

I talked to @tessus and @kadler and we're not sure this might not be the correct fix. When I looked at the original report, I noticed the type you had was CHAR FOR BIT DATA. Our suspicion is SQL/CLI treats this as a string (which is null terminated, AFAIK, so truncating with strlen is appropriate), and not a binary (which absolutely can have nulls).

At least on IBM i (I can't say for LUW, tessus would be better at that), FOR BIT DATA implies CCSID 65535 (raw). Thus, kadler implemented a workaround for FOR BIT DATA to mark them as SQL_BINARY instead of SQL_CHAR in the Python driver. I think backporting this to the PHP driver might be the better good alternative, assuming it works for LUW too.

tessus commented 3 years ago

assuming it works for LUW too

It should. FOR BIT DATA should be bound to BINARY by default anyway. My current dev box is shot, but I wanted to setup a new VM at one point. I hope I'll get to it soon, althought I can't promise anything.

kadler commented 3 years ago

The test code uses the query SELECT X'410042' V FROM sysibm.sysdummy1, which would not be considered a CHAR() FOR BIT DATA, but a regular VARCHAR in JOB CCSID. So the code I wrote wouldn't apply.

AFAIK, there is no way easy way to deal with this when using IBM i CLI, since it erroneously returns SQL_NTS when null-terminated string support is enabled (IBM i CLI has a tenuous grasp of the standard AFAICT). Thus, the caller has no clue how much data was actually returned. I think the only way to handle this would be to memset the data to a non-null value and then search from the end for the null byte added by CLI.

morozov commented 3 years ago

@kadler it shouldn't be a problem. I implemented the test using SELECT X'410042' V FROM sysibm.sysdummy1 for the sake of simplicity. The actual problem is about fetching stored data. So if you can fix it the right way based on the underlying data type, the test can be reworked too (in a way that the issue is originally documented).

kadler commented 3 years ago

Ahh, I didn't see the original report. SELECT CAST(X'410042' as VARCHAR(10) FOR BIT DATA) V FROM sysibm.sysdummy1 should do it, or just VALUES(CAST(X'410042' as VARCHAR(10) FOR BIT DATA)). (Substitute whatever length desired in the cast.)

morozov commented 3 years ago

Is SQL_DESC_COLUMN_CCSID used to determine the CCSID of the column available in the LUW client library? I don't see it defined in the include/sqlcli.h shipped as part the LUW driver, and the library for python uses it with only with IBM i (as mentioned above).

kadler commented 3 years ago

I believe it's an IBM i-only thing. I'm not sure if there's an equivalent way to do it using LUW libraries.

morozov commented 3 years ago

I believe it's an IBM i-only thing. I'm not sure if there's an equivalent way to do it using LUW libraries.

In this case, how do we proceed here? I believe this suggestion is no longer valid:

I think backporting this to the PHP driver might be the better good alternative, assuming it works for LUW too.