oracle / odpi

ODPI-C: Oracle Database Programming Interface for Drivers and Applications
https://oracle.github.io/odpi/
Other
264 stars 75 forks source link

Unexpected error messages with dpiLob_readBytes #146

Closed laurenz closed 3 years ago

laurenz commented 4 years ago

I am using ODPI-C 4.0.1, running on 64-bit Ubuntu 20.04 LTS.

The compiler is gcc (Ubuntu 9.3.0-10ubuntu2) 9.3.0.

I am using Oracle Instant Client 19.8, installed from RPMs (via alien). I get the same behaviour with version 18.5

The database reports version

Oracle Database 19c Enterprise Edition Release 19.0.0.0.0 - Production
Version 19.3.0.0.0

LD_LIBRARY_PATH=/home/hs/intelligent_migrator/lib:/usr/lib/oracle/19.8/client64, where the first is the location of libodpic.so.4, and the second is where Instant Client lives.

These are the Oracle environment variables used:

Problem description:

I am essentially running SELECT * FROM atable for a table with a CLOB column. Selecting the data with sqlplus from the same machine works fine.

The sequence of calls is (in pseudocode with the actual C calls):

dpiConn_prepareStmt(*ora_conn, 0, txt_srcquery, strlen(txt_srcquery), NULL, 0, &stmt);
dpiStmt_setFetchArraySize(stmt, 200);
dpiStmt_execute(stmt, 0, &numQueryColumns);
// for i from 1 to numQueryColumns
    dpiStmt_getQueryInfo(stmt, i, &info);
// endless loop
    dpiStmt_fetch(stmt, &found, &bufferRowIndex);
    // exit loop if !found
    // for i from 1 to numQueryColumns
        dpiStmt_getQueryValue(stmt, i, &nativeTypeNum, &colValue);
        // if it is a CLOB column
            dpiLob_getSize(colValue->value.asLOB, &lobsize);
            dpiLob_readBytes(colValue->value.asLOB, 1, lobsize, binstr, &binlength);

where binlength is initialized to lobsize and binstr is allocated to 4 * lobsize + 1 to account for multi-byte characters.

This works fine most of the time, but for all CLOBs with a length of 3 (?!) I get:

gerror_info.fnName: dpiLob_readBytes
gerror_info.action: read from LOB
gerror_info.message: DPI-1062: unexpected OCI return value 99 in function dpiLob_readBytes

For a CLOB with a non-ASCII string (), I get:

gerror_info.fnName: dpiLob_readBytes
gerror_info.action: read from LOB
gerror_info.message: ORA-24812: character set conversion to or from UCS2 failed

The first message is as surprising for me as it seems to be for ODPI, because I see that you call OCILobRead2 with DPI_OCI_ONE_PIECE, which according to the Oracle documentation should never return OCI_NEED_DATA. Why that should happen with CLOBs of size 3 is beyond me.

I also cannot explain the second error message, because I am using AL32UTF8 as database encoding and as client encoding. Moreover, I am used to Oracle using “replacement characters” in such a case.

I can come up with a short stand-alone program if that helps.

anthony-tuininga commented 4 years ago

Yes, a short standalone program would definitely help. Please include the table creation and population scripts as the data may be critical. If this is consistent, hopefully I can replicate it, too!

laurenz commented 4 years ago

Ok, here is the reproducer. I am running this on Fedora 32 Linux with gcc 10.2.1 with Oracle Express Edition 18.4, but the behaviour is the same.

In Oracle:

CREATE TABLE clobtest (pk number(5) CONSTRAINT clobtest_pkey PRIMARY KEY, string clob);
INSERT INTO clobtest VALUES (1, '123');
COMMIT;

The C code:

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#include "dpi.h"

#define CONNECTSTRING ""  /* bequeath */
#define USERNAME "laurenz"
#define PASSWORD "apassword0"
#define STATEMENT "SELECT pk, string FROM clobtest"

static void
display_oracle_error(dpiContext *gcontext)
{
    dpiErrorInfo gerror_info;

    if  (gcontext)
    dpiContext_getError(gcontext, &gerror_info);

    fprintf(stderr, "[FAILED]\n");
    fprintf(stderr, "    FN: %s\n", gerror_info.fnName);
    fprintf(stderr, "    ACTION: %s\n", gerror_info.action);
    fprintf(stderr, "    MSG: %.*s\n", gerror_info.messageLength, gerror_info.message);

    exit(1);
}

int main(int argc, char **argv)
{
    dpiErrorInfo gerror_info;
    dpiContext *gcontext;
    dpiConn *conn;
    dpiStmt *stmt;
    dpiData *colValue;
    dpiNativeTypeNum nativeTypeNum;
    uint32_t numQueryColumns, bufferRowIndex;
    int i, found = 1;
    char *str;
    uint64_t lobsize, strlength;

    if (dpiContext_create(DPI_MAJOR_VERSION, DPI_MINOR_VERSION, &gcontext, &gerror_info) != DPI_SUCCESS)
    {
        fprintf(stderr, "[FAILED]\n");
        fprintf(stderr, "    FN: %s\n", gerror_info.fnName);
        fprintf(stderr, "    ACTION: %s\n", gerror_info.action);
        fprintf(stderr, "    MSG: %.*s\n", gerror_info.messageLength, gerror_info.message);

        return 1;
    }

    if (dpiConn_create(gcontext,
            USERNAME, strlen(USERNAME),
            PASSWORD, strlen(PASSWORD),
            CONNECTSTRING, strlen(CONNECTSTRING),
            NULL, NULL, &conn) != DPI_SUCCESS)
        display_oracle_error(gcontext);

    if (dpiConn_prepareStmt(conn, 0, STATEMENT, strlen(STATEMENT), NULL, 0, &stmt) != DPI_SUCCESS)
        display_oracle_error(gcontext);

    if (dpiStmt_setFetchArraySize(stmt, 200) != DPI_SUCCESS)
        display_oracle_error(gcontext);

    if (dpiStmt_execute(stmt, 0, &numQueryColumns) != DPI_SUCCESS)
        display_oracle_error(gcontext);

    while (1)
    {
        if (dpiStmt_fetch(stmt, &found, &bufferRowIndex) != DPI_SUCCESS)
            display_oracle_error(gcontext);

        if (! found)
            break;

        printf("\n");

        for (i = 1; i <= numQueryColumns; ++i)
        {
            if (dpiStmt_getQueryValue(stmt, i, &nativeTypeNum, &colValue) != DPI_SUCCESS)
                display_oracle_error(gcontext);

            if (colValue->isNull)
            {
                printf("column %d IS NULL\n", i);
                continue;
            }

            switch (nativeTypeNum)
            {
                case DPI_NATIVE_TYPE_INT64:
                    printf("column %d = %ld\n", i, colValue->value.asInt64);
                    break;
                case DPI_NATIVE_TYPE_UINT64:
                    printf("column %d = %lu\n", i, colValue->value.asUint64);
                    break;
                case DPI_NATIVE_TYPE_FLOAT:
                    printf("column %d = %f\n", i, colValue->value.asFloat);
                    break;
                case DPI_NATIVE_TYPE_DOUBLE:
                    printf("column %d = %f\n", i, colValue->value.asDouble);
                    break;
                case DPI_NATIVE_TYPE_LOB:
                    if (dpiLob_getSize(colValue->value.asLOB, &lobsize) != DPI_SUCCESS)
                        display_oracle_error(gcontext);

                    strlength = lobsize;

                    if ((str = malloc(lobsize * 4 + 1)) == NULL)
                    {
                        fprintf(stderr, "out of memory allocating CLOB string");
                        return 1;
                    }

                    if (dpiLob_readBytes(colValue->value.asLOB, 1, lobsize, str, &strlength) != DPI_SUCCESS)
                        display_oracle_error(gcontext);

                    printf("column %d = \"%s\"\n", i, str);

                    free(str);

                    break;
                default:
                    fprintf(stderr, "unexpected data type in column %d", i);
                    return 1;
            }
        }
    }

    if (dpiStmt_release(stmt) != DPI_SUCCESS)
        display_oracle_error(gcontext);

    if (dpiConn_release(conn) != DPI_SUCCESS)
        display_oracle_error(gcontext);

    return 0;
}

I build with:

gcc -Wall -I /home/laurenz/odpi/include -L /home/laurenz/odpi/lib -Wl,-rpath,/home/laurenz/odpi/lib -o repro repro.c -l odpic

This is the result of running the code:

$ ./repro

column 1 = 1
[FAILED]
    FN: dpiLob_readBytes
    ACTION: read from LOB
    MSG: DPI-1062: unexpected OCI return value 99 in function dpiLob_readBytes
anthony-tuininga commented 3 years ago

Interesting! For reasons that the Oracle Client libraries would have to provide, you MUST provide a buffer equivalent to * 4 (or larger). If you do not provide a buffer of at least size 4 you get the error OCI_NEED_DATA (99)! In your code you were telling OCI that you had a buffer of size 3. Even though that is technically quite acceptable it insists on having a minimum buffer of size 12. Anything less than that truncates the result.

In your example above you have strlength = lobsize; when it should be strlength = lobsize * 4;. Note as well that the result is NOT NULL-terminated. The provided length is returned and you need to either NULL terminate it yourself or simply use this: printf("column %d = \"%.*s\"\n", i, strlength, str);

anthony-tuininga commented 3 years ago

I've just pushed a small change to ensure that you don't get that unexpected error. Instead you will simply get back the fact that you received no characters/bytes. This is still not what you might want but its at least reasonable. :-)

laurenz commented 3 years ago

Thanks for having a closer look.

Yes, it should have been

strlength = lobsize * 4 + 1;

That was my sloppiness, but I used only ASCII characters anyway. I also forgot about the missing terminating '\0'.

I cannot say I am happy with your workaround, though. That would mean that you could get an empty result and mistakenly believe that there is an empty CLOB when really there isn't.

I think the proper fix would be to document that valueLength has to be at least four times the size of the CLOB, and convert OCI_NEED_DATA into a reasonable error message (something like “result buffer is too short”).

anthony-tuininga commented 3 years ago

I cannot say I am happy with your workaround, though. That would mean that you could get an empty result and mistakenly believe that there is an empty CLOB when really there isn't.

Well, you can't use the length returned as the definitive answer anyway. You can probably assume that if you are using UTF-8 and you supply a buffer that is 4 times the size of the number of characters you desire and that that number of characters exist in the LOB that you will get them; in all other cases you will get less. For a proper solution you need to be looking at the output of dpiLob_getSize() and repeating calls until you get the size you desire -- or ensuring the buffer size is large enough to begin with.

Note that the documentation clearly states that the amount is merely the maximum number of characters (for CLOBs and bytes for BLOBs) that will be read from the LOB. There are two possible reasons for not being able to read the number of characters (or bytes) you requested: the buffer is too small or there aren't that many characters (or bytes) available. The only way I can differentiate is by finding the length myself -- which implies additional overhead in doing so -- and that doesn't seem a good solution.

So finding out if the buffer is too short is not worth the effort, I believe. And the only time you will get OCI_NEED_DATA is if a single character cannot be stored in the supplied buffer. It is highly unlikely that you would be supplying such a short buffer to read a LOB! The only time you should really be using LOBs is if the size exceeds several megabytes at least -- the performance difference between using dpiLob_readBytes() for a short LOB and simply using a string buffer is significant!

Hopefully this explanation is sufficient!

laurenz commented 3 years ago

Yes, your explanation is clear. This (to say it friendly) limitation of the Oracle API is difficult to work around.

But I'd argue that not throwing an error is the wrong thing to do: it lets the user believe that a zero-length CLOB was correctly retrieved, when in reality a longer value was truncated. I think that it is better to throw an error as in the original behaviour, but a meaningful error message would be nice.

anthony-tuininga commented 3 years ago

I'm not sure why the zero length case matters so much to you? If the CLOB is actually of length 2 and you pass a buffer of size 5 you will get 1 character -- which lets the user believe that a 1-length CLOB was correctly retrieved when in reality a longer value was truncated. You cannot know it was truncated unless you call dpiLob_getSize() to determine the actual length!

If you want to ensure that you always have the right buffer size you can use the length returned by dpiLob_getSize() and multiply by the max bytes per character found in dpiEncodingInfo.maxBytesPerCharacter. If you are always using UTF-8 then you can simply multiply by 4. Similarly, if you want to simply retrieve 5000 characters you can take that amount and multiply the max bytes per character to get the right sized buffer. If you do that consistently, the only reason you'll get less characters is because there aren't that many to read. That's the only way to be certain.

laurenz commented 3 years ago

Ok, thanks for the explanation, I'll let it go at that.

Thanks too for that library in general – it makes Oracle's terrible C API much more comfortable to use.