googleapis / nodejs-bigquery-storage

BigQuery Storage Node.js client
Apache License 2.0

How to handle and safely ignore errors when using committed type stream #467

Open convers39 opened 1 week ago

convers39 commented 1 week ago

What you're trying to do

I am migrating my code from the tabledata.insertAll API to the Storage Write API, because I found that tabledata.insertAll occasionally inserts the same data twice.

I implemented it with a CommittedStream, as the documentation and code examples show, but I am having trouble handling errors.

What I am trying to do is ignore the errors that the documentation says are safe to ignore, while still catching the errors that should not be ignored. The documentation says:

> When you specify an offset, the write operation is idempotent, which makes it safe to retry due to network errors or unresponsiveness from the server. Handle the following errors related to offsets:
>
> - `ALREADY_EXISTS` (`StorageErrorCode.OFFSET_ALREADY_EXISTS`): The row was already written. You can safely ignore this error.
> - `OUT_OF_RANGE` (`StorageErrorCode.OFFSET_OUT_OF_RANGE`): A previous write operation failed. Retry from the last successful write.
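To check my understanding of the quoted rules, here is a toy model (my own illustration, not the real server behavior): appends are keyed by offset, so re-sending a chunk at the same offset cannot duplicate rows.

```typescript
// Toy model of an offset-based committed stream. Appends are keyed by
// offset: retrying the same chunk at the same offset is rejected as
// ALREADY_EXISTS instead of duplicating rows, and appending past the
// end (a gap left by a lost write) is rejected as OUT_OF_RANGE.
class ToyStream {
  private rows: unknown[] = [];

  append(chunk: unknown[], offset: number): 'ok' | 'ALREADY_EXISTS' | 'OUT_OF_RANGE' {
    if (offset < this.rows.length) return 'ALREADY_EXISTS'; // already committed
    if (offset > this.rows.length) return 'OUT_OF_RANGE';   // a prior write is missing
    this.rows.push(...chunk);
    return 'ok';
  }

  get committed(): number {
    return this.rows.length;
  }
}
```

If this model is right, a retry with the same offset after a network error is always safe, which is what "idempotent" means here.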

What code you've already tried

Here is the code of my implementation.

    const writeStream = await writeClient.createWriteStreamFullResponse({
      streamType,
      destinationTable,
    });
    if (writeStream.name == null) {
      throw new Error('writeStream undefined');
    }
    const streamId = writeStream.name;
    logger.info(`Stream created: ${streamId}`);
    if (writeStream.tableSchema == null) {
      throw new Error('table schema undefined');
    }

    const protoDescriptor = adapt.convertStorageSchemaToProto2Descriptor(
      writeStream.tableSchema,
      'root',
    );

    const connection = await writeClient.createStreamConnection({
      streamId,
    });
    const writer = new managedwriter.JSONWriter({
      connection,
      protoDescriptor,
    });

    let currentOffset = 0;
    while (currentOffset < data.length) {
      const dataChunk = data.slice(
        currentOffset,
        currentOffset + BQ_INSERT_DATA_CHUNCK_COUNT,
      );
      // Pass the explicit offset so the append is idempotent.
      const pw = writer.appendRows(dataChunk as JSONList, currentOffset);
      const result = await pw.getResult();
      // The result reports the offset this chunk was appended at;
      // advance one chunk past it for the next iteration.
      currentOffset = Number.parseInt(
        result.appendResult?.offset?.value?.toString() ?? '0',
        10,
      );
      currentOffset += BQ_INSERT_DATA_CHUNCK_COUNT;
      // TODO: error handling
      logger.info('pending write pushed', {
        result,
        currentOffset,
      });
    }

    logger.info('data inserted');
    await connection.finalize();

Any error messages you're getting

PendingWrite.getResult() returns an object with rowErrors and error properties, and the two documented errors come through the error property. Here is a screenshot from when I intentionally reproduce the error by reusing the same offset value:

[screenshot]

I tried decoding the Buffer in error.details with toString(), and I did find the ALREADY_EXISTS keyword:

[screenshot]

However, error.code is 6, which does not match what I found in the source code:

    // Offset already exists.
    OFFSET_ALREADY_EXISTS = 8;

    // Offset out of range.
    OFFSET_OUT_OF_RANGE = 9;

So now I have no idea what the error code for the OUT_OF_RANGE error will actually be, or where to find the correct error code list.
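My current guess is that error.code is the transport-level gRPC status code (in the standard gRPC status enum, ALREADY_EXISTS is 6 and OUT_OF_RANGE is 11), while the StorageErrorCode values above (8 and 9) only appear inside the serialized StorageError in error.details. If that guess is right, a classifier could look like this (`classifyAppendError` is my own name, not a library function):

```typescript
// Standard gRPC status code values (NOT StorageErrorCode values).
const GRPC_ALREADY_EXISTS = 6;
const GRPC_OUT_OF_RANGE = 11;

type AppendAction = 'ignore' | 'retry-from-last-offset' | 'fail';

// Maps the gRPC status code from PendingWrite's error property to the
// action the docs suggest for offset-related errors.
function classifyAppendError(code: number | undefined): AppendAction {
  if (code === undefined) return 'ignore';           // no error at all
  if (code === GRPC_ALREADY_EXISTS) return 'ignore'; // rows already written
  if (code === GRPC_OUT_OF_RANGE) return 'retry-from-last-offset';
  return 'fail';                                     // anything else is a real error
}
```

Under that assumption, the OUT_OF_RANGE case would arrive as error.code 11, but I would like confirmation.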

Additional questions

Apart from the error code mismatch above, I am also unsure whether my error handling and offset manipulation are correct, because there is very little sample code or documentation covering them.
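For reference, here is the shape I think the loop should have, with writer.appendRows abstracted behind an injectable function so the retry logic is visible in isolation. `writeAll` and `AppendFn` are my own names, and I am assuming error.code is the gRPC status code (6 = ALREADY_EXISTS, 11 = OUT_OF_RANGE). Corrections welcome:

```typescript
// Assumed gRPC status code values (not StorageErrorCode values).
const ALREADY_EXISTS = 6;
const OUT_OF_RANGE = 11;

// Stand-in for writer.appendRows(chunk, offset) + pw.getResult().
type AppendFn = (chunk: unknown[], offset: number) => Promise<{ error?: { code: number } }>;

// Writes `data` in chunks of `chunkSize` at explicit offsets:
// - ALREADY_EXISTS means the chunk was committed by an earlier attempt,
//   so it is treated as success and the loop moves on;
// - OUT_OF_RANGE means an earlier write was lost, so the loop rewinds
//   to the last confirmed offset and re-sends from there;
// - any other error is fatal.
// Returns the number of confirmed rows.
async function writeAll(data: unknown[], chunkSize: number, append: AppendFn): Promise<number> {
  let offset = 0;   // offset of the next row to send
  let lastGood = 0; // offset just past the last confirmed row
  while (offset < data.length) {
    const chunk = data.slice(offset, offset + chunkSize);
    const result = await append(chunk, offset);
    const code = result.error?.code;
    if (code === OUT_OF_RANGE) {
      offset = lastGood; // rewind and re-send
      continue;
    }
    if (code !== undefined && code !== ALREADY_EXISTS) {
      throw new Error(`append failed with code ${code}`);
    }
    // Success, or ALREADY_EXISTS (rows committed on an earlier attempt).
    offset += chunk.length;
    lastGood = offset;
  }
  return lastGood;
}
```

Is this roughly the intended pattern, or should the rewind target come from somewhere other than the last successful result?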

convers39 commented 1 week ago

BTW, Google support responded with "ask our sales or account team", so now I can only rely on GitHub :(

Hopefully someone can follow up on this question 🙏

[screenshot]