terascope / kafka-assets

teraslice asset for kafka operations
MIT License

JSON parsing error only shows up with a particular slice size #290

Open ciorg opened 4 years ago

ciorg commented 4 years ago

I'm getting the following error, but only when I change the slice size to 25,000:

TSError: Failure to parse buffer, SyntaxError: Unexpected token b in JSON at position 3 at pRetry

When I run the same job with a size of 100,000, I don't get the error.

I ran both test jobs twice and saw the same result both times.

Job settings for the job that gets the slice error:

{
    "_op": "kafka_reader",
    "connection": "CONNECTION",
    "topic": "TOPIC",
    "group": "GROUP",
    "size": 25000,
    "wait": 30000
},
{
    "_op": "noop"
}

Grafana:

(Grafana screenshot, 2020-06-12 2:33:56 PM)

Job settings for the job that gets no errors:

{
    "_op": "kafka_reader",
    "connection": "CONNECTION",
    "topic": "TOPIC",
    "group": "GROUP",
    "size": 100000,
    "wait": 30000
},
{
    "_op": "noop"
}

Grafana:

(Grafana screenshot, 2020-06-12 2:33:14 PM)

If I run it with a size of 1,000 I don't get any slice errors either. I'm not sure if the error is being swallowed or if something is breaking the JSON at certain slice sizes.

It was just random luck that I found this, but I thought I should document it.

peterdemartini commented 4 years ago

It would be useful to see the record that is failing to parse (it might help us figure out why it is happening). Maybe set _dead_letter_action to log, or use the Kafka dead letter queue.
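For example, a minimal sketch of the reader op with the action set to log (assuming kafka_reader honors the standard Teraslice _dead_letter_action setting):

{
    "_op": "kafka_reader",
    "connection": "CONNECTION",
    "topic": "TOPIC",
    "group": "GROUP",
    "size": 25000,
    "wait": 30000,
    "_dead_letter_action": "log"
}

With "log", the offending record should end up in the worker logs instead of failing the slice.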

ciorg commented 4 years ago

I added the dead letter queue to the job, ran it again, and was able to send the bad records to another topic.
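Roughly what that looked like, as a sketch (the kafka_dead_letter API name and its fields here are assumptions from memory and may not match the job exactly):

"apis": [
    {
        "_name": "kafka_dead_letter",
        "connection": "CONNECTION",
        "topic": "DEAD_LETTER_TOPIC"
    }
],
"operations": [
    {
        "_op": "kafka_reader",
        "connection": "CONNECTION",
        "topic": "TOPIC",
        "group": "GROUP",
        "size": 25000,
        "wait": 30000,
        "_dead_letter_action": "kafka_dead_letter"
    },
    {
        "_op": "noop"
    }
]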

The buffer in the records in the dead letter queue converts to hex, and the hex converts to a parsable JSON doc.
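Decoding one of the failed payloads by hand bears that out; a quick Node sketch (badRecordHex is a placeholder for the hex string copied out of a dead letter record):

// placeholder for the hex string copied from a dead letter record
const badRecordHex = '7b226669656c64223a2276616c7565227d';

// hex -> utf8 -> JSON parses cleanly
const doc = JSON.parse(Buffer.from(badRecordHex, 'hex').toString('utf8'));
console.log(doc); // { field: 'value' }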

The dead letter queue also adds the partition and offset, so I looked a record up in the original topic with kafkacat and saw the same thing: a long hex string.
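For anyone reproducing this, something like the following pulls the single record at that partition/offset (BROKER, TOPIC, PARTITION, and OFFSET are placeholders):

kafkacat -C -b BROKER:9092 -t TOPIC -p PARTITION -o OFFSET -c 1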

Almost all the records show up as JSON with kafkacat, so it makes me think the bad ones are coming in as hex strings.

But that doesn't explain why I only see these errors with certain slice sizes. It seems like they would throw errors every time?

peterdemartini commented 4 years ago

I can't imagine why the slice size would affect this. This seems like just a data problem, and maybe it is just chance that it happens when you increase the slice size?