y-scope / clp

Compressed Log Processor (CLP) is a free log management tool capable of compressing logs and searching the compressed logs without decompression.
https://yscope.com
Apache License 2.0

clp-s truncates JSON bytes during compression #289

Open satya256 opened 9 months ago

satya256 commented 9 months ago

Bug

I am trying to compress a JSON input file in which each line holds a large JSON record (possibly larger than 1 MB); the total file size is around 100 MB. Compression outputs the following error lines:

[error] Truncated JSON (323 bytes) at end of file

CLP version

a7368cfb0828b4eee0482cbb9569bd14557b9fed

Environment

Ubuntu 22.04

Reproduction steps

Compress a JSON file larger than 30 MB that also contains JSON lines larger than 1 MB.

satya256 commented 9 months ago

If I change the buffer size from 1 MB to, say, 300 MB, the issue above no longer occurs:

https://github.com/y-scope/clp/blob/main/components/core/src/clp_s/JsonFileIterator.hpp#L25

gibber9809 commented 9 months ago

Hi @satya256, thanks for the report. We're looking into this and putting together some changes to make the JSON parser a bit more robust.

I have a few clarifying questions that should help us narrow down the specific issue you've run into:

  1. Does your JSON log data contain UTF-8 characters?
  2. Is your JSON log data newline-delimited or delimited some other way, and do JSON records ever contain a newline in the middle?

bb-rajakarthik commented 9 months ago

Hi @gibber9809, to answer your questions on behalf of @satya256:

  1. Yes, all characters in the JSON records are UTF-8.
  2. Yes, our JSON data is newline-delimited, and some records do contain newlines in the middle, but those newlines are escaped.

gibber9809 commented 8 months ago

Hey @bb-rajakarthik and @satya256, we merged #310, which significantly improves error handling and error reporting during compression. The issue you ran into should be fixed, but please let us know if you still encounter any issues with compression.