y-scope / clp

Compressed Log Processor (CLP) is a free log management tool capable of compressing logs and searching the compressed logs without decompression.
https://yscope.com
Apache License 2.0

Cannot search on nested JSON #559

Closed: mohamadassadeq closed this issue 4 weeks ago

mohamadassadeq commented 4 weeks ago

Bug

I used CLP to compress a DARPA log file. Each line looks something like this:

{"datum":{"com.bbn.tc.schema.avro.cdm20.Event":{"uuid":"91D0EE29-A1CC-3FA9-5690-6B87FA62C4FB","sequence":{"long":206424728},"type":"EVENT_MPROTECT","threadId":{"int":14074},"subject":{"com.bbn.tc.schema.avro.cdm20.UUID":"FE1A0548-A4F7-EA2A-A897-7E3EFDD14DDE"},"predicateObject":{"com.bbn.tc.schema.avro.cdm20.UUID":"9E42D3BA-2C00-312F-8634-BF4998B8775A"},"predicateObjectPath":null,"predicateObject2":null,"predicateObject2Path":null,"timestampNanos":1557242010667000000,"names":null,"parameters":null,"location":null,"size":null,"programPoint":null,"properties":{"map":{"protection":"1"}}}},"CDMVersion":"20","type":"RECORD_EVENT","hostId":"7A665024-F3E3-3D4E-3A98-D9651E351DE4","sessionNumber":19,"source":"SOURCE_LINUX_SYSCALL_TRACE"}

When I query the data, for example for "uuid":"91D0EE29-A1CC-3FA9-5690-6B87FA62C4FB", I get "No matching schemas for query", even though the value exists.

I also get this error when I use the nested syntax:

./clp-s s /mnt/data/archives-trace '{datum:{com.bbn.tc.schema.avro.cdm20.Event:{uuid:91D0EE29-A1CC-3FA9-5690-6B87FA62C4FB}}}'
2024-10-20T09:41:42.113+00:00 [error] Parser error: extraneous input '}' expecting

CLP version

Latest version from git

Environment

Docker version 24.0.7, build 24.0.7-0ubuntu2~20.04.1

Reproduction steps

no idea

gibber9809 commented 4 weeks ago

Hello,

There seem to be a few issues you're running into here, one of which is a bug that should get fixed by the PR I put up and linked above.

The first query, "uuid":"91D0EE29-A1CC-3FA9-5690-6B87FA62C4FB", gets interpreted as a search against the uuid key at the root level of the document. If you want to instead search against any hierarchy of keys ending with uuid you can perform the query "*.uuid":"91D0EE29-A1CC-3FA9-5690-6B87FA62C4FB".
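To make the difference between the two query forms concrete, here is a minimal Python sketch of the matching behaviour described above. It is only an illustration of the semantics, not how clp-s implements search. The function names (`leaf_paths`, `match_root_key`, `match_key_suffix`) are hypothetical, and the sample document is a trimmed copy of the event from the issue.

```python
import json

# Trimmed copy of the log event from the issue (most fields omitted).
EVENT = json.loads(
    '{"datum":{"com.bbn.tc.schema.avro.cdm20.Event":'
    '{"uuid":"91D0EE29-A1CC-3FA9-5690-6B87FA62C4FB","type":"EVENT_MPROTECT"}},'
    '"CDMVersion":"20","type":"RECORD_EVENT"}'
)
UUID = "91D0EE29-A1CC-3FA9-5690-6B87FA62C4FB"


def leaf_paths(node, prefix=()):
    """Yield (key_path, value) for every non-object value in a JSON document.

    Key paths are tuples rather than dotted strings because the key names
    in this data set contain '.' characters themselves.
    """
    if isinstance(node, dict):
        for key, child in node.items():
            yield from leaf_paths(child, prefix + (key,))
    else:
        yield prefix, node


def match_root_key(doc, key, value):
    """Model of `key: value`: only a key at the root of the document is checked."""
    return any(path == (key,) and leaf == value for path, leaf in leaf_paths(doc))


def match_key_suffix(doc, key, value):
    """Model of `*.key: value`: any key path whose last component is `key`.

    Assumption: this sketch also accepts a root-level key; whether clp-s
    does the same for this pattern is not stated in the thread.
    """
    return any(path[-1] == key and leaf == value for path, leaf in leaf_paths(doc))


print(match_root_key(EVENT, "uuid", UUID))    # root-level lookup misses the nested key
print(match_key_suffix(EVENT, "uuid", UUID))  # suffix lookup finds it
```

In this model the root-level lookup fails because `uuid` only appears two levels down, under `datum` and `com.bbn.tc.schema.avro.cdm20.Event`, which is consistent with the "No matching schemas for query" result reported in the issue.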

For your second query {datum:{com.bbn.tc.schema.avro.cdm20.Event:{uuid:91D0EE29-A1CC-3FA9-5690-6B87FA62C4FB}}} the nested {} syntax is currently only supported after the first level of nesting. I.e. you should be able to rewrite your query as datum:{com\.bbn\.tc\.schema\.avro\.cdm20\.Event:{uuid:91D0EE29-A1CC-3FA9-5690-6B87FA62C4FB}}.

Unfortunately, the current version of clp-s has a bug that prevents escaping '.' characters inside of key names, so the rewritten version of the query above won't work until the linked PR gets merged.
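For anyone building such queries from a script, the rewritten form can be generated from a list of key names. The following is a minimal sketch that assumes only '.' needs escaping inside key names, as in the example above. The helpers `escape_key` and `build_nested_query` are hypothetical names and not part of clp-s, and the resulting query is still subject to the escaping bug until the fix is merged.

```python
def escape_key(key: str) -> str:
    """Escape '.' in a key name so it is not read as a key-path separator.

    Assumption: only '.' is escaped here; other special characters are
    not handled by this sketch.
    """
    return key.replace(".", "\\.")


def build_nested_query(keys: list[str], value: str) -> str:
    """Build `k1:{k2:{...kN:value}}` with no braces around the outermost key."""
    if not keys:
        raise ValueError("at least one key is required")
    # Start from the innermost `key:value` pair and wrap outwards.
    query = f"{escape_key(keys[-1])}:{value}"
    for key in reversed(keys[:-1]):
        query = f"{escape_key(key)}:{{{query}}}"
    return query


query = build_nested_query(
    ["datum", "com.bbn.tc.schema.avro.cdm20.Event", "uuid"],
    "91D0EE29-A1CC-3FA9-5690-6B87FA62C4FB",
)
print(query)
```

The printed string can then be passed as the single-quoted query argument to ./clp-s s, in the same way as the command in the original report.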

The full search syntax for JSON logs is documented here.

mohamadassadeq commented 4 weeks ago

thanks