sophos / Sophos-Central-SIEM-Integration

Simple integration script for 3rd party systems such as SIEMs. Offers command line, file or syslog output in CEF, JSON or key-value pair formats.

Latest version: results not in JSON format when using API token method #52

Closed obpedro closed 3 years ago

obpedro commented 3 years ago

Hi team,

I went to install the latest version on a machine today, and realized that when I configured the data input in Splunk, the events were not coming in proper JSON format - they were broken up into lines, so instead of a single event with the following: { customer_id: xxx, datastream: xxx }

I was actually getting multiple events in the logs: { customer_id: xxx

you get the idea :)

Once I reverted to the latest 1.x version things worked just fine and the events were generated and indexed properly. Guessing this is a bug but thought I'd raise the issue so you could take a look. Hoping it's not a feature anyway :)

Thanks in advance!

ramksophos commented 3 years ago

Hi @obpedro, we are treating this as a bug that inadvertently added indentation when printing JSON, and we're working on a fix.
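For readers unfamiliar with the symptom, here is a minimal sketch of how indentation turns one JSON event into several lines. It uses only Python's standard `json` module; the `event` dictionary is a made-up placeholder, and this is not the script's actual code.

```python
import json

# Hypothetical event, shaped like the placeholder in the original report.
event = {"customer_id": "xxx", "datastream": "event"}

# With indent set, the serializer inserts newlines, so a line-oriented
# consumer (such as a Splunk file input) sees several partial events.
pretty = json.dumps(event, indent=4)

# Without indent, the whole event is serialized onto a single line.
compact = json.dumps(event)

print(pretty)
print(compact)
```

A line-based indexer can only treat the second form as one event.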

ramksophos commented 3 years ago

@obpedro, are you in a position to validate that #56 fixed your issue?

ramksophos commented 3 years ago

Now fixed in master.

obpedro commented 2 years ago

@rkamat my apologies for the super late reply, I was finally able to get around to this. Running the latest version I'm still seeing the issue: JSON messages are spread across multiple lines instead of one line each.

ramksophos commented 2 years ago

@obpedro is it possible you tested with v2.0.1 from July last year? We have just released v2.1.0 which is the same as the tip in master. Can you please try again? Thanks

obpedro commented 2 years ago

Ahh.... yes that is 100% possible. I actually downloaded 2.1.0 but as I had a copy of 2.0.1 in the same folder, I mistakenly re-tried with 2.0.1. Doh! I will test ASAP and report back. Thank you!

gregsmith-movista commented 2 years ago

So, this looks like it might be "partially fixed". The JSON is not kicking out as individual lines anymore, but it's not breaking up multiple events correctly either. The data looks fine when sent to STDOUT, but when it goes to syslog, it is mangled...

The <30> character appears to be incorrectly used to break up lines? Here is sample output with some data redacted...

<30>{"endpoint_type": "computer", "endpoint_id": "redacted", "source_info": {"ip": "redactedIP"}, "customer_id": "redacted", "severity": "medium", "threat": "EICAR-AV-Test", "type": "Event::Endpoint::Threat::Detected", "id": "redacted", "group": "MALWARE", "name": "EICAR-AV-Test", "datastream": "event", "end": "2022-01-10T22:14:58.000Z", "rt": "2022-01-10T22:15:01.959Z", "duid": "redacted", "suser": "redacted\\gsmith", "dhost": "redacted", "detection_identity_name": "EICAR-AV-Test", "filePath": "h___s://secure.eicar.org/eicar.com"}<30>{"endpoint_type": "computer", "endpoint_id": "redacted", "source_info": {"ip": "redacted"}, "customer_id": "redacted", "severity": "low", "threat": "EICAR-AV-Test", "type": "Event::Endpoint::Threat::CleanedUp", "id": "redacted", "group": "MALWARE", "name": "EICAR-AV-Test", "datastream": "event", "end": "2022-01-10T22:14:58.000Z", "rt": "2022-01-10T22:15:02.009Z", "duid": "redacted", "suser": "redacted\\gsmith", "dhost": "redacted", "detection_identity_name": "EICAR-AV-Test", "filePath": "h___s://secure.eicar.org/eicar.com"}<30>{"endpoint_type": "computer", "endpoint_id": "redacted", "source_info": {"ip": "redacted"}, "customer_id": "redacted", "severity": "medium", "threat": "EICAR-AV-Test", "type": "Event::Endpoint::Threat::Detected", "id": "redacted", "group": "MALWARE", "name": "EICAR-AV-Test", "datastream": "event", "end": "2022-01-10T22:17:39.000Z", "rt": "2022-01-10T22:17:41.043Z", "duid": "redacted", "suser": "redacted\\gsmith", "dhost": "redacted", "detection_identity_name": "EICAR-AV-Test", "filePath": "C:\\Users\\gsmith\\Documents\\eicar.com"}<30>{"endpoint_type": "computer", "endpoint_id": "redacted", "source_info": {"ip": "redacted"}, "customer_id": "redacted", "severity": "low", "threat": "EICAR-AV-Test", "type": "Event::Endpoint::Threat::CleanedUp", "id": "redacted", "group": "MALWARE", "name": "EICAR-AV-Test", "datastream": "event", "end": "2022-01-10T22:18:09.000Z", "rt": "2022-01-10T22:18:10.642Z", "duid": "redacted", "suser": "redacted\\gsmith", "dhost": "redacted", "detection_identity_name": "EICAR-AV-Test", "filePath": "C:\\Users\\gsmith\\Documents\\eicar.com"}

That's all still one line out to the SIEM via syslog, and in our case Rapid7 is not ingesting the line correctly because of the strange formatting.
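For context, `<30>` is the syslog priority (PRI) prefix: facility 3 (daemon) times 8 plus severity 6 (info). The sample above is therefore several syslog records sent back to back with no delimiter between them. As a receiver-side workaround, the payload can be split again. The sketch below is one hypothetical way to do that with Python's standard library; `split_syslog_json` is an illustrative name, not part of the Sophos script or of InsightIDR.

```python
import json
import re

# A syslog PRI prefix: 1 to 3 digits in angle brackets.
PRI = re.compile(r"<(\d{1,3})>")


def split_syslog_json(payload):
    """Yield (priority, event) pairs from back-to-back '<PRI>{json}' records."""
    decoder = json.JSONDecoder()
    pos = 0
    while pos < len(payload):
        match = PRI.match(payload, pos)
        if match is None:
            raise ValueError(f"expected a <PRI> prefix at offset {pos}")
        # raw_decode parses exactly one JSON value and reports where it ended,
        # so a '<30>' that happens to sit inside a string value is not
        # mistaken for a record boundary.
        event, pos = decoder.raw_decode(payload, match.end())
        yield int(match.group(1)), event
        # Tolerate optional whitespace or newlines between records.
        while pos < len(payload) and payload[pos].isspace():
            pos += 1


sample = '<30>{"type": "Detected", "severity": "medium"}<30>{"type": "CleanedUp", "severity": "low"}'
for priority, event in split_syslog_json(sample):
    print(priority, event["type"])
```

Parsing each JSON value in turn is used here instead of a plain split on `<30>` so that event contents cannot break the split.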

ecollins-sophos commented 2 years ago

Hi @gregsmith-movista, could you please clarify this process some more for us? If you have the output in JSON format for ingestion into SIEM applications, how does syslog come into play here?

Also, do you see the erroneous line break character in the JSON itself, or only when you attempt to convert it to syslog?

gregsmith-movista commented 2 years ago

Sophos provides a SIEM integration script (this script) to connect to their secure API for event and alert data. The integration script must be run on a scheduled basis using a Cronjob. The script pulls down log data from the Sophos Central API and forwards them to our Rapid7 InsightIDR Collector, which listens on a port for syslog information.

We attempted to run this a couple weeks back when it was the 2.0.1 version, and the ingest tool was seeing the same issues with JSON where each line was a separate entry in the SysLog data, so not something it could parse. We updated to the 2.1.0 script here to see if it fixed the parsing (since it was in the notes for update) and it helped, but now when a group of events comes in from one system, InsightIDR is only parsing the first entry, as all the rest of the data past <30> is being ignored as the JSON formatting is broken. Truthfully, each and every event from Sophos should rightfully be its own line of JSON.

When we run this to output the data to STDOUT or to a file, the formatting looks correct. When we set the output to syslog, the framing is broken: each event arrives as valid JSON, but only the first event is read, because the events are grouped together like a single event. Either the output should be formatted to carry multiple entries, or each entry should be its own line of JSON data.
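The "one event per line" behaviour requested above can be sketched as follows. This is an illustration of newline-delimited framing (the non-transparent framing described in RFC 6587), not the script's actual implementation; `frame_events` and its `priority` default are hypothetical names chosen for this example.

```python
import json


def frame_events(events, priority=30):
    """Serialize each event as '<PRI>{json}' followed by a newline.

    json.dumps escapes newlines inside string values, so the only raw
    newline in each record is the trailing delimiter.
    """
    lines = []
    for event in events:
        body = json.dumps(event, separators=(",", ":"))
        lines.append(f"<{priority}>{body}\n")
    return "".join(lines).encode("utf-8")


# Made-up events for illustration only.
events = [
    {"type": "Event::Endpoint::Threat::Detected", "severity": "medium"},
    {"type": "Event::Endpoint::Threat::CleanedUp", "severity": "low"},
]
payload = frame_events(events)
print(payload.decode("utf-8"), end="")
```

With a trailing newline after every record, a collector that reads a TCP stream line by line receives each event as a separate message.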