philhagen / sof-elk

Configuration files for the SOF-ELK VM
GNU General Public License v3.0

Fixing Microsoft 365 multiline JSON logs parsing #316

Open: BrianMer opened this pull request 6 months ago

BrianMer commented 6 months ago

Hi,

I don't know whether this pull request is related to issue #285, but I think I have fixed the parsing problem with multiline JSON array logs from the M365 Unified Audit Log (UAL).

Here is an anonymised sample if you want to test it yourself:

[
{
"CreationTime": "2023-05-31T22:52:10",
"Operation": "FileAccessed",
"UserId": "app@sharepoint",
"ClientIP": "1.2.3.4",
"ObjectId": "https://anon.sharepoint.com/sites/folder/anon.aspx",
"EventSource": "SharePoint",
"UserAgent": "ISV|Veeam Software|Veeam Backup for Office 365/5.0",
"SourceFileExtension": "aspx"
},
{
"CreationTime": "2023-05-31T22:52:10",
"Operation": "FileAccessed",
"UserId": "app@sharepoint",
"ClientIP": "1.2.3.4",
"ObjectId": "https://anon.sharepoint.com/sites/folder/anonymous.aspx",
"EventSource": "SharePoint",
"UserAgent": "ISV|Veeam Software|Veeam Backup for Office 365/5.0",
"SourceFileExtension": "aspx"
},
{
"CreationTime": "2023-05-31T21:52:09",
"Operation": "FileAccessed",
"UserId": "app@sharepoint",
"ClientIP": "1.2.3.4",
"ObjectId": "https://anon.sharepoint.com/sites/folder/anony.aspx",
"EventSource": "SharePoint",
"UserAgent": "ISV|Veeam Software|Veeam Backup for Office 365/5.0",
"SourceFileExtension": "aspx"
}
]

Thanks!

philhagen commented 6 months ago

Thank you! I'm reviewing this alongside some other sample data, and with the Invictus folks regarding their tool's output.

philhagen commented 2 months ago

I'm about to send a testing VM to a small group of people for validation. If you are interested, @BrianMer, please send me an email: Phil at lewestech dot com. Although I have not yet integrated this PR, that is the branch where it would be done.

Can you share which tool or workflow you are using that generated this JSON in multiline format? So far I haven't been able to identify a source that does, and all the UAL test data I've used has been in "ndjson" format (newline-delimited JSON, one object per line), albeit with a .json extension.
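
To make the distinction concrete, here is a minimal Python sketch that checks input roughly the way a line-oriented shipper such as Filebeat consumes a file: every non-blank line must decode on its own as a JSON object. This is a simplification, not SOF-ELK's actual pipeline, and the helper name `is_ndjson` is hypothetical. A pretty-printed array like the sample above fails immediately, because lines such as `[` and `{` are not complete JSON documents.

```python
import json


def is_ndjson(text: str) -> bool:
    """Return True if every non-blank line is a standalone JSON object.

    Hypothetical helper for illustration only; it mimics line-by-line
    decoding and is not part of SOF-ELK's Filebeat/Logstash configuration.
    """
    found = False
    for line in text.splitlines():
        if not line.strip():
            continue  # ignore blank lines
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            return False  # e.g. "[" or "{" from a pretty-printed array
        if not isinstance(record, dict):
            return False  # valid JSON, but not a single event object
        found = True
    return found  # an empty input is not treated as valid ndjson


ndjson_text = (
    '{"CreationTime":"2023-05-31T22:52:10","Operation":"FileAccessed"}\n'
    '{"CreationTime":"2023-05-31T21:52:09","Operation":"FileAccessed"}\n'
)
pretty_array = """[
{
"CreationTime": "2023-05-31T22:52:10",
"Operation": "FileAccessed"
}
]"""

print(is_ndjson(ndjson_text))   # True
print(is_ndjson(pretty_array))  # False
```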

BrianMer commented 2 months ago

Hi @philhagen, I'll send you an email about testing the VM. Thank you for the offer!

In fact, the sample I shared with you is a CSV converted into JSON by a homemade Python script. I didn't know the expected input was ndjson, and I didn't have any other sample in the correct ndjson format to test with.

But I tested again after converting the CSV into ndjson format, and it still works.

I hope it works on your side as well.
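
The homemade conversion script isn't shown in the thread. As a hypothetical stand-in, the sketch below converts a CSV with one header row and one event per row into ndjson using only the standard library; the column names are borrowed from the sample above and are assumptions about the CSV layout. Every value stays a string because `csv.DictReader` does no type conversion. A UAL CSV export may instead carry the full event JSON in a single column (for example `AuditData`), in which case decoding and re-emitting that column would likely preserve the original event structure better than flattening the CSV columns.

```python
import csv
import io
import json


def csv_to_ndjson(csv_text: str) -> str:
    """Convert CSV rows (first row = header) into newline-delimited JSON.

    Hypothetical stand-in for a homemade conversion script; all values
    remain strings, and column names are taken directly from the header.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    # One compact JSON object per CSV row, one row per output line.
    lines = [json.dumps(row, separators=(",", ":")) for row in reader]
    return "\n".join(lines) + ("\n" if lines else "")


sample_csv = (
    "CreationTime,Operation,UserId,ClientIP\n"
    "2023-05-31T22:52:10,FileAccessed,app@sharepoint,1.2.3.4\n"
)
print(csv_to_ndjson(sample_csv), end="")
```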

philhagen commented 2 months ago

I suspect this may be an artifact of how the logs are retrieved. The workflow we've built around comes from FOR509, which uses the PowerShell Search-UnifiedAuditLog cmdlet; this is also used behind the scenes when running the Invictus Microsoft Extractor Suite tool. I'm still testing a few things, but I suspect it's a formatting issue. Although something still isn't parsing for a reason I'm tracking down now, the entries above would need to be represented like this in the input file, one JSON object per line:

{"CreationTime":"2023-05-31T22:52:10","Operation":"FileAccessed","UserId":"app@sharepoint","ClientIP":"1.2.3.4","ObjectId":"https://anon.sharepoint.com/sites/folder/anon.aspx","EventSource":"SharePoint","UserAgent":"ISV|Veeam Software|Veeam Backup for Office 365/5.0","SourceFileExtension":"aspx"}
{"CreationTime":"2023-05-31T22:52:10","Operation":"FileAccessed","UserId":"app@sharepoint","ClientIP":"1.2.3.4","ObjectId":"https://anon.sharepoint.com/sites/folder/anonymous.aspx","EventSource":"SharePoint","UserAgent":"ISV|Veeam Software|Veeam Backup for Office 365/5.0","SourceFileExtension":"aspx"}
{"CreationTime":"2023-05-31T21:52:09","Operation":"FileAccessed","UserId":"app@sharepoint","ClientIP":"1.2.3.4","ObjectId":"https://anon.sharepoint.com/sites/folder/anony.aspx","EventSource":"SharePoint","UserAgent":"ISV|Veeam Software|Veeam Backup for Office 365/5.0","SourceFileExtension":"aspx"}
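
The one-event-per-line form above is what Python's `json.dumps` produces with compact separators. Here is a minimal sketch, assuming the input is a single pretty-printed JSON array like the sample earlier in the thread, that rewrites it as ndjson; the function name `array_to_ndjson` is hypothetical.

```python
import json


def array_to_ndjson(array_text: str) -> str:
    """Rewrite a JSON array of event objects as newline-delimited JSON."""
    events = json.loads(array_text)  # parses the whole multiline document at once
    if not isinstance(events, list):
        raise ValueError("expected a top-level JSON array")
    # Compact separators drop the spaces after ":" and ",", giving one event per line.
    return "".join(json.dumps(event, separators=(",", ":")) + "\n" for event in events)


pretty = """[
{
"CreationTime": "2023-05-31T21:52:09",
"Operation": "FileAccessed",
"UserId": "app@sharepoint",
"ClientIP": "1.2.3.4",
"ObjectId": "https://anon.sharepoint.com/sites/folder/anony.aspx",
"EventSource": "SharePoint",
"UserAgent": "ISV|Veeam Software|Veeam Backup for Office 365/5.0",
"SourceFileExtension": "aspx"
}
]"""
print(array_to_ndjson(pretty), end="")
```

Run on this single-record array, the output should match the third ndjson line above character for character, since `json.dumps` keeps the key order of the parsed object. Because `json.loads` reads the entire array into memory, very large exports might call for a streaming parser instead.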

BrianMer commented 2 months ago

I retested by copying and pasting exactly what you gave me, and it works on my side (see the two attached screenshots).

philhagen commented 2 months ago

FWIW, I was able to parse these records in ndjson form using the pending VM release mentioned above. There is currently a blocker (upstream of Filebeat) that I have to clear before the testing VM can be released, but the parser does generally handle these log entries. We may want to customize how they are parsed somewhat, but that is straightforward and will be part of the testing process I'll send out with the VM when it's ready.