philhagen / sof-elk

Configuration files for the SOF-ELK VM
GNU General Public License v3.0

azure-vpcflow2sof-elk.py generates empty output #331

Closed Jurkiseczek closed 3 months ago

Jurkiseczek commented 3 months ago

Hey there!

I'm playing with Azure NSG flow logs and have found that azure-vpcflow2sof-elk.py generates empty output no matter what input source I use. I tried real-life logs, SANS FOR509 logs, and ChatGPT-generated logs, but the output is always empty. I do not know if I'm doing something wrong. Any ideas?

philhagen commented 3 months ago

Can you provide a sample? Email is fine if it's not publicly shareable, as is a redacted subset in the issue thread here.

philhagen commented 3 months ago

Also - could you confirm the exact command line you're using? (And for the FOR509 logs, let me know which lab # and course version you're using.)

Jurkiseczek commented 3 months ago

Test inputs: ChatGPT

{
  "records": [
    {
      "time": "2024-07-29T12:00:00Z",
      "systemId": "sampleSystemId1",
      "category": "NetworkSecurityGroupFlowEvent",
      "resourceId": "/SUBSCRIPTIONS/<subscription-id>/RESOURCEGROUPS/<resource-group-name>/PROVIDERS/MICROSOFT.NETWORK/NETWORKSECURITYGROUPS/<nsg-name>",
      "operationName": "NetworkSecurityGroupFlowEvents",
      "properties": {
        "Version": 2,
        "flows": [
          {
            "rule": "AllowInternetOutBound",
            "flows": [
              {
                "mac": "00-0D-3A-B5-6F-93",
                "flowTuples": [
                  "1627564800,10.0.0.1,93.184.216.34,5060,5060,T,O,A,40,700,50",
                  "1627564801,10.0.0.2,13.107.42.14,80,443,T,O,A,60,1200,70"
                ]
              }
            ]
          },
          {
            "rule": "DenyAllInBound",
            "flows": [
              {
                "mac": "00-0D-3A-B5-6F-93",
                "flowTuples": [
                  "1627564802,13.107.42.14,10.0.0.2,443,80,T,B,D,0,0,0",
                  "1627564803,52.114.132.54,10.0.0.3,22,3389,T,B,D,0,0,0"
                ]
              }
            ]
          }
        ]
      }
    },
    {
      "time": "2024-07-29T12:05:00Z",
      "systemId": "sampleSystemId2",
      "category": "NetworkSecurityGroupFlowEvent",
      "resourceId": "/SUBSCRIPTIONS/<subscription-id>/RESOURCEGROUPS/<resource-group-name>/PROVIDERS/MICROSOFT.NETWORK/NETWORKSECURITYGROUPS/<nsg-name>",
      "operationName": "NetworkSecurityGroupFlowEvents",
      "properties": {
        "Version": 2,
        "flows": [
          {
            "rule": "AllowVNetInBound",
            "flows": [
              {
                "mac": "00-0D-3A-B5-6F-94",
                "flowTuples": [
                  "1627565100,10.0.1.1,10.0.2.1,22,22,T,O,A,30,600,40",
                  "1627565101,10.0.1.2,10.0.2.2,135,135,T,O,A,20,400,30"
                ]
              }
            ]
          },
          {
            "rule": "DenyAllOutBound",
            "flows": [
              {
                "mac": "00-0D-3A-B5-6F-94",
                "flowTuples": [
                  "1627565102,10.0.1.1,8.8.8.8,53,53,U,B,D,0,0,0",
                  "1627565103,10.0.1.2,1.1.1.1,53,53,U,B,D,0,0,0"
                ]
              }
            ]
          }
        ]
      }
    }
  ]
}

(PJH: edited above for formatting)

FOR509: lab-3.1_bonus_evidence - I was part of Beta so I assume it's from that version

command: python3 azure-vpcflow2sof-elk.py -r PT1H.json -w SANStestNSG.json -f -a

philhagen commented 3 months ago

Thanks. I'm not sure that generating sample data with ChatGPT is a good approach, though. I've generally found it to be blatantly wrong in its formatting. I'll try to take a look, but if you have real samples, that would be much better.

Regarding the beta logs, the format has likely changed, so those older logs would no longer be supported in the latest public release. I'll ask the course authors to confirm, but if that is the case, we would not backport the loader script to an outdated format. The VM distributed with the beta course materials would be needed.

Jurkiseczek commented 3 months ago

I absolutely agree with you. There is no reason to work on outdated logs. I haven't found any information regarding changes in V2 NSG logs, so I thought the format was still the same. I tried to generate NSG flow logs myself in Azure and somehow got version 4 (???) instead of the configured version 2. I even searched the internet, but it's nearly impossible to find test NSG data (that's the reason behind ChatGPT). If you know of any publicly available source, please let me know and I can re-test on my end as well.

philhagen commented 3 months ago

At first look, the .records[].properties.flows[].flows[].flowTuples field is incomplete. Each entry in that CSV should have 13 items, but the sample only has 11. Specifically, the sample is missing a proper flow_state field, which should be one of B, C, or E; the sample has a 40 in that position, so the records are never tracked.

The script expects these values in that CSV field: ['timestamp', 'source_ip', 'destination_ip', 'source_port', 'destination_port', 'protocol', 'traffic_flow', 'traffic_decision', 'flow_state', 'out_packets', 'out_bytes', 'in_packets', 'in_bytes']

However, the sample does not match: 1627564800,10.0.0.1,93.184.216.34,5060,5060,T,O,A,40,700,50

That's definitely why there is no output, but I've already reached out to the authors to see if this may be a new format, something ChatGPT is inventing, or something else. Will update here when I know more.
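The field-count mismatch described above can be checked mechanically before running the loader. The following is a minimal sketch, not code from azure-vpcflow2sof-elk.py: the field names and the B/C/E states are taken from this thread, while `check_flow_tuple` is a hypothetical helper name.

```python
# Field list as quoted in this thread; the helper below is illustrative only.
EXPECTED_FIELDS = [
    "timestamp", "source_ip", "destination_ip", "source_port",
    "destination_port", "protocol", "traffic_flow", "traffic_decision",
    "flow_state", "out_packets", "out_bytes", "in_packets", "in_bytes",
]
VALID_FLOW_STATES = {"B", "C", "E"}


def check_flow_tuple(flow_tuple):
    """Return a list of problems found in one flowTuples string (hypothetical helper)."""
    problems = []
    values = flow_tuple.split(",")
    if len(values) != len(EXPECTED_FIELDS):
        problems.append(
            f"expected {len(EXPECTED_FIELDS)} fields, found {len(values)}"
        )
    # Only inspect flow_state if the tuple is long enough to have that position.
    state_index = EXPECTED_FIELDS.index("flow_state")
    if len(values) > state_index and values[state_index] not in VALID_FLOW_STATES:
        problems.append(
            f"flow_state is {values[state_index]!r}, expected one of B, C, E"
        )
    return problems


if __name__ == "__main__":
    sample = "1627564800,10.0.0.1,93.184.216.34,5060,5060,T,O,A,40,700,50"
    for problem in check_flow_tuple(sample):
        print(problem)
```

Run against the sample tuple, this reports both the short field count and the unexpected value in the flow_state position.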

Jurkiseczek commented 3 months ago

Thank you @philhagen, really appreciate your support.

philhagen commented 3 months ago

With many thanks to the FOR509 team, I reviewed an exported v2 record from a live Azure instance, and confirmed that the actual flowTuples strings have more data than the sample above. For example:

"1722279578,10.1.0.5,2.3.4.5,60282,12000,T,O,A,E,1,66,0,0",
"1722279584,10.1.0.5,2.3.4.5,60282,12000,T,O,A,B,,,,",
"1722279662,10.1.0.5,1.2.3.4,58738,443,T,O,A,C,72,94309,120,64560",

This is definitely the reason no data is being output - the incomplete records do not have enough fields to trigger the accounting logic for the Beginning/Continued/End state of how Azure's NSG flow logs are exported. I'm curious if there may be additional formats? However, if it's v2, there should still be all the same fields as documented in MS's own write-ups.
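To make the difference concrete, here is a rough sketch of splitting a 13-field tuple into named values. It is an assumption-laden illustration, not the loader's actual logic: `parse_flow_tuple` is a hypothetical helper, and treating empty counters as `None` is a choice made here because the B (begin) example above carries empty packet and byte fields.

```python
# Field names as listed earlier in this thread.
FIELDS = [
    "timestamp", "source_ip", "destination_ip", "source_port",
    "destination_port", "protocol", "traffic_flow", "traffic_decision",
    "flow_state", "out_packets", "out_bytes", "in_packets", "in_bytes",
]
COUNTERS = ("out_packets", "out_bytes", "in_packets", "in_bytes")


def parse_flow_tuple(flow_tuple):
    """Return a dict for a 13-field tuple, or None if the field count is wrong."""
    values = flow_tuple.split(",")
    if len(values) != len(FIELDS):
        return None  # e.g. the 11-field sample from earlier in the thread
    record = dict(zip(FIELDS, values))
    # Counters may be empty strings (as in the "B" example); map those to None.
    for name in COUNTERS:
        record[name] = int(record[name]) if record[name] else None
    return record


if __name__ == "__main__":
    tuples = [
        "1722279578,10.1.0.5,2.3.4.5,60282,12000,T,O,A,E,1,66,0,0",
        "1722279584,10.1.0.5,2.3.4.5,60282,12000,T,O,A,B,,,,",
        "1627564800,10.0.0.1,93.184.216.34,5060,5060,T,O,A,40,700,50",
    ]
    for item in tuples:
        parsed = parse_flow_tuple(item)
        print(parsed["flow_state"] if parsed else "rejected: wrong field count")
```

The first two tuples parse with flow states E and B; the third, taken from the generated sample, is rejected for its field count.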

WRT the course beta evidence, I no longer have access to that data here, but if you can send that file to me via email, I can take a look to see what the problem may be. Please do not upload it here, as the content is protected by the CLA. My email address is in the various configuration files.

philhagen commented 3 months ago

Reviewed sample provided privately, and found it was JSON but did not contain the necessary structure for the VPC Flow Log parser. For posterity and completeness, the format SOF-ELK supports is documented here: https://learn.microsoft.com/en-us/azure/network-watcher/nsg-flow-logs-overview
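For anyone wanting a quick structural check before running the loader, the sketch below walks the nesting seen in the sample earlier in this thread (records, then properties.flows, then a nested flows list, then flowTuples). It is only an illustration under that assumption: `iter_flow_tuples` is a hypothetical helper and is not part of SOF-ELK.

```python
import json


def iter_flow_tuples(document):
    """Yield (rule, mac, flow_tuple) for each tuple found; yields nothing if the
    expected nesting is absent (hypothetical helper)."""
    if not isinstance(document, dict):
        return
    for record in document.get("records", []):
        properties = record.get("properties", {})
        for rule_group in properties.get("flows", []):
            for mac_group in rule_group.get("flows", []):
                for flow_tuple in mac_group.get("flowTuples", []):
                    yield rule_group.get("rule"), mac_group.get("mac"), flow_tuple


if __name__ == "__main__":
    # Tiny inline document using one tuple quoted earlier in this thread.
    text = """
    {"records": [{"properties": {"Version": 2, "flows": [
        {"rule": "AllowInternetOutBound", "flows": [
            {"mac": "00-0D-3A-B5-6F-93", "flowTuples": [
                "1722279578,10.1.0.5,2.3.4.5,60282,12000,T,O,A,E,1,66,0,0"
            ]}
        ]}
    ]}}]}
    """
    found = list(iter_flow_tuples(json.loads(text)))
    print(f"{len(found)} flow tuple(s) found")
```

A file that is valid JSON but yields zero tuples here lacks the nesting this sketch assumes, which would be one plausible way to end up with empty output.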