xarf / python-xarf

[DEPRECATED] --> go to http://xarf.org

RFC 3339 formatted dates seem to be created invalid when using PyXARF #1

Closed: dmth closed this issue 7 years ago

dmth commented 7 years ago

When using pyxarf's to_yaml() function, RFC 3339 dates seem to be invalid. I suspect either the yaml library or blocklist.de's validator to be the source of the error, but I'm not sure yet.

For this issue, I assume that the validator at https://www.blocklist.de/en/xarfvalidatorreport.html is always correct.

Edit on 2017-03-13: It seems the validator was not correct. / EDIT

I'm just documenting my tests here. Feel free to respond to this issue as you like.

The output is:

Date: '2017-03-09T09:41:37+01:00'

I'd expect:

Date: 2017-03-09T09:41:37+01:00

We use our abuse_bot-infection_0.2.0_unstable schema published in [1], which is similar to abuse_bot-infection_0.1.0.

Examples to reproduce:

import yaml
print(yaml.__version__)
# Output:
# 3.10
import pyxarf
import datetime
from email.utils import formatdate ## Required for RFC2822 Tests later
SCHEMAURL = "https://raw.githubusercontent.com/Intevation/xarf-schemata/master/abuse_bot-infection_0.2.0_unstable.json"
DATE = datetime.datetime.now(tz=datetime.timezone.utc).replace(microsecond=0)
REPORTED_FROM = "reporter@example.com"
REPORT_ID = "4711@example.com"

# Now test their representation:
print(DATE.astimezone().isoformat())   # Seems to be RFC3339
# Output:
# 2017-03-09T09:41:37+01:00
print(REPORTED_FROM)
# Output:
# reporter@example.com
print(REPORT_ID)
# Output:
# 4711@example.com

params = {
    'schema_url': SCHEMAURL,
    'schema_cache': '/tmp/',
    'reported_from': REPORTED_FROM,
    'report_id': REPORT_ID,
    'date': DATE.astimezone().isoformat(),  # RFC 3339 string in local time
    'category': 'abuse',
    'report_type': 'bot-infection',
    'source_type': 'ip-address',
    'destination_type': 'ip-address',
    'attachment': None,
    'source': '192.168.0.1',
}
xarf_obj = pyxarf.Xarf(**params) 
print(xarf_obj.to_yaml())
# Output:
# evidence: null
# machine_readable:
#  Attachment: none
#  Category: abuse
#  Date: '2017-03-09T09:41:37+01:00'
#  Report-ID: 4711@example.com
#  Report-Type: bot-infection
#  Reported-From: reporter@example.com
#  Schema-URL: https://raw.githubusercontent.com/Intevation/xarf-schemata/master/abuse_bot-infection_0.2.0_unstable.json
#  Source: 192.168.0.1
#  Source-Type: ip-address
#  User-Agent: pyxarf 0.0.5bereiter

NOTE: The strings REPORTED_FROM and REPORT_ID are printed without single quotes, but the date is quoted. This does not seem to be correct.
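A likely explanation, assuming pyxarf serializes through PyYAML's dump functions (the traceback further down passes `default_flow_style=False`, a PyYAML keyword): PyYAML quotes any string that its YAML 1.1 implicit resolvers would otherwise read back as a different type, and an RFC 3339 timestamp matches the timestamp resolver. The e-mail-like strings match no resolver, so they stay plain. A minimal check:

```python
import yaml  # PyYAML, assumed to be what pyxarf uses for to_yaml()

DATE_STR = "2017-03-09T09:41:37+01:00"

# Dump the date as a plain Python str, the way pyxarf receives it.
dumped = yaml.safe_dump({"Date": DATE_STR}, default_flow_style=False)
print(dumped, end="")  # Date: '2017-03-09T09:41:37+01:00'

# Without quotes, PyYAML's timestamp resolver reads the value back as a
# datetime instead of a string; the quotes preserve the str type.
quoted = yaml.safe_load("Date: '2017-03-09T09:41:37+01:00'")["Date"]
plain = yaml.safe_load("Date: 2017-03-09T09:41:37+01:00")["Date"]
print(type(quoted).__name__, type(plain).__name__)  # str datetime
```

If this holds for pyxarf's output too, the quotes are PyYAML keeping the value a string, not a malformed date.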

Converting the Date to RFC 3339 seems to be the right way to do it, as the specification [2] says:

Date: [mandatory][only once] This field contains the date of the incident itself or date when the incident has been discovered (if not reported realtime). The date should be in RFC 3339 format - if the x-arf schema specifies the date with "format":"date-time". Due to compatibility reasons the date may be written in the RFC2822 format, no matter if "format":"date-time" is used or not in a x-arf schema description. Therefore, parser implementations should check which of the both formats is used.

I read this as: the date should be RFC 3339 and may be RFC 2822, so RFC 3339 is preferred.
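Since the specification asks parsers to check which of the two formats is used, a consumer could try RFC 3339 first and fall back to RFC 2822. The sketch below is a hypothetical helper (`parse_xarf_date` is not part of pyxarf) that uses only the standard library; the regex covers the RFC 3339 `date-time` production and does not accept leap seconds.

```python
import re
from datetime import datetime, timedelta, timezone
from email.utils import parsedate_to_datetime

# RFC 3339 "date-time": full-date "T" full-time with a mandatory offset.
# Lowercase "t" and "z" are permitted by the RFC.
RFC3339_RE = re.compile(
    r"^(\d{4})-(\d{2})-(\d{2})[Tt](\d{2}):(\d{2}):(\d{2})(\.\d+)?"
    r"([Zz]|[+-]\d{2}:\d{2})$"
)


def parse_xarf_date(value):
    """Hypothetical helper: return ("rfc3339" | "rfc2822", datetime)."""
    m = RFC3339_RE.match(value)
    if m:
        year, month, day, hour, minute, second = map(int, m.groups()[:6])
        frac = m.group(7)
        # Keep at most microsecond precision; extra digits are truncated.
        micro = int((frac[1:] + "000000")[:6]) if frac else 0
        offset = m.group(8)
        if offset in ("Z", "z"):
            tz = timezone.utc
        else:
            sign = 1 if offset[0] == "+" else -1
            tz = timezone(sign * timedelta(hours=int(offset[1:3]),
                                           minutes=int(offset[4:6])))
        # datetime() raises ValueError for out-of-range fields (e.g. month 13).
        return "rfc3339", datetime(year, month, day, hour, minute, second,
                                   micro, tzinfo=tz)
    try:
        # RFC 2822 compatibility path, e.g. "Thu, 09 Mar 2017 09:41:37 +0100".
        # Note: a "-0000" zone yields a naive datetime.
        return "rfc2822", parsedate_to_datetime(value)
    except (TypeError, ValueError):
        # Depending on the Python version, bad input raises either exception.
        raise ValueError(f"unrecognized X-ARF date: {value!r}") from None
```

Both spellings from this issue should then compare equal as aware datetimes.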

Nevertheless, the compatibility path via RFC 2822 works as expected:

params = {
    'schema_url': SCHEMAURL,
    'schema_cache': '/tmp/',
    'reported_from': REPORTED_FROM,
    'report_id': REPORT_ID,
    'date': formatdate(DATE.timestamp(), localtime=True),  # RFC 2822 string
    'category': 'abuse',
    'report_type': 'bot-infection',
    'source_type': 'ip-address',
    'destination_type': 'ip-address',
    'attachment': None,
    'source': '192.168.0.1',
}
xarf_obj = pyxarf.Xarf(**params) 
print(xarf_obj.to_yaml())
# Output:
# evidence: null
# machine_readable:
#  Attachment: none
#  Category: abuse
#  Date: Thu, 09 Mar 2017 09:41:37 +0100
#  Report-ID: 4711@example.com
#  Report-Type: bot-infection
#  Reported-From: reporter@example.com
#  Schema-URL: https://raw.githubusercontent.com/Intevation/xarf-schemata/master/abuse_bot-infection_0.2.0_unstable.json
#  Source: 192.168.0.1
#  Source-Type: ip-address
#  User-Agent: pyxarf 0.0.5bereiter

I'd expect PyXARF to be able to convert Python datetime objects into the correct format. It seems it cannot, or have I overlooked something?

Another example:

params = {
    'schema_url': SCHEMAURL,
    'schema_cache': '/tmp/',
    'reported_from': REPORTED_FROM,
    'report_id': REPORT_ID,
    'date': DATE,  # a datetime object instead of a string
    'category': 'abuse',
    'report_type': 'bot-infection',
    'source_type': 'ip-address',
    'destination_type': 'ip-address',
    'attachment': None,
    'source': '192.168.0.1',
}
xarf_obj = pyxarf.Xarf(**params) 
print(xarf_obj.to_yaml())
# Output:
# Traceback (most recent call last):
#  File "<stdin>", line 1, in <module>
#  File "/usr/lib/python3/dist-packages/pyxarf/xarf.py", line 375, in to_yaml
#    self.get_report_obj(part), default_flow_style=False
#  File "/usr/lib/python3/dist-packages/pyxarf/xarf.py", line 395, in get_report_obj
#    'machine_readable': self._get_validated_machine_readable(),
#  File "/usr/lib/python3/dist-packages/pyxarf/xarf.py", line 338, in _get_validated_machine_readable
#    self.machine_readable
#  File "/usr/lib/python3/dist-packages/pyxarf/xarf.py", line 290, in _validate_schema
#    ', '.join(errors)
# pyxarf.exceptions.ValidationError: Date datetime.datetime(2017, 3, 9, 8, 41, 37, tzinfo=datetime.timezone.utc) is not of type 'string'
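Until pyxarf accepts datetime objects itself, one workaround is to convert on the caller side before building `params`. The sketch below is a hypothetical helper (`xarf_date_strings` is not part of pyxarf) that produces both accepted spellings from a timezone-aware datetime using only the standard library:

```python
from datetime import datetime, timedelta, timezone
from email.utils import format_datetime


def xarf_date_strings(dt):
    """Hypothetical caller-side helper: aware datetime -> (RFC 3339, RFC 2822)."""
    if dt.tzinfo is None or dt.utcoffset() is None:
        # Both formats carry a UTC offset, so refuse to guess one.
        raise ValueError("datetime must be timezone-aware")
    dt = dt.replace(microsecond=0)
    return dt.isoformat(), format_datetime(dt)


cet = timezone(timedelta(hours=1))
rfc3339, rfc2822 = xarf_date_strings(datetime(2017, 3, 9, 9, 41, 37, tzinfo=cet))
print(rfc3339)  # 2017-03-09T09:41:37+01:00
print(rfc2822)  # Thu, 09 Mar 2017 09:41:37 +0100
```

Either string could then be passed as `'date'` in `params`.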

BR Dustin

[1] https://raw.githubusercontent.com/Intevation/xarf-schemata/master/abuse_bot-infection_0.2.0_unstable.json
[2] https://github.com/xarf/xarf-specification/blob/master/xarf-specification_0.2.md#date-mandatoryonly-once

bernhardreiter commented 7 years ago

Note that https://github.com/xarf/python-xarf/blob/master/pyxarf/xarf.py#L36 specifies the parameter date to be a string:

    class Xarf(object):
        '''
        xarf report generation class

        ...

        :type date: string
        :param source: source of report
bernhardreiter commented 7 years ago

When entering a YAML report (without headers) into https://www.blocklist.de/en/xarfvalidatorreport.html, there is a problem whenever some strings are quoted; it is not specific to the date. Both of the following reports are flagged as problematic by the blocklist.de validator:

Attachment: none
Category: abuse
Date: 2017-03-09T09:41:37+01:00
Report-ID: '4711@example.com'
Report-Type: bot-infection
Reported-From: reporter@example.com
Schema-URL: https://raw.githubusercontent.com/Intevation/xarf-schemata/master/abuse_bot-infection_0.2.0_unstable.json
Source: 192.168.0.1
Source-Type: ip-address
User-Agent: pyxarf 0.0.5bereiter

Attachment: none
Category: abuse
Date: '2017-03-09T09:41:37+01:00'
Report-ID: 4711@example.com
Report-Type: bot-infection
Reported-From: reporter@example.com
Schema-URL: https://raw.githubusercontent.com/Intevation/xarf-schemata/master/abuse_bot-infection_0.2.0_unstable.json
Source: 192.168.0.1
Source-Type: ip-address
User-Agent: pyxarf 0.0.5bereiter

My reading of YAML is that single quotes around scalars are fine (see http://www.yaml.org/spec/1.2/spec.html#id2788097), so the blocklist.de validator seems to have a problem.
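This reading can be checked against an actual YAML parser. The sketch assumes PyYAML; any conforming loader should behave the same for this value. Both spellings of Report-ID load to the same string:

```python
import yaml  # PyYAML

# The same field with and without YAML single quotes.
quoted = yaml.safe_load("Report-ID: '4711@example.com'")
plain = yaml.safe_load("Report-ID: 4711@example.com")

# A conforming loader yields the identical string for both spellings.
print(quoted == plain, quoted)  # True {'Report-ID': '4711@example.com'}
```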

dmth commented 7 years ago

I created this issue based on the assumption that the validator at https://www.blocklist.de/en/xarfvalidatorreport.html is always correct when validating. It seems it was not.

As the validator was fixed, I'm closing this issue.