Open kclinden opened 3 months ago
Tried updating the `json.dumps` call as below and the error went away: `json.dumps(j, indent=4, sort_keys=True, default=str)`
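As a minimal, self-contained reproduction of the issue (the record below is illustrative, not actual Security Lake data): `json.dumps` raises `TypeError` on `datetime` values, and passing `default=str` tells it to fall back to `str()` for any object it cannot serialize natively.

```python
import datetime
import json

# Illustrative record with a datetime value, like a row read from parquet.
record = {'time': datetime.datetime(2024, 6, 1, 12, 0, 0), 'activity': 'Logon'}

# Without a default, json.dumps raises TypeError on the datetime value.
try:
    json.dumps(record)
except TypeError as e:
    print(f'Failed: {e}')

# default=str is applied to any object json cannot serialize natively,
# so the datetime is emitted as its string form.
fixed = json.dumps(record, indent=4, sort_keys=True, default=str)
print(fixed)
```

Note that `default=str` is lossy in the sense that the consumer receives a plain string rather than a typed timestamp; that is usually acceptable for log forwarding.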
```python
def obtain_logs(self, bucket: str, log_path: str) -> List[str]:
    """Fetch a parquet file from a bucket and obtain a list of the events it contains.

    Parameters
    ----------
    bucket : str
        Bucket to get the file from.
    log_path : str
        Relative path of the file inside the bucket.

    Returns
    -------
    events : List[str]
        Events contained inside the parquet file.
    """
    debug(f'Processing file {log_path} in {bucket}', 2)
    events = []
    try:
        raw_parquet = io.BytesIO(self.client.get_object(Bucket=bucket, Key=log_path)['Body'].read())
    except Exception as e:
        debug(f'Could not get the parquet file {log_path} in {bucket}: {e}', 1)
        sys.exit(21)
    pfile = pq.ParquetFile(raw_parquet)
    for i in pfile.iter_batches():
        for j in i.to_pylist():
            events.append(json.dumps(j, indent=4, sort_keys=True, default=str))
    debug(f'Found {len(events)} events in file {log_path}', 2)
    return events
```
Thank you for these reports @kclinden, we will review this.
The error was caused by a `datetime` object that `json.dumps` could not serialize. The fix adds `str` as the `default` function applied to objects not present in the conversion table.

Although the fix works (tested by modifying the module in 4.8.0, since https://github.com/wazuh/wazuh/issues/23672 is required to use the module in the master branch), we should review whether the Security Lake rules should be modified or extended taking into account the new version.
Go ahead with the fix to ensure we are compatible with Source version 2 while maintaining compatibility with Source version 1, ignoring the ruleset for now.
Description
When integrating with AWS Security Lake using Source Version 2, I am getting the following error:
Tasks