masaccio / numbers-parser

Python module for parsing Apple Numbers .numbers files
MIT License

Unable to parse file after downloading from s3 #16

Closed antarr closed 2 years ago

antarr commented 2 years ago

I'm trying to create a Lambda function that reads a Numbers file when it is uploaded to an S3 bucket. I get an error when the function tries to parse the file.

error

[ERROR] UnicodeDecodeError: 'utf-8' codec can't decode byte 0x8e in position 11: invalid start byte
Traceback (most recent call last):
  File "/var/task/lambda_handler.py", line 18, in lambda_handler
    document = Document(f.read())
  File "/var/lang/lib/python3.9/codecs.py", line 322, in decode
    (result, consumed) = self._buffer_decode(data, self.errors, final)
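
The traceback fails inside codecs.py while running f.read(), which is what happens when a binary file is opened in text mode and Python tries to decode it as UTF-8. A minimal sketch that reproduces the same class of error with only the standard library is below. The payload bytes and the helper names read_as_text and read_as_bytes are made up for illustration and are not taken from a real .numbers file.

```python
import os
import tempfile

# Arbitrary bytes standing in for binary content (hypothetical payload).
# Byte 0x8e is a UTF-8 continuation byte, so it cannot start a character.
payload = b"PK\x03\x04\x14\x00\x08\x00\x08\x00\x00\x8e"

with tempfile.NamedTemporaryFile(delete=False, suffix=".numbers") as tmp:
    tmp.write(payload)
    path = tmp.name


def read_as_text(p):
    # Text mode: Python decodes the bytes, here explicitly as UTF-8.
    with open(p, encoding="utf-8") as f:
        return f.read()


def read_as_bytes(p):
    # Binary mode: the bytes are returned untouched, no decoding happens.
    with open(p, "rb") as f:
        return f.read()


try:
    read_as_text(path)
    text_mode_error = None
except UnicodeDecodeError as exc:
    text_mode_error = exc

binary_data = read_as_bytes(path)
os.remove(path)

print(type(text_mode_error).__name__)
print(binary_data == payload)
```

Reading in binary mode avoids the decode step, but it does not by itself make the bytes a valid argument for Document.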

lambda_handler.py

import boto3
import json
import structlog

from numbers_parser import Document

log = structlog.get_logger()

def lambda_handler(event, context):
    log.info("Starting lambda handler")
    log.info("Event: {}".format(event))
    s3 = boto3.client("s3")
    bucket = event["Records"][0]["s3"]["bucket"]["name"]
    key = event["Records"][0]["s3"]["object"]["key"]
    s3.download_file(bucket, key, "/tmp/{}".format(key))
    with open("/tmp/{}".format(key)) as f:
        document = Document(f.read())
        data = parse_sheets(document)
        log.info("Parsed sheets", data=data)
        return json.dumps(data)

def parse_sheets(document):
    """
    Parses the document and returns a list of dicts with the data
    """
    sheets = document.sheets
    tables = sheets[0].tables()
    rows = tables[0].rows
    return [row.cells for row in rows]
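
One possible fix, sketched under assumptions: the numbers-parser README constructs Document from a filename, so the downloaded path can be passed directly instead of the result of f.read(). The helpers local_path_for_key and load_document are hypothetical names, and this sketch has not been run inside Lambda. It also decodes the object key, since S3 event notifications deliver keys URL-encoded, and keeps only the base name so a key containing "/" does not point at a missing directory under /tmp.

```python
import os
from urllib.parse import unquote_plus


def local_path_for_key(raw_key, tmp_dir="/tmp"):
    """Map an S3 event key to a flat path under tmp_dir (hypothetical helper)."""
    # Keys in S3 event notifications are URL-encoded (a space arrives as "+").
    key = unquote_plus(raw_key)
    # Drop any prefix such as "reports/" so no subdirectory is needed.
    return os.path.join(tmp_dir, os.path.basename(key))


def load_document(bucket, raw_key):
    """Download the object and open it by path (hypothetical helper)."""
    # Imported here so the path helper above can be used without
    # boto3 or numbers-parser installed.
    import boto3
    from numbers_parser import Document

    path = local_path_for_key(raw_key)
    boto3.client("s3").download_file(bucket, unquote_plus(raw_key), path)
    # Assumption: Document takes a filename, not the file's contents.
    return Document(path)
```

In the handler, document = load_document(bucket, key) would then replace the open(...) block.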