silvhua closed this issue 1 week ago.
Done in commit e5fe04553b5c5d653d4d5d85268ea7fe5e768856
Text output from the AWS Lambda console:
```json
{
  "statusCode": 200,
  "body": {
    "line_items": [
      {
        "id": "9",
        "mention_text": "250.00",
        "type": "line_item/amount",
        "currency_code": "USD",
        "units": 250,
        "normalized_value": "250",
        "confidence": "84%",
        "pages": [1]
      },
      {
        "id": "14",
        "mention_text": "322.81",
        "type": "line_item/amount",
        "currency_code": "USD",
        "units": 322,
        "nanos": 810000000,
        "normalized_value": "322.81"
      },
      {
        "id": "16",
        "mention_text": "322.81",
        "type": "line_item/amount",
        "currency_code": "USD",
        "units": 322,
        "nanos": 810000000,
        "normalized_value": "322.81"
      }
    ],
    "total_amount": {
      "id": "1",
      "mention_text": "250.00",
      "type": "total_amount",
      "currency_code": "USD",
      "units": 250,
      "normalized_value": "250"
    },
    "receipt_date": {
      "id": "5",
      "mention_text": "16-Oct-2021",
      "type": "receipt_date",
      "year": 2021,
      "month": 10,
      "day": 16,
      "normalized_value": "2021-10-16"
    },
    "supplier_address": {
      "id": "6",
      "mention_text": "661 University Ave., Toronto, ON M5G 1M1",
      "type": "supplier_address"
    },
    "supplier_name": {
      "id": "7",
      "mention_text": "",
      "type": "supplier_name",
      "normalized_value": "CrossFit BC"
    },
    "supplier_city": {
      "id": "19",
      "mention_text": "",
      "type": "supplier_city",
      "normalized_value": "Toronto"
    }
  }
}
```
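The amounts in this output come as an integer `units` field plus an optional `nanos` field (billionths of a unit), which appears to follow Google's `google.type.Money` convention. The following is a minimal sketch, assuming that convention, of combining the two fields into an exact `Decimal` and flattening `body` into one row per entity. The helper names `money_to_decimal` and `flatten_body` are hypothetical and are not part of `app.py`.

```python
from __future__ import annotations

from decimal import Decimal
from typing import Any


def money_to_decimal(entity: dict[str, Any]) -> Decimal | None:
    """Combine the `units`/`nanos` pair into one exact Decimal amount.

    Assumes the google.type.Money convention: value = units + nanos / 1e9.
    Returns None when the entity has no `units` field (e.g. a date or name).
    """
    if "units" not in entity:
        return None
    units = Decimal(entity["units"])
    nanos = Decimal(entity.get("nanos", 0))  # absent for whole amounts like 250.00
    return units + nanos / Decimal(1_000_000_000)


def flatten_body(body: dict[str, Any]) -> list[dict[str, Any]]:
    """Turn the Lambda response body into a flat list of rows, one per entity."""
    rows = []
    # Each line item becomes its own row.
    for item in body.get("line_items", []):
        rows.append({**item, "amount": money_to_decimal(item)})
    # Every other top-level key (total_amount, receipt_date, ...) is one entity.
    for key, entity in body.items():
        if key == "line_items":
            continue
        rows.append({**entity, "amount": money_to_decimal(entity)})
    return rows
```

Using `Decimal` rather than `float` keeps values such as `322.81` exact. The resulting list of dicts can be passed straight to `pandas.DataFrame(rows)` if a DataFrame is still wanted.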
The `ParserFunction` is in `app.py`. It currently parses a receipt that was uploaded to S3 from the front end and returns a dataframe with the parsed data. The returned dataframe could be refined to contain only the information essential for the task.
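One way to refine the output is to keep a single small record per receipt. The sketch below assumes that "essential" means supplier name, receipt date, total, and line-item amounts. That choice is hypothetical, and the sketch works on the `body` dict from the Lambda output rather than on a DataFrame. It prefers `normalized_value` over `mention_text` because, as the output shows, `supplier_name` has an empty `mention_text` but a usable `normalized_value` of `"CrossFit BC"`.

```python
from __future__ import annotations

from typing import Any

# Hypothetical choice of "essential" top-level fields; adjust to the task.
ESSENTIAL_KEYS = ("supplier_name", "receipt_date", "total_amount")


def entity_value(entity: dict[str, Any] | None) -> str | None:
    """Prefer the normalized value and fall back to the raw mention text."""
    if not entity:
        return None
    return entity.get("normalized_value") or entity.get("mention_text") or None


def essential_fields(body: dict[str, Any]) -> dict[str, Any]:
    """Reduce the parser output to one compact record per receipt."""
    record: dict[str, Any] = {key: entity_value(body.get(key)) for key in ESSENTIAL_KEYS}
    record["line_item_amounts"] = [
        entity_value(item) for item in body.get("line_items", [])
    ]
    return record
```

Missing fields come back as `None` instead of raising `KeyError`, so a receipt the parser only partly understood still yields a record.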
To invoke the function, see the back-end setup task.
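As a reference point, a synchronous invocation from Python might look like the sketch below. It assumes a boto3 Lambda client is passed in. It also assumes the function accepts an event with hypothetical `bucket` and `key` fields. The real event shape and the deployed function name (which may differ from the `ParserFunction` logical name) come from the back-end setup.

```python
from __future__ import annotations

import json
from typing import Any


def invoke_parser(client: Any, function_name: str, bucket: str, key: str) -> dict[str, Any]:
    """Synchronously invoke the parser Lambda and return its decoded JSON result.

    `client` is expected to be a boto3 Lambda client. The event shape
    {"bucket": ..., "key": ...} is a hypothetical placeholder.
    """
    event = {"bucket": bucket, "key": key}
    response = client.invoke(
        FunctionName=function_name,
        InvocationType="RequestResponse",  # wait for the result
        Payload=json.dumps(event).encode("utf-8"),
    )
    # Errors raised inside the function are reported via FunctionError,
    # not as an exception from invoke() itself.
    if "FunctionError" in response:
        raise RuntimeError(response["Payload"].read().decode("utf-8"))
    return json.loads(response["Payload"].read())
```

In practice `client` would be `boto3.client("lambda")`. Passing it in keeps the helper testable without AWS credentials.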
Data processing should account for the fact that receipt parsing will likely be inconsistent, because receipts format dates, times, totals, addresses, etc. in different ways.
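A minimal sketch of tolerant parsing: try several date formats in turn, and strip currency symbols and thousands separators from amounts. The format list is a hypothetical starting point that covers the `16-Oct-2021` and `2021-10-16` forms seen in the Lambda output. `parse_amount` assumes `.` is the decimal separator, so European-style `1.234,50` would not parse correctly.

```python
from __future__ import annotations

import re
from datetime import date, datetime
from decimal import Decimal, InvalidOperation

# Hypothetical list of formats; extend it as new receipt layouts turn up.
DATE_FORMATS = ("%Y-%m-%d", "%d-%b-%Y", "%b %d, %Y", "%d/%m/%Y", "%m/%d/%Y")


def parse_date(text: str) -> date | None:
    """Try each known format in order; return None if none matches."""
    cleaned = text.strip()
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(cleaned, fmt).date()
        except ValueError:
            continue
    return None


def parse_amount(text: str) -> Decimal | None:
    """Keep only digits, '.' and '-', e.g. "$1,234.50" -> Decimal("1234.50")."""
    cleaned = re.sub(r"[^\d.\-]", "", text)
    if not cleaned:
        return None
    try:
        return Decimal(cleaned)
    except InvalidOperation:  # e.g. "1.2.3" or a lone "-"
        return None
```

Format order matters. With `%d/%m/%Y` listed before `%m/%d/%Y`, an ambiguous date such as `03/04/2021` is read day-first, so the order should reflect the receipts actually expected. Returning `None` rather than raising lets the caller flag a receipt for review instead of failing the whole batch.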