Created the Lambda function and am researching Python example code:
https://medium.com/pythons-gurus/read-file-data-from-s3-using-python-aws-lambda-4b3eb515285c
Researching the boto3 documentation to implement the S3 read routine:
https://boto3.amazonaws.com/v1/documentation/api/latest/index.html
Wrote code from the documentation, but I'm getting this error message:
"errorMessage": "An error occurred (AccessDenied) when calling the ListBuckets operation: Access Denied",
Checking roles and policies next.
Updated the Lambda function's execution role with a new policy.
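For reference, a minimal sketch of the bucket-listing handler I'm testing (the exact code in my function may differ slightly; the execution role needs the s3:ListAllMyBuckets permission, which is what the AccessDenied error above was about):

```python
import boto3

# Client created once, outside the handler, so it is reused on warm starts.
s3 = boto3.client("s3")

def lambda_handler(event, context):
    # ListBuckets requires s3:ListAllMyBuckets on the execution role.
    response = s3.list_buckets()
    names = [bucket["Name"] for bucket in response["Buckets"]]
    print(" ".join(names))
    return names
```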
My test results:
START RequestId: 7813bafa-225c-49cb-b81a-569879568673 Version: $LATEST
backlog-source bgbucket89 code-versions markup-images statefunct-dev-serverlessdeploymentbucket-bz9vl0qpq3cv support-doc-pdfs support-docs-pngs testerwebsite textractspdf
END RequestId: 7813bafa-225c-49cb-b81a-569879568673
REPORT RequestId: 7813bafa-225c-49cb-b81a-569879568673 Duration: 447.74 ms Billed Duration: 448 ms Memory Size: 128 MB Max Memory Used: 83 MB Init Duration: 479.29 ms
Completed:
- Print a list of all the buckets in s3
Taking a break; will resume at 1 pm.
Resumed work; now looking for routines to read the list of files in a chosen S3 bucket, per the requirements. Found a link in the boto3 docs with an example for S3 and Python.
I'm using it as a reference to list the filenames inside a bucket.
I now have code that will list filenames and metadata; a sketch follows, with the response below it.
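Roughly the routine I'm running, sketched from the boto3 list_objects_v2 docs; the bucket name textractspdf comes from my test setup. A paginator matters here because the response below is truncated ('IsTruncated': True):

```python
import boto3

s3 = boto3.client("s3")

def list_bucket_files(bucket_name):
    # list_objects_v2 returns at most 1000 keys per call; a paginator
    # follows NextContinuationToken automatically when IsTruncated is True.
    paginator = s3.get_paginator("list_objects_v2")
    for page in paginator.paginate(Bucket=bucket_name):
        for obj in page.get("Contents", []):
            print(obj["Key"], obj["LastModified"], obj["Size"])

list_bucket_files("textractspdf")
```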
{'ResponseMetadata': {'RequestId': 'V0MG40BY39K0WENW', 'HostId': '7bQUIm2lKEVx0kx07Bt/B73ZizeCN84IqBEesMP8S8u8Hzsy5dzF2Wj9IJ0x0yY4hhTqndNmemKfghjDZ/xLErkb17FZWPBT', 'HTTPStatusCode': 200, 'HTTPHeaders': {'x-amz-id-2': '7bQUIm2lKEVx0kx07Bt/B73ZizeCN84IqBEesMP8S8u8Hzsy5dzF2Wj9IJ0x0yY4hhTqndNmemKfghjDZ/xLErkb17FZWPBT', 'x-amz-request-id': 'V0MG40BY39K0WENW', 'date': 'Thu, 30 May 2024 17:54:49 GMT', 'x-amz-bucket-region': 'us-east-1', 'content-type': 'application/xml', 'transfer-encoding': 'chunked', 'server': 'AmazonS3'}, 'RetryAttempts': 0}, 'IsTruncated': True, 'Contents': [{'Key': 'Tasks_-_PJ2405-0001.pdf', 'LastModified': datetime.datetime(2024, 5, 28, 17, 34, 52, tzinfo=tzlocal()), 'ETag': '"70df1832fddb5944083e0a5fc83bb254"', 'Size': 705825, 'StorageClass': 'STANDARD'}, {'Key': 'Third-party-vendors.pdf', 'LastModified': datetime.datetime(2024, 5, 26, 19, 24, 50, tzinfo=tzlocal()), 'ETag': '"80c6858aeaa26f795b8596fdeeba9d60"', 'Size': 523879, 'StorageClass': 'STANDARD'}], 'Name': 'textractspdf', 'Prefix': '', 'MaxKeys': 2, 'EncodingType': 'url', 'KeyCount': 2, 'NextContinuationToken': '1DYEcq+N/ukSRDxw0lAIorGBXChQyCOfmRkwUdqs/+CE+e2XBB8OpDEkMevU4juPY'}
Status
Completed:
- Read contents of the s3 bucket and output the bucket and filenames
- Print a list of all the buckets in s3
To do:
- Print the times the files were uploaded
- Print the event metadata included when a file is uploaded to s3
- After you get your lambda function working, commit your file to a repo attached to this project issue.
I'm looking through the docs for a way to get the metadata for this task item:
Print the times the files were uploaded
Reading this page to learn how to use GetObject and its response data format. This is the general doc:
https://docs.aws.amazon.com/AmazonS3/latest/API/API_GetObject.html
This is Python-specific:
https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/s3/client/get_object.html#
I found a good example routine for GetObject that takes a bucket and key. The response appears to give me the other information I need to complete this task. I now need to parse the JSON response for the items I want to collect for my solution.
{ "ResponseMetadata": { "RequestId": "EPWXB918K81NH4YE", "HostId": "v0Vb4fRb6wST2WartwwpOpLAO9OOd+XeLDLDGJG+RLOBBk4MwIo7IjRr9wCJjy+z2VjDpSsm078=", "HTTPStatusCode": 200, "HTTPHeaders": { "x-amz-id-2": "v0Vb4fRb6wST2WartwwpOpLAO9OOd+XeLDLDGJG+RLOBBk4MwIo7IjRr9wCJjy+z2VjDpSsm078=", "x-amz-request-id": "EPWXB918K81NH4YE", "date": "Fri, 31 May 2024 20:26:56 GMT", "last-modified": "Sun, 26 May 2024 19:24:50 GMT", "etag": "\"80c6858aeaa26f795b8596fdeeba9d60\"", "x-amz-server-side-encryption": "AES256", "accept-ranges": "bytes", "content-type": "application/pdf", "server": "AmazonS3", "content-length": "523879" }, "RetryAttempts": 0 }, "AcceptRanges": "bytes", "LastModified":datetime.datetime(2024, 5, 26, 19, 24, 50, "tzinfo=tzutc())", "ContentLength": 523879, "ETag": "\"80c6858aeaa26f795b8596fdeeba9d60\"", "ContentType": "application/pdf", "ServerSideEncryption": "AES256", "Metadata": {}, "Body":<botocore.response.StreamingBody object at 0x7fa9f3473370>
Working with JSON data in Python:
https://www.codementor.io/@simransinghal/working-with-json-data-in-python-165crbkiyk
Now getting this error after adding code to convert the response to a JSON string, then to a dictionary so we can get at the individual parts in Python:
Object of type datetime is not JSON serializable
Reading:
https://www.geeksforgeeks.org/how-to-fix-datetime-datetime-not-json-serializable-in-python/
I found the answer to this issue here: https://stackoverflow.com/questions/11875770/how-can-i-overcome-datetime-datetime-not-json-serializable/36142844#36142844
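The fix from that answer, sketched minimally: pass default=str so json.dumps converts datetimes (and anything else it can't serialize natively, like the StreamingBody) to strings instead of raising TypeError. The dictionary here is a stand-in for the real GetObject response:

```python
import json
import datetime

# Stand-in for the GetObject response, which contains a datetime value.
response_like = {
    "LastModified": datetime.datetime(2024, 5, 26, 19, 24, 50),
    "ContentLength": 523879,
}

# default=str is called for any object json can't serialize natively,
# so the datetime becomes a string instead of raising TypeError.
as_json = json.dumps(response_like, default=str)
data = json.loads(as_json)
print(data["LastModified"])  # "2024-05-26 19:24:50"
```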
I finally got these results after calling several S3 SDK methods for Python. I will be able to use these results to perform the tasks in the project requirements. Below is my output from the returned JSON:
START RequestId: e8980756-78b2-4903-89cb-c864d8dc0a0a Version: $LATEST
list the buckets
backlog-source bgbucket89 code-versions markup-images statefunct-dev-serverlessdeploymentbucket-bz9vl0qpq3cv support-doc-pdfs support-docs-pngs testerwebsite textractspdf
These are the keys:
AcceptRanges Body ContentLength ContentType ETag LastModified Metadata ResponseMetadata ServerSideEncryption
Last Modified: 2024-05-26 19:24:50+00:00
END RequestId: e8980756-78b2-4903-89cb-c864d8dc0a0a
REPORT RequestId: e8980756-78b2-4903-89cb-c864d8dc0a0a Duration: 707.92 ms Billed Duration: 708 ms Memory Size: 128 MB Max Memory Used: 86 MB Init Duration: 460.57 ms
Integrate a Lambda function (use your test Lambda function for this) into the test system so that we can read the events and contents of the S3 bucket using Python routines. The Python code will be required to perform the following processing:
- Read contents of the s3 bucket and output the bucket and filenames
- Print the times the files were uploaded
- Print the event metadata included when a file is uploaded to s3
- Print a list of all the buckets in s3
- After you get your lambda function working, commit your file to a repo attached to this project issue.
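For the remaining to-do item, printing the event metadata when a file is uploaded, a hedged sketch of the handler once the bucket's PUT notification is pointed at the Lambda (the trigger wiring itself is assumed; the fields shown are the standard S3 event record structure):

```python
import json

def lambda_handler(event, context):
    # An S3 put notification delivers a list of records; each record
    # carries the bucket, key, and timestamp of the upload event.
    print(json.dumps(event))  # the full event metadata
    for record in event.get("Records", []):
        print("Bucket:", record["s3"]["bucket"]["name"])
        print("Key:", record["s3"]["object"]["key"])
        print("Event time:", record["eventTime"])
        print("Event name:", record["eventName"])
```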