ranceforhiwd / Python-Practice

Practice python routines

Integrate Lambda Solution #18

Open: ranceforhiwd opened this issue 6 months ago

ranceforhiwd commented 6 months ago

Integrate a Lambda function (use your test Lambda function for this) into the test system so that we can read the events and contents of the S3 bucket using Python routines. The Python code must perform the following processing:

- Read the contents of the S3 bucket and output the bucket name and filenames.
- Print the times the files were uploaded.
- Print the event metadata included when a file is uploaded to S3.
- Print a list of all the buckets in S3.

After you get your Lambda function working, commit your file to a repo attached to this project issue.
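The event-metadata item assumes the Lambda function is triggered by S3 `ObjectCreated` notifications, in which case the upload details arrive in `event["Records"]`. Below is a minimal sketch under that assumption; `summarize_s3_event` is a hypothetical helper name, and the field paths follow the documented S3 event notification structure.

```python
from urllib.parse import unquote_plus


def summarize_s3_event(event):
    """Return one summary dict per record in an S3 notification event.

    Assumes the standard S3 event layout: Records[].eventName, eventTime,
    s3.bucket.name, s3.object.key and s3.object.size.
    """
    summaries = []
    for record in event.get("Records", []):
        s3_info = record.get("s3", {})
        obj = s3_info.get("object", {})
        summaries.append(
            {
                "event_name": record.get("eventName"),  # e.g. "ObjectCreated:Put"
                "event_time": record.get("eventTime"),  # ISO 8601 string in UTC
                "bucket": s3_info.get("bucket", {}).get("name"),
                # Keys arrive URL-encoded (a space becomes "+"), so decode them.
                "key": unquote_plus(obj.get("key", "")),
                "size": obj.get("size"),
            }
        )
    return summaries


def lambda_handler(event, context):
    """Print the metadata of each uploaded object in the triggering event."""
    summaries = summarize_s3_event(event)
    for summary in summaries:
        print(summary)
    return {"records": len(summaries)}
```

For `event` to contain these records, the bucket's event notification (or a Lambda trigger added in the console) has to target this function; a manual console test only sees whatever test event is configured there.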

ranceforhiwd commented 6 months ago

Created the function; now researching the Python code:

https://medium.com/pythons-gurus/read-file-data-from-s3-using-python-aws-lambda-4b3eb515285c

ranceforhiwd commented 6 months ago

Researching the boto3 documentation to implement the routine that reads from S3.

https://boto3.amazonaws.com/v1/documentation/api/latest/index.html

ranceforhiwd commented 6 months ago

Created code from the documentation, but I'm getting this error message:

"errorMessage": "An error occurred (AccessDenied) when calling the ListBuckets operation: Access Denied",

Checking roles and policies next.
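An `AccessDenied` on `ListBuckets` usually means the Lambda execution role lacks the matching IAM permission. The policy actually attached in this issue is not shown, so the following is only an illustrative read-only policy, assuming a single bucket (`textractspdf`, taken from the bucket list later in this thread). `ListBuckets` is authorized by `s3:ListAllMyBuckets`, listing a bucket's objects by `s3:ListBucket` on the bucket ARN, and `GetObject`/`HeadObject` by `s3:GetObject` on the object ARNs. The helper name `build_read_only_s3_policy` is hypothetical.

```python
import json


def build_read_only_s3_policy(bucket_name):
    """Build an IAM policy document for the read-only S3 calls in this issue.

    bucket_name is a placeholder; set it to the bucket the Lambda reads.
    """
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                # ListBuckets needs s3:ListAllMyBuckets, which takes Resource "*".
                "Effect": "Allow",
                "Action": "s3:ListAllMyBuckets",
                "Resource": "*",
            },
            {
                # ListObjects / ListObjectsV2 are authorized on the bucket ARN.
                "Effect": "Allow",
                "Action": "s3:ListBucket",
                "Resource": f"arn:aws:s3:::{bucket_name}",
            },
            {
                # GetObject and HeadObject are authorized on the object ARNs.
                "Effect": "Allow",
                "Action": "s3:GetObject",
                "Resource": f"arn:aws:s3:::{bucket_name}/*",
            },
        ],
    }


policy_json = json.dumps(build_read_only_s3_policy("textractspdf"), indent=2)
print(policy_json)
```

The printed JSON can be pasted as an inline policy on the function's execution role in the IAM console, or passed as `PolicyDocument` to boto3's `iam.put_role_policy`.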

ranceforhiwd commented 6 months ago

Updated the Lambda function with a new policy.

My test results:

START RequestId: 7813bafa-225c-49cb-b81a-569879568673 Version: $LATEST
backlog-source
bgbucket89
code-versions
markup-images
statefunct-dev-serverlessdeploymentbucket-bz9vl0qpq3cv
support-doc-pdfs
support-docs-pngs
testerwebsite
textractspdf
END RequestId: 7813bafa-225c-49cb-b81a-569879568673
REPORT RequestId: 7813bafa-225c-49cb-b81a-569879568673 Duration: 447.74 ms Billed Duration: 448 ms Memory Size: 128 MB Max Memory Used: 83 MB Init Duration: 479.29 ms
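The bucket names in this log come from the S3 `ListBuckets` call. Below is a minimal sketch of a handler that produces this kind of output, assuming a default boto3 S3 client; `list_bucket_names` is a hypothetical helper that takes the client as a parameter so it can be exercised without AWS.

```python
def list_bucket_names(s3_client):
    """Return the names of all buckets visible to the caller's credentials."""
    response = s3_client.list_buckets()
    return [bucket["Name"] for bucket in response.get("Buckets", [])]


def lambda_handler(event, context):
    # boto3 ships with the AWS Lambda Python runtime; importing it here keeps
    # list_bucket_names usable in environments that lack the SDK.
    import boto3

    s3 = boto3.client("s3")
    for name in list_bucket_names(s3):
        print(name)  # one bucket name per log line
```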

ranceforhiwd commented 6 months ago

Completed: Print a list of all the buckets in S3.

ranceforhiwd commented 6 months ago

Taking a break; will resume at 1 pm.

ranceforhiwd commented 6 months ago

Resumed work. Now looking for routines to read the list of files in an S3 bucket of choice, as per the requirements. Found a link in the boto3 documentation with an example for S3 and Python (shared as an image).

ranceforhiwd commented 6 months ago

I'm using this as a reference to list the filenames inside a bucket:

https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/s3/client/list_objects.html
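A single list call returns at most 1,000 keys, and the response in the next comment reports `MaxKeys: 2` and `IsTruncated: True`, so it stopped after two objects. Its `KeyCount` and `NextContinuationToken` fields belong to ListObjectsV2. Below is a sketch that walks every page with boto3's `list_objects_v2` paginator; the helper name `iter_objects` and the hard-coded bucket are assumptions for illustration.

```python
def iter_objects(s3_client, bucket_name, prefix=""):
    """Yield (key, last_modified, size) for every object in the bucket.

    The list_objects_v2 paginator follows NextContinuationToken for us,
    so results past the first page are fetched automatically.
    """
    paginator = s3_client.get_paginator("list_objects_v2")
    for page in paginator.paginate(Bucket=bucket_name, Prefix=prefix):
        # "Contents" is absent when a page (or the whole bucket) has no objects.
        for obj in page.get("Contents", []):
            yield obj["Key"], obj["LastModified"], obj["Size"]


def lambda_handler(event, context):
    import boto3  # bundled with the Lambda Python runtime

    s3 = boto3.client("s3")
    bucket_name = "textractspdf"  # assumption: one of the buckets in this thread
    for key, _last_modified, _size in iter_objects(s3, bucket_name):
        print(bucket_name, key)
```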

ranceforhiwd commented 6 months ago

I now have code that lists filenames and their metadata:

{'ResponseMetadata': {'RequestId': 'V0MG40BY39K0WENW',
                      'HostId': '7bQUIm2lKEVx0kx07Bt/B73ZizeCN84IqBEesMP8S8u8Hzsy5dzF2Wj9IJ0x0yY4hhTqndNmemKfghjDZ/xLErkb17FZWPBT',
                      'HTTPStatusCode': 200,
                      'HTTPHeaders': {'x-amz-id-2': '7bQUIm2lKEVx0kx07Bt/B73ZizeCN84IqBEesMP8S8u8Hzsy5dzF2Wj9IJ0x0yY4hhTqndNmemKfghjDZ/xLErkb17FZWPBT',
                                      'x-amz-request-id': 'V0MG40BY39K0WENW',
                                      'date': 'Thu, 30 May 2024 17:54:49 GMT',
                                      'x-amz-bucket-region': 'us-east-1',
                                      'content-type': 'application/xml',
                                      'transfer-encoding': 'chunked',
                                      'server': 'AmazonS3'},
                      'RetryAttempts': 0},
 'IsTruncated': True,
 'Contents': [{'Key': 'Tasks_-_PJ2405-0001.pdf',
               'LastModified': datetime.datetime(2024, 5, 28, 17, 34, 52, tzinfo=tzlocal()),
               'ETag': '"70df1832fddb5944083e0a5fc83bb254"',
               'Size': 705825,
               'StorageClass': 'STANDARD'},
              {'Key': 'Third-party-vendors.pdf',
               'LastModified': datetime.datetime(2024, 5, 26, 19, 24, 50, tzinfo=tzlocal()),
               'ETag': '"80c6858aeaa26f795b8596fdeeba9d60"',
               'Size': 523879,
               'StorageClass': 'STANDARD'}],
 'Name': 'textractspdf',
 'Prefix': '',
 'MaxKeys': 2,
 'EncodingType': 'url',
 'KeyCount': 2,
 'NextContinuationToken': '1DYEcq+N/ukSRDxw0lAIorGBXChQyCOfmRkwUdqs/+CE+e2XBB8OpDEkMevU4juPY'}

ranceforhiwd commented 6 months ago

Status

Completed:
- Read contents of the S3 bucket and output the bucket and filenames
- Print a list of all the buckets in S3

To do:
- Print the times the files were uploaded
- Print the event metadata included when a file is uploaded to S3
- After you get your Lambda function working, commit your file to a repo attached to this project issue.

ranceforhiwd commented 6 months ago

I'm looking through the docs for a way to get the metadata for this task item:

Print the times the files were uploaded
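The `Contents` entries returned when listing a bucket already carry a `LastModified` datetime per object (visible in the ListObjects output in the previous comment), so upload times can be printed without fetching each file. As we read the S3 docs, S3 keeps no separate "uploaded at" field: objects are replaced rather than edited in place, so `LastModified` is when the current copy was written. Below is a minimal sketch; `format_upload_times` is a hypothetical helper, and the sample uses UTC timestamps.

```python
from datetime import datetime, timezone


def format_upload_times(contents):
    """Turn the "Contents" list of a ListObjects(V2) response into text lines.

    Lines are sorted oldest first and timestamps are normalized to UTC.
    """
    lines = []
    for obj in sorted(contents, key=lambda o: o["LastModified"]):
        uploaded = obj["LastModified"].astimezone(timezone.utc)
        lines.append(f"{obj['Key']}\t{uploaded.isoformat()}")
    return lines


# Example with values from this thread's listing, with tzinfo set to UTC.
sample = [
    {"Key": "Tasks_-_PJ2405-0001.pdf",
     "LastModified": datetime(2024, 5, 28, 17, 34, 52, tzinfo=timezone.utc)},
    {"Key": "Third-party-vendors.pdf",
     "LastModified": datetime(2024, 5, 26, 19, 24, 50, tzinfo=timezone.utc)},
]
for line in format_upload_times(sample):
    print(line)
```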

ranceforhiwd commented 6 months ago

Reading this page to learn how to use GetObject and its response format. This is the general API documentation:

https://docs.aws.amazon.com/AmazonS3/latest/API/API_GetObject.html

This is the Python-specific (boto3) page:

https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/s3/client/get_object.html#
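GetObject returns the file body together with its metadata. When only the metadata is needed (for example `LastModified`), S3's HeadObject call returns the same headers without downloading the body. Below is a sketch assuming a boto3 S3 client; `object_metadata` is a hypothetical helper name.

```python
def object_metadata(s3_client, bucket_name, key):
    """Fetch an object's metadata with HeadObject (no body is downloaded)."""
    response = s3_client.head_object(Bucket=bucket_name, Key=key)
    return {
        "key": key,
        "last_modified": response["LastModified"],  # timezone-aware datetime
        "content_type": response.get("ContentType"),
        "content_length": response.get("ContentLength"),
        "etag": response.get("ETag"),
        # User-defined x-amz-meta-* values, if any were set at upload time.
        "metadata": response.get("Metadata", {}),
    }
```

A call such as `object_metadata(boto3.client("s3"), "textractspdf", "Third-party-vendors.pdf")` would return these fields for one file; HeadObject is authorized by the same `s3:GetObject` permission as GetObject.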

ranceforhiwd commented 6 months ago

I found a good example routine that calls GetObject with a given key. The response appears to contain the other information I need to complete this task. I now need to parse the JSON response for the items my solution has to collect:

{
  "ResponseMetadata": {
    "RequestId": "EPWXB918K81NH4YE",
    "HostId": "v0Vb4fRb6wST2WartwwpOpLAO9OOd+XeLDLDGJG+RLOBBk4MwIo7IjRr9wCJjy+z2VjDpSsm078=",
    "HTTPStatusCode": 200,
    "HTTPHeaders": {
      "x-amz-id-2": "v0Vb4fRb6wST2WartwwpOpLAO9OOd+XeLDLDGJG+RLOBBk4MwIo7IjRr9wCJjy+z2VjDpSsm078=",
      "x-amz-request-id": "EPWXB918K81NH4YE",
      "date": "Fri, 31 May 2024 20:26:56 GMT",
      "last-modified": "Sun, 26 May 2024 19:24:50 GMT",
      "etag": "\"80c6858aeaa26f795b8596fdeeba9d60\"",
      "x-amz-server-side-encryption": "AES256",
      "accept-ranges": "bytes",
      "content-type": "application/pdf",
      "server": "AmazonS3",
      "content-length": "523879"
    },
    "RetryAttempts": 0
  },
  "AcceptRanges": "bytes",
  "LastModified": datetime.datetime(2024, 5, 26, 19, 24, 50, tzinfo=tzutc()),
  "ContentLength": 523879,
  "ETag": "\"80c6858aeaa26f795b8596fdeeba9d60\"",
  "ContentType": "application/pdf",
  "ServerSideEncryption": "AES256",
  "Metadata": {},
  "Body": <botocore.response.StreamingBody object at 0x7fa9f3473370>
}
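boto3's `get_object` already returns a Python dictionary (the `datetime` and `StreamingBody` values above are Python objects, which is why this output is not valid JSON), so its fields can be read directly without any JSON conversion. Below is a minimal sketch; `read_object` is a hypothetical helper, and reading the whole body into memory assumes small files such as these PDFs.

```python
def read_object(s3_client, bucket_name, key):
    """Download one object and return its bytes plus selected response fields."""
    response = s3_client.get_object(Bucket=bucket_name, Key=key)
    body = response["Body"]  # botocore StreamingBody
    try:
        data = body.read()
    finally:
        body.close()  # release the underlying HTTP connection
    fields = {
        "last_modified": response["LastModified"],  # already a datetime
        "content_type": response["ContentType"],
        "content_length": response["ContentLength"],
        "sse": response.get("ServerSideEncryption"),
        "metadata": response["Metadata"],
    }
    return data, fields
```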

ranceforhiwd commented 6 months ago

Working with JSON data in Python:

https://www.codementor.io/@simransinghal/working-with-json-data-in-python-165crbkiyk

ranceforhiwd commented 6 months ago

Now getting this error after adding code that converts the JSON to a string and then to a dictionary, so that the individual parts can be accessed in Python:

Object of type datetime is not JSON serializable

Reading:

https://www.geeksforgeeks.org/how-to-fix-datetime-datetime-not-json-serializable-in-python/

ranceforhiwd commented 5 months ago


I found the answer to this issue here: https://stackoverflow.com/questions/11875770/how-can-i-overcome-datetime-datetime-not-json-serializable/36142844#36142844
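A common fix for `Object of type datetime is not JSON serializable` is to give `json.dumps` a `default` hook for values it cannot encode. The sketch below converts `datetime` values to ISO 8601 strings and falls back to `str()` for anything else (such as the `StreamingBody`). It is one option and not necessarily the approach in the linked answer; the helper names are hypothetical.

```python
import json
from datetime import datetime, timezone


def json_default(value):
    """Fallback encoder for json.dumps: ISO 8601 for datetimes, str() otherwise."""
    if isinstance(value, datetime):
        return value.isoformat()
    return str(value)


def response_to_json(response):
    """Serialize a boto3 response dict, e.g. for logging or a Lambda return value."""
    return json.dumps(response, default=json_default, indent=2)


# Example using the LastModified value from the GetObject response earlier in this thread.
sample = {
    "LastModified": datetime(2024, 5, 26, 19, 24, 50, tzinfo=timezone.utc),
    "ContentLength": 523879,
}
print(response_to_json(sample))
```

Serializing is only needed when the data leaves Python; for reading a single field, `response["LastModified"]` works directly on the dictionary.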

ranceforhiwd commented 5 months ago

I finally got these results after calling several S3 SDK methods from Python. I can use these results to perform the tasks in the project requirements. Below is my output from the returned JSON:

START RequestId: e8980756-78b2-4903-89cb-c864d8dc0a0a Version: $LATEST

list the buckets

backlog-source
bgbucket89
code-versions
markup-images
statefunct-dev-serverlessdeploymentbucket-bz9vl0qpq3cv
support-doc-pdfs
support-docs-pngs
testerwebsite
textractspdf

These are the keys:

AcceptRanges Body ContentLength ContentType ETag LastModified Metadata ResponseMetadata ServerSideEncryption

Last Modified: 2024-05-26 19:24:50+00:00

END RequestId: e8980756-78b2-4903-89cb-c864d8dc0a0a
REPORT RequestId: e8980756-78b2-4903-89cb-c864d8dc0a0a Duration: 707.92 ms Billed Duration: 708 ms Memory Size: 128 MB Max Memory Used: 86 MB Init Duration: 460.57 ms