vfedotovs / sslv_web_scraper

The ss.lv web scraping app automates scraping and filtering of classified ads, emails the results, and stores the scraped data in a database.
GNU General Public License v3.0

BUG(ts): Task keeps timing out with: Request timed out after 30 seconds #288

Closed: vfedotovs closed this 3 months ago

vfedotovs commented 3 months ago

Affected versions: all versions that use the Lambda scrape job's raw data file as their data source

❯ grep  ERRO  task_scheduler.log
2024-07-21 00:37:13,338 [MainThread  ] [ERROR] : execute_ogre_task: 79: Request timed out after 30 seconds.  <<< expected behavior: the scrape job takes more than 30 seconds
2024-07-22 00:37:45,342 [MainThread  ] [ERROR] : execute_ogre_task: 79: Request timed out after 30 seconds.
----
2024-08-07 00:46:18,090 [MainThread  ] [ERROR] : execute_ogre_task: 79: Request timed out after 30 seconds.
2024-08-08 00:46:50,086 [MainThread  ] [ERROR] : execute_ogre_task: 79: Request timed out after 30 seconds.
2024-08-09 00:47:22,126 [MainThread  ] [ERROR] : execute_ogre_task: 79: Request timed out after 30 seconds.
2024-08-10 00:47:53,976 [MainThread  ] [ERROR] : execute_ogre_task: 79: Request timed out after 30 seconds.
2024-08-11 00:48:25,926 [MainThread  ] [ERROR] : execute_ogre_task: 79: Request timed out after 30 seconds.

The Lambda scrape job had stopped working; evidence from the S3 bucket listing:

2024-07-16 01:19:01       8502 Ogre-raw-data-report-2024-07-16T00-18-58.txt
2024-07-17 01:18:59       8492 Ogre-raw-data-report-2024-07-17T00-18-56.txt
2024-07-18 01:19:01       8470 Ogre-raw-data-report-2024-07-18T00-18-58.txt
2024-07-19 01:19:02       8473 Ogre-raw-data-report-2024-07-19T00-18-58.txt  <<< Last completed job
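The listing shows the last report was written on 2024-07-19, while the task scheduler kept failing into August. A staleness check on the newest report key could surface this kind of silent Lambda failure earlier. The sketch below is hypothetical: the function names `latest_report_date` and `is_stale` and the one-day threshold are assumptions, and it works on key names only (fetching them from S3 is left out).

```python
import re
from datetime import date, datetime

# Matches keys such as "Ogre-raw-data-report-2024-07-19T00-18-58.txt"
# (pattern inferred from the bucket listing; an assumption, not the app's code).
REPORT_KEY = re.compile(r"Ogre-raw-data-report-(\d{4}-\d{2}-\d{2})T\d{2}-\d{2}-\d{2}\.txt$")


def latest_report_date(keys):
    """Return the newest report date encoded in the key names, or None if none match."""
    dates = [
        datetime.strptime(m.group(1), "%Y-%m-%d").date()
        for key in keys
        if (m := REPORT_KEY.search(key))
    ]
    return max(dates, default=None)


def is_stale(keys, today: date, max_age_days: int = 1) -> bool:
    """True when no report is at most `max_age_days` older than `today`."""
    latest = latest_report_date(keys)
    return latest is None or (today - latest).days > max_age_days
```

With the four keys listed above and `today=date(2024, 8, 11)`, `is_stale` returns `True`, because the newest report is 23 days old.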

Resolution: add a cron job to trigger the Lambda execution and add S3 access permissions.

Extracting data from message URL  60
Extracting data from message URL  61
Extracting data from message URL  62
--- cut --
[ERROR] ClientError: An error occurred (AccessDenied) when calling the PutObject operation: Access Denied
Traceback (most recent call last):
  File "/var/task/app.py", line 172, in handler
    upload_text_file_to_s3(original_filename, S3_bucket, new_filename )
  File "/var/task/app.py", line 114, in upload_text_file_to_s3
    s3_client.upload_fileobj(file, bucket_name, s3_key)
  File "/var/runtime/boto3/s3/inject.py", line 642, in upload_fileobj
    return future.result()
  File "/var/runtime/s3transfer/futures.py", line 103, in result
    return self._coordinator.result()
  File "/var/runtime/s3transfer/futures.py", line 266, in result
    raise self._exception
  File "/var/runtime/s3transfer/tasks.py", line 139, in __call__
    return self._execute_main(kwargs)
  File "/var/runtime/s3transfer/tasks.py", line 162, in _execute_main
    return_value = self._main(**kwargs)
  File "/var/runtime/s3transfer/upload.py", line 764, in _main
    client.put_object(Bucket=bucket, Key=key, Body=body, **extra_args)
  File "/var/runtime/botocore/client.py", line 553, in _api_call
    return self._make_api_call(operation_name, kwargs)
  File "/var/runtime/botocore/client.py", line 1009, in _make_api_call
    raise error_class(parsed_response, operation_name)
END RequestId: 3c6e748e-4c44-45c3-b511-cdfc7a35b6bd
REPORT RequestId: 3c6e748e-4c44-45c3-b511-cdfc7a35b6bd  Duration: 275070.77 ms  Billed Duration: 276427 ms  Memory Size: 256 MB Max Memory Used: 91 MB  Init Duration: 1355.42 ms
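The REPORT line puts the run at 275070.77 ms, about 275 seconds, which is consistent with the note that the scrape job takes far longer than the 30-second timeout seen in task_scheduler.log. A small parser like the hypothetical one below can pull that figure out of Lambda logs and compare it with a timeout. The 30-second default is taken from the scheduler log; the function names are illustrative only.

```python
import re

# Anchors on the first "Duration" field right after the RequestId, so the
# "Billed Duration" and "Init Duration" fields later in the line are not matched.
REPORT_DURATION = re.compile(r"RequestId: \S+\s+Duration: ([\d.]+) ms")


def run_duration_seconds(report_line: str) -> float:
    """Extract the run duration from a Lambda REPORT line, in seconds."""
    match = REPORT_DURATION.search(report_line)
    if match is None:
        raise ValueError("no Duration field found in REPORT line")
    return float(match.group(1)) / 1000.0


def exceeds_timeout(report_line: str, timeout_s: float = 30.0) -> bool:
    """True when the run took longer than the caller's request timeout."""
    return run_duration_seconds(report_line) > timeout_s
```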

Resolution: in the AWS console, switch to the Europe (Ireland) Region > Lambda > select the legacy Lambda > Configuration > Permissions > select role xxxx > add the AmazonS3FullAccess permission.
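AmazonS3FullAccess fixes the AccessDenied error but grants far more than the failing call needs. Assuming the function only writes report files to a single bucket (an assumption; the traceback only shows `PutObject`), a narrower inline policy could allow just `s3:PutObject` on that bucket's objects. The sketch below builds such a policy document; the bucket name and function names are hypothetical.

```python
import json


def s3_put_object_policy(bucket_name: str) -> dict:
    """Build an IAM policy document allowing only s3:PutObject on one bucket's objects."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": ["s3:PutObject"],
                # Object-level ARN: the "/*" suffix covers every key in the bucket.
                "Resource": [f"arn:aws:s3:::{bucket_name}/*"],
            }
        ],
    }


def policy_json(bucket_name: str) -> str:
    """Serialize the policy so it can be pasted into an inline role policy."""
    return json.dumps(s3_put_object_policy(bucket_name), indent=2)
```

For example, `print(policy_json("example-bucket"))` prints the JSON to attach to the role in place of the managed policy. If the bucket uses SSE-KMS encryption, the role would also need KMS permissions, which this sketch does not include.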

The Lambda is triggered every day at 00:25:00 UTC:
Wed, 21 Aug 2024 00:25:00 UTC
Thu, 22 Aug 2024 00:25:00 UTC
Fri, 23 Aug 2024 00:25:00 UTC
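If the cron trigger is an Amazon EventBridge schedule (an assumption; the issue only says a cron job was added), a daily 00:25 UTC run would be written as `cron(25 0 * * ? *)`. The hypothetical helper below reproduces the upcoming trigger times for that schedule, which can be handy for checking that the scheduler task runs after the Lambda has had time to finish. It expects a timezone-aware `now`.

```python
from datetime import datetime, timedelta, timezone

TRIGGER_HOUR, TRIGGER_MINUTE = 0, 25  # 00:25 UTC, matching the trigger times listed above


def next_triggers(now: datetime, count: int = 3) -> list[str]:
    """Return the next `count` daily 00:25 UTC trigger times strictly after `now`."""
    now = now.astimezone(timezone.utc)  # `now` must be timezone-aware
    candidate = now.replace(hour=TRIGGER_HOUR, minute=TRIGGER_MINUTE, second=0, microsecond=0)
    if candidate <= now:
        candidate += timedelta(days=1)  # today's slot has passed; start tomorrow
    return [
        (candidate + timedelta(days=i)).strftime("%a, %d %b %Y %H:%M:%S UTC")
        for i in range(count)
    ]
```

Called with `datetime(2024, 8, 20, 12, 0, tzinfo=timezone.utc)`, it yields the three dates shown above, starting with "Wed, 21 Aug 2024 00:25:00 UTC".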

vfedotovs commented 3 months ago

Resolved by changing the configuration in the AWS console.