vfedotovs / sslv_web_scraper

ss.lv web scraping app that automates scraping and filtering of classified ads, emails the results, and stores the scraped data in a database
GNU General Public License v3.0

FEAT: Implement final logic to use scraped file from S3 bucket for performance improvement #188

Closed: vfedotovs closed this issue 1 year ago

vfedotovs commented 1 year ago

What's needed:

  1. Download the file from S3, and ensure that log events are created for this step.
  2. Check whether the downloaded cloud file carries today's date; if yes, use the CLOUD scraped FILE in the next module.
  3. If the cloud file does not contain today's data, check whether a LOCAL scraped file exists; if it does, do not run the scrape job (??).
  4. If no local file with today's data exists, run the local scrape job and use its output in the next module (a hedged sketch of these steps follows the list).
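As a rough sketch of steps 1-4, the helpers below use boto3. The bucket name, the date-stamped key convention, and the helper names themselves are assumptions made for illustration, not the repo's actual API:

import logging
import os
from datetime import date
from typing import Optional

import boto3

log = logging.getLogger(__name__)

S3_BUCKET = "lambda-scraper-output"  # assumption: the real bucket name is not given in the issue
LOCAL_DATA_DIR = "data"              # per step 4: the local scrape output folder


def get_todays_cloud_data_file_name(bucket: str = S3_BUCKET) -> Optional[str]:
    """Return the key of a cloud file stamped with today's date, or None (step 2)."""
    today = date.today().strftime("%Y-%m-%d")  # assumption: date embedded in the key
    s3 = boto3.client("s3")
    response = s3.list_objects_v2(Bucket=bucket)
    for obj in response.get("Contents", []):
        if today in obj["Key"]:
            return obj["Key"]
    return None


def download_cloud_file(key: str, bucket: str = S3_BUCKET) -> str:
    """Download the S3 object into the local data folder, logging the event (step 1)."""
    local_path = os.path.join(LOCAL_DATA_DIR, os.path.basename(key))
    log.info("Downloading s3://%s/%s to %s", bucket, key, local_path)
    boto3.client("s3").download_file(bucket, key, local_path)
    return local_path


def local_today_file_exists() -> bool:
    """Check whether the local scrape job already produced today's file (step 3)."""
    if not os.path.isdir(LOCAL_DATA_DIR):
        return False
    today = date.today().strftime("%Y-%m-%d")
    return any(today in name for name in os.listdir(LOCAL_DATA_DIR))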

async def run_long_task(city: str, background_tasks: BackgroundTasks):
    """Endpoint to trigger scrape, format, and insert data into the DB."""
    background_tasks.add_task(download_latest_lambda_file)

    # todays_cloud_data_file_exist = check_today_cloud_data_file_exist()

    # if todays_cloud_data_file_exist is True:
    #     last_cloud_file_name = get_todays_cloud_data_file_name()
    #     TODO: implement code below
    #     log.info("Cloud scraper module data file %s found; "
    #              "it will be used in data_formater_module",
    #              last_cloud_file_name)

    # if todays_cloud_data_file_exist is False:
    #     # check if the local scraper job ran today and saved its
    #     # output file in the "data" folder
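For completeness, here is a hedged sketch of how the commented-out branching might look once implemented. It reuses the helper names sketched above, and run_local_scrape_job is a hypothetical task name, not a function from the repo:

import logging

from fastapi import BackgroundTasks

log = logging.getLogger(__name__)


async def run_long_task(city: str, background_tasks: BackgroundTasks):
    """Endpoint to trigger scrape, format, and insert data into the DB."""
    last_cloud_file_name = get_todays_cloud_data_file_name()
    if last_cloud_file_name is not None:
        # Cloud file with today's date found: download it and let the
        # data formatter module consume it instead of scraping locally.
        log.info("Cloud scraper module data file %s found; "
                 "it will be used in data_formater_module",
                 last_cloud_file_name)
        background_tasks.add_task(download_cloud_file, last_cloud_file_name)
    elif local_today_file_exists():
        # The local scrape job already ran today; reuse its output file.
        log.info("Local data file for today found; skipping scrape job")
    else:
        # No fresh data in the cloud or locally: fall back to the local scrape job.
        background_tasks.add_task(run_local_scrape_job, city)  # hypothetical task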
vfedotovs commented 1 year ago

Resolved in 86d3c70