Open vfedotovs opened 1 month ago
More triage is needed
2024-10-08 00:50:10,247 [WARNI] : cloud_data_formater_main: 125: Lambda scraped raw-data file does not exist, failing back to local scraper source file
WARNING:data_format_changer:Lambda scraped raw-data file does not exist, failing back to local scraper source file
2024-10-08 00:50:10,247 [INFO ] : cloud_data_formater_main: 127: Converting to csv format from local scraped raw-data file: data/Ogre-raw-data-report-2024-10-08.txt format
INFO:data_format_changer:Converting to csv format from local scraped raw-data file: data/Ogre-raw-data-report-2024-10-08.txt format
2024-10-08 00:50:10,248 [INFO ] : create_oneline_report: 192: Converting raw-text 12 lines per entry fromat into 1 line per entry csv file format
INFO:data_format_changer:Converting raw-text 12 lines per entry fromat into 1 line per entry csv file format
2024-10-08 00:50:10,248 [INFO ] : create_oneline_report: 194: Reading data from file : data/Ogre-raw-data-report-2024-10-08.txt
INFO:data_format_changer:Reading data from file : data/Ogre-raw-data-report-2024-10-08.txt
2024-10-08 00:50:10,249 [ERROR] : create_oneline_report: 264: Source raw-data text file: data/Ogre-raw-data-report-2024-10-08.txt does not exist
Root cause: ogre_city_data_frame is None, so the file write/creation fails
2024-10-08 00:50:10,249 [ERROR] : cloud_data_formater_main: 137: ogre_city_data_frame is None
2024-10-08 00:50:10,250 [ERROR] : cloud_data_formater_main: 138: Saving csv format data file pandas_df.csv has failed
2024-10-08 00:50:10,250 [INFO ] : cloud_data_formater_main: 139: --- Finished data_format_changer module ---
2024-10-08 00:50:10,250 [INFO ] : run_long_task: 101: Running df_cleaner_main task: using locally scraped file
2024-10-08 00:50:10,251 [INFO ] : df_cleaner_main: 323: --- Started df_cleaner module ---
2024-10-08 00:50:10,252 [INFO ] : df_cleaner_main: 329: Loading pandas_df.csv file.
pandas_df.csv was not created by the previous module
2024-10-08 00:50:10,252 [ERROR] : df_cleaner_main: 356: File pandas_df.csv not found
2024-10-08 00:50:10,252 [INFO ] : df_cleaner_main: 358: Loading pandas_df_default.csv file.
Default template file is missing
2024-10-08 00:50:10,253 [ERROR] : df_cleaner_main: 368: pandas_df_default.csv does not exist.
2024-10-08 00:50:10,253 [INFO ] : df_cleaner_main: 371: --- Completed df_cleaner module ---
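Note that df_cleaner logs "--- Completed df_cleaner module ---" even though neither pandas_df.csv nor pandas_df_default.csv existed, so the pipeline continues into db_worker with no input at all. A minimal sketch of a stricter input loader is below; the file names come from these logs, while the function name load_input_frame and the structure are assumptions, not the project's actual code:

import os
import logging
import pandas as pd

log = logging.getLogger('df_cleaner')

def load_input_frame() -> pd.DataFrame:
    """Return the first available input CSV, or stop the pipeline early."""
    for candidate in ('pandas_df.csv', 'pandas_df_default.csv'):
        if os.path.exists(candidate):
            log.info("Loading %s file.", candidate)
            return pd.read_csv(candidate)
        log.error("File %s not found", candidate)
    # Fail fast so db_worker never starts without its required input
    raise FileNotFoundError("df_cleaner has no input CSV to clean")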
2024-10-08 00:50:10,254 [INFO ] : run_long_task: 103: Running db_worker_main task: using locally scraped file
INFO:fastapi:Running db_worker_main task: using locally scraped file
INFO:db_worker: --- Satrting db_worker module ---
INFO:db_worker:Checking if required module file cleaned-sorted-df.csv exits in /
ERROR:db_worker:There was an error opening the file cleaned-sorted-df.csv or file does not exist!
ERROR: Exception in ASGI application
Traceback (most recent call last):
Crash because db_worker.py does not gracefully handle the missing file
ERROR:db_worker:There was an error opening the file cleaned-sorted-df.csv or file does not exist!
ERROR: Exception in ASGI application
Traceback (most recent call last):
File "/app/wsmodules/db_worker.py", line 109, in check_files
file = open(file_name, 'r')
FileNotFoundError: [Errno 2] No such file or directory: 'cleaned-sorted-df.csv'
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
--- cut ---
File "/usr/local/lib/python3.8/site-packages/fastapi/routing.py", line 191, in run_endpoint_function
return await dependant.call(**values)
File "/app/main.py", line 104, in run_long_task
db_worker_main()
File "/app/wsmodules/db_worker.py", line 59, in db_worker_main
check_files(requred_files)
File "/app/wsmodules/db_worker.py", line 115, in check_files
sys.exit()
SystemExit
INFO: 192.168.144.4:36690 - "GET /run-task/ogre HTTP/1.1" 500 Internal Server Error
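One way to avoid this 500 Internal Server Error is to stop calling sys.exit() inside check_files and let the endpoint report the missing file itself. The sketch below is an assumption about how db_worker.py and main.py could be restructured; only the names check_files, db_worker_main, run_long_task, and cleaned-sorted-df.csv come from the traceback above.

import os
from typing import List
from fastapi import FastAPI, HTTPException

app = FastAPI()

def check_files(required_files: List[str]) -> None:
    """Raise instead of exiting so the caller decides how to respond."""
    missing = [f for f in required_files if not os.path.exists(f)]
    if missing:
        raise FileNotFoundError(f"Required input files are missing: {missing}")

def db_worker_main() -> None:
    check_files(['cleaned-sorted-df.csv'])
    # ... continue with the database work only when the file exists ...

@app.get("/run-task/ogre")
def run_long_task():
    try:
        db_worker_main()
    except FileNotFoundError as exc:
        # Turn the missing file into a controlled HTTP error instead of
        # an unhandled SystemExit inside the ASGI application.
        raise HTTPException(status_code=500, detail=str(exc))
    return {"status": "ok"}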
<< it seems the file copy failed
2024-10-08 00:50:10,239 [INFO ] web_scraper : scrape_website: 89: Creating file Ogre-raw-data-report.txt copy in data folder
<< no check that the copy actually succeeded
2024-10-08 00:50:10,239 [INFO ] web_scraper : scrape_website: 91: --- Finished web_scraper module ---
<< the data formatter could not create pandas_df.csv because the previous module did not copy the file to the data folder
2024-10-08 00:50:10,248 [INFO ] : create_oneline_report: 194: Reading data from file : data/Ogre-raw-data-report-2024-10-08.txt
2024-10-08 00:50:10,249 [ERROR] : create_oneline_report: 264: Source raw-data text file: data/Ogre-raw-data-report-2024-10-08.txt does not exist
2024-10-08 00:50:10,249 [ERROR] : cloud_data_formater_main: 137: ogre_city_data_frame is None
2024-10-08 00:50:10,250 [ERROR] : cloud_data_formater_main: 138: Saving csv format data file pandas_df.csv has failed
Affected function:
def create_file_copy() -> None:
    """Creates report file copy in data folder"""
    todays_date = datetime.today().strftime('%Y-%m-%d')
    dest_file = 'Ogre-raw-data-report-' + todays_date + '.txt'
    copy_cmd = 'cp Ogre-raw-data-report.txt local_lambda_raw_scraped_data/' + dest_file
    if not os.path.exists('local_lambda_raw_scraped_data'):
        os.makedirs('local_lambda_raw_scraped_data')
    os.system(copy_cmd)
Proposed improvements
To debug why the file copy did not complete in create_file_copy, the function can be hardened with better error handling, logging, and verification of the copy. os.system() is a poor way to run shell commands because it only returns an exit code and gives no clear error message. Python's shutil.copy is better suited for file operations (or subprocess for shell commands), since both provide proper exceptions and error reporting.
Steps to improve and debug the function:
1. Use shutil.copy instead of os.system() for copying the file, which is more reliable for Python file operations.
2. Add exception handling to capture and log any errors during the file copy.
3. Check if the source file exists before attempting the copy.
4. Add logging to track each step and any errors.
Here is an updated version of the function with these improvements:
import os
import shutil
import logging
from datetime import datetime

# Set up basic logging
logging.basicConfig(level=logging.INFO)
log = logging.getLogger(__name__)

def create_file_copy() -> None:
    """Creates a report file copy in the data folder with error handling and logging."""
    try:
        # Set up date and destination file path
        todays_date = datetime.today().strftime('%Y-%m-%d')
        src_file = 'Ogre-raw-data-report.txt'
        dest_file = f'local_lambda_raw_scraped_data/Ogre-raw-data-report-{todays_date}.txt'

        # Check if source file exists
        if not os.path.exists(src_file):
            log.error(f"Source file '{src_file}' does not exist.")
            return

        # Ensure the destination directory exists
        if not os.path.exists('local_lambda_raw_scraped_data'):
            os.makedirs('local_lambda_raw_scraped_data')
            log.info("Created directory 'local_lambda_raw_scraped_data'.")

        # Perform the file copy
        shutil.copy(src_file, dest_file)
        log.info(f"Copied '{src_file}' to '{dest_file}' successfully.")
    except Exception as e:
        log.error(f"An error occurred during the file copy: {e}")
Using shutil.copy() is more reliable than os.system() for file operations in Python for several reasons: it raises Python exceptions (such as FileNotFoundError or PermissionError) instead of silently returning a shell exit code, it does not depend on spawning a shell or on an external cp binary, and it behaves the same way across platforms.
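If shelling out is still preferred, subprocess.run with check=True at least surfaces a failed copy as an exception. This is only a sketch of the subprocess alternative mentioned above, not code from the repository; the helper name create_file_copy_via_subprocess is hypothetical.

import os
import subprocess
from datetime import datetime

def create_file_copy_via_subprocess() -> None:
    """Sketch: copy the report with cp, but fail loudly if cp exits non-zero."""
    todays_date = datetime.today().strftime('%Y-%m-%d')
    dest_dir = 'local_lambda_raw_scraped_data'
    dest_file = f'{dest_dir}/Ogre-raw-data-report-{todays_date}.txt'
    os.makedirs(dest_dir, exist_ok=True)
    # check=True raises CalledProcessError on a non-zero exit code,
    # unlike os.system(), which only returns the code and is easy to ignore.
    subprocess.run(['cp', 'Ogre-raw-data-report.txt', dest_file], check=True)

Even so, shutil.copy remains the simpler choice here, since it avoids spawning a shell entirely.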
The scrape job is triggered at London time minus 1 hour.
The local scrape job completed successfully after 10 minutes.