iati-data-access / data-backend

GNU Affero General Public License v3.0

Improve logging: alter to avoid filling logs with huge DB queries; include statements at start of each processing stage #26

Open simon-20 opened 8 months ago

simon-20 commented 8 months ago

Some of the normal, informational logging during the Flask update stage of the backend processing (and the same might apply to other stages) prints a summary of the changes that will be made to the database. Sometimes this means that entire lists of IDs to be updated are printed to the logs, so lists of tens or hundreds of thousands of IDs end up filling the log files. Normally, a run in which not much has changed yields a log file of ~7 MB, but periodically (e.g. twice a week) the log file grows to 80-225 MB.

These long lists of IDs, together with debug-style printing of long SQL statements, make the log files difficult to work with when there are genuine bugs to look for.

One place this occurs is the insert_or_update_rows function, which prints lists of IDs to be deleted that are sometimes thousands of items long; a possible mitigation is sketched below.
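A minimal sketch of that mitigation (not the repository's current code; the summarise_ids helper and logger setup here are illustrative assumptions) would be to log a count plus a small sample of the IDs rather than the whole list:

```python
import logging

logger = logging.getLogger(__name__)

def summarise_ids(ids, sample_size=5):
    """Return a short, log-friendly summary of a collection of IDs instead of the full list."""
    ids = list(ids)
    if len(ids) <= sample_size:
        return f"{len(ids)} IDs: {ids}"
    return f"{len(ids)} IDs (first {sample_size} shown): {ids[:sample_size]} ..."

# Hypothetical call site inside insert_or_update_rows:
# logger.info("Rows to be deleted: %s", summarise_ids(ids_to_delete))
```

This keeps the informational value (how many rows are affected, plus a sample for spot-checking) while capping the size of each log line.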

simon-20 commented 8 months ago

Part of this task could also be adding a log statement at the start of each stage to indicate which task is currently being carried out, e.g. something like: Flask update task started at DATETIME.

Currently, the output from all the stages is concatenated into one big log file, which makes it hard to work out where things go wrong.

It would be good to improve the general structure and approach to logging, but adding a few statements indicating when each new stage begins would be a trivial first step that would help with tracking down bugs; a minimal sketch follows.
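As a rough illustration (assuming Python's standard logging module; the log_stage_start helper name and "processing" logger name are hypothetical), each stage could begin with something like:

```python
import logging
from datetime import datetime, timezone

logger = logging.getLogger("processing")

def log_stage_start(stage_name):
    """Emit a clearly delimited marker at the start of a processing stage."""
    logger.info("===== %s started at %s =====",
                stage_name, datetime.now(timezone.utc).isoformat())

# Example usage at the top of each stage:
# log_stage_start("Flask update task")
```

Grepping for the delimiter would then make it easy to split the one big log file into per-stage sections when investigating a failure.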