rossyndicate / poudre_sonde_network

MIT License

Checking for incoming_API_data dir and moving data from there to archive folder once pipeline is complete. #101

Closed juandlt-csu closed 7 months ago

juandlt-csu commented 7 months ago

This PR adds a step that verifies the incoming API data folder exists and a step that moves the data from that folder to the archive API data folder.

Two new targets call two new functions. Please check that the logic is sound and that it works properly.

  1. check_incoming_api_dir(): Checks whether the incoming and archive API data dirs exist and creates them if they don't. It also checks that the incoming API data dir is empty and halts the pipeline if it isn't, since a non-empty dir means something has gone wrong.
  2. clear_incoming_data_dir(): Runs at the end of the pipeline, once everything else is done. It checks whether the incoming data is already in the archive data folder and moves over any files that are not.
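
The two functions described above could look roughly like the following base-R sketch. This is not the code in the PR: the argument names (`incoming_dir`, `archive_dir`), the error message, and the choice to compare files by name only are all assumptions made for illustration. The description does not say what happens to incoming files that are already archived, so the sketch leaves them in place.

```r
# Hypothetical sketch; argument names and behavior details are assumptions.
check_incoming_api_dir <- function(incoming_dir, archive_dir) {
  # Create either directory if it is missing.
  for (d in c(incoming_dir, archive_dir)) {
    if (!dir.exists(d)) dir.create(d, recursive = TRUE)
  }
  # Leftover files mean the previous run did not finish cleanly, so halt.
  leftover <- list.files(incoming_dir)
  if (length(leftover) > 0) {
    stop("Incoming API data dir is not empty: ",
         paste(leftover, collapse = ", "))
  }
  invisible(TRUE)
}

clear_incoming_data_dir <- function(incoming_dir, archive_dir) {
  incoming_files <- list.files(incoming_dir)
  # Assumption: a file counts as "already archived" if its name matches.
  to_move <- setdiff(incoming_files, list.files(archive_dir))
  moved <- file.rename(file.path(incoming_dir, to_move),
                       file.path(archive_dir, to_move))
  if (!all(moved)) {
    stop("Could not move: ", paste(to_move[!moved], collapse = ", "))
  }
  # Return the names of the files that were moved.
  to_move
}
```

Note that `file.rename()` can fail when the two folders sit on different file systems; in that case `file.copy()` followed by `file.remove()` would be one alternative.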

As a result of this pipeline, the data that was in the incoming data folder ends up in the archive data folder. We will need a new file naming convention, since there can now be more than one file for a given date of data (site_datetime.csv instead of site_date.csv).
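
As a rough illustration of the site_datetime.csv idea, a file name could be built from the site and the time of the API pull. The helper name `make_api_file_name()`, the site name `"siteA"`, and the UTC timestamp format are all hypothetical; the PR does not fix a format.

```r
# Hypothetical helper: build "site_datetime.csv" from a site name and a pull time.
make_api_file_name <- function(site, pull_time) {
  # Assumed format: compact UTC timestamp, e.g. 20231129T150000Z.
  stamp <- format(pull_time, "%Y%m%dT%H%M%SZ", tz = "UTC")
  paste0(site, "_", stamp, ".csv")
}

make_api_file_name("siteA", as.POSIXct("2023-11-29 15:00:00", tz = "UTC"))
# "siteA_20231129T150000Z.csv"
```

With a timestamp in the name, two pulls for the same site on the same date produce different file names, so the second no longer collides with the first in the archive folder.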

I am assuming that if the incoming data folder is not empty when the pipeline starts again (it runs every 3 hours), something went wrong in the previous run. Right now that halts the pipeline. It does not have to, and we can remove the check if it gets in the way of development, but we should consider it for the future.

To test the same date more than once (I changed end_dt in incoming_data_csvs_upload to 11-29), I ran tar_destroy() and then re-ran the pipeline. There is probably a better way to do this, and we will have to consider what it implies in a live environment. When you test this, I suggest doing the same and also clearing the 11-29 files out of the archive_api_data folder so that you can see the move in action. I think this still needs more robust testing.