Closed jermnelson closed 1 year ago
@ahafele are there other reporting requirements that are needed?
@jermnelson This looks good for a basic bib load. It will be slightly different for Order loads and Invoice loads but the basic structure can stay the same. Could we include the Interface name and list of HRIDs?
@jermnelson and @ahafele based on our conversation this morning, it seems like we have access to the above information (outside of FOLIO via API) but not until we know the load completed.
I'm curious about our approach here, since that data is very likely to not be available during the data import dag run, should we have a reporting DAG that looks at files that are in the loaded
state and attempts to fetch that data, then send the email report? This could be an hourly (or any other schedule really) DAG run, and I suspect we can define conditions on the data to determine if we are ready to send a report.
If we take that approach, do we also want an email going out when either the data fetch is complete or the data processing dag is complete, knowing the the data import job is likely pending?
Thoughts?
Hi @aaron-collier, I think having separate reporting DAG (hourly would be fine) that checks for files in a loaded
state and then generates and emails report when the returned value from the API meets our criteria should be acceptable. I would be worried about user email fatigue if we sent emails three emails for each job (one for when the data fetch DAG is completed one for data processing DAG is completed, and then when one from the reporting DAG when the data import job is completed).
HI @aaron-collier thanks for this. An hourly (or other) schedule sounds good, but agree with @jermnelson that we probably don't need all 3 emails. For sure we can skip an email when a data processing DAG is completed. An email when a data fetch DAG is complete would be useful mostly in the "fetch only" scenarios we discussed yesterday. If it is easier to make that notification across all interfaces, then we can just deal with a little extra email.
@jermnelson / @ahafele - thanks, that helps. I'll update this ticket to be actionable, write the reporting dag ticket, and a ticket to report on fetch only interfaces as well..
Given a VendorFile, add an airflow task that will gather data used to format an email report, create the report content, and send the email report.
Initially this task should be a
Sensor
task that queries FOLIO until the data below is available and then sends the report. At that time, we can update the file status as well (can be a separate ticket).I've linked the API documentation below, you may find a better endpoint that provides better data (i.e. knowing the load is complete, has errors, etc... but for now the below is sufficient)
The report should provide at minimum:
Folio data can be fetched at the:
/metadata-provider/jobSummary/{jobExecutionId}
endpoint (https://s3.amazonaws.com/foliodocs/api/mod-source-record-manager/p/metadata-provider.html#metadata_provider_jobsummary__jobexecutionid__get) which produces output like:Note that more data may be available through other APIs, feel free to investigate, but so far only minimal data has been shown to be available from FOLIO.
Format of the email:
Subject: Vendor Name (Vendor code) - (file_name) - Report Body:
All emails will be sent to: folio-load-reports@lists.stanford.edu