jermnelson closed this 1 year ago
ii. I think we only look at the date in the filename for marcit, and we are not sure that is still needed. I can't think of why we would still need to do that, but I will confirm.
iii. Sure. Right now we only need this for one vendor, GOBI. Order files are built as each order comes in, and we know they are done by the existence of a count file. Current processing looks for the corresponding .cnt file first, then FTPs the .ord file.
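The count-file check described above could be sketched as a small filter over a remote directory listing. This is only an illustration: the function name `ready_order_files` is hypothetical, and it assumes the .cnt file shares its base name with the .ord file, which should be confirmed against the actual GOBI file naming.

```python
from pathlib import PurePosixPath


def ready_order_files(remote_names):
    """Return the .ord files whose matching .cnt file is also listed.

    Assumption: "order.ord" is complete once "order.cnt" exists alongside it.
    """
    names = set(remote_names)
    ready = []
    for name in sorted(names):
        path = PurePosixPath(name)
        # Only pick up an order file when its count file is present too.
        if path.suffix == ".ord" and str(path.with_suffix(".cnt")) in names:
            ready.append(name)
    return ready
```

An order file without a count file is simply skipped and would be picked up on a later run once the vendor has written the .cnt file.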
Questions:
More questions:
1. Is there existing code that we can look at?
https://drive.google.com/drive/folders/1hzgitvzgtyeaI-7-W0RNILwFCWySHC57?usp=share_link
- Where can we get credentials for vendors to try this out? GOBI is the exemplar vendor right now; details are here: https://docs.google.com/document/d/11JzLJb9kbW4u3dDLZ9_oFtsRtmnPkl0laS4zigQh5X0/edit#bookmark=id.8gbz5q30ybkw. We are waiting on approval from Darsi to hit the other vendors' FTP. Credentials will be stored in Organizations/interfaces.
@jermnelson could you address the other questions?
I'm going to work on a function for getting credentials from FOLIO.
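As a starting point, here is a rough sketch of how the request for an interface's credentials might be assembled. Everything here is an assumption to verify against our FOLIO instance: the function name `credentials_request` is hypothetical, and the `/organizations-storage/interfaces/{id}/credentials` path is my understanding of the Organizations storage API rather than something confirmed in this ticket. The `X-Okapi-Tenant` and `X-Okapi-Token` headers are the standard Okapi ones.

```python
def credentials_request(okapi_url, tenant, token, interface_id):
    """Build the URL and headers for fetching one interface's credentials.

    Assumption: credentials live under the Organizations storage
    interfaces endpoint; check the path against the deployed module.
    """
    base = okapi_url.rstrip("/")
    url = f"{base}/organizations-storage/interfaces/{interface_id}/credentials"
    headers = {
        "X-Okapi-Tenant": tenant,
        "X-Okapi-Token": token,
        "Accept": "application/json",
    }
    return url, headers
```

The returned pair could then be handed to whatever HTTP client the DAG already uses, for example `requests.get(url, headers=headers)`. Keeping the request construction separate from the network call makes it easy to test without a live Okapi.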
an example from storytime:
Question: if the Airflow app comes across a file, do we know whether it has already been fetched? This is still a bit unclear. Should we explicitly track what we've gotten, or is it implicit in what's already on the file system? Should Airflow care about not retrieving things we've seen, or does it get everything that matches the regex and leave de-duping to the vendor management app later? Or does the Airflow DAG write to shared storage and just decline to overwrite anything that's already there? Or does it always grab the latest, and there's always a new file, so this is not an issue?
Answer: the data import app tracks what has been processed. But we still have the open question above about coordination between Airflow and the data management app, e.g. how to avoid re-getting something that's already been obtained. One possibility is that the Airflow retrieval task (or tasks) sees what files are available and asks the data management app which of them it should get (which might be none, if all have already been processed).
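To make that last possibility concrete, a minimal sketch of the filtering step might look like the following. The name `files_to_fetch` and the `already_processed` argument are hypothetical; `already_processed` stands in for whatever answer the data management app would give, and how that answer is obtained is still the open question.

```python
import re


def files_to_fetch(remote_names, pattern, already_processed):
    """Return remote files that match the vendor's regex and are not yet processed.

    Assumption: the data management app can supply the set of filenames
    it has already handled; here that set is just passed in.
    """
    regex = re.compile(pattern)
    seen = set(already_processed)
    # Preserve the listing order so downloads happen in a predictable sequence.
    return [name for name in remote_names if regex.fullmatch(name) and name not in seen]
```

If everything available has already been processed, the result is an empty list and the retrieval task has nothing to do.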
Storytime note: would like more clarity on what the workflow is at a software service level -- i.e. what the data mgmt app tracks, what the Airflow app is aware of, when they talk to one another. Airflow polls data mgmt app on a scheduled basis, gets work to do. But still a bit unclear on what state Airflow itself tracks, and how persistent it is.
More possibilities for tracking what's been retrieved:
possibly helpful for this ticket: https://github.com/sul-dlss/FOLIO-Project-Stanford/wiki/Vendor-Management---FOLIO---Airflow-Interaction-Diagram
I'm going to start with the simplest possible Task.
Given the inputs:
<shared mount>/files/<organization id>/<YYYY-MM-DD>/
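The directory layout above could be built with something like this sketch. The function name `download_dir` is hypothetical, and it assumes the date component is the download date in ISO format, which the ticket does not spell out.

```python
from datetime import date
from pathlib import Path


def download_dir(shared_mount, organization_id, day):
    """Return <shared mount>/files/<organization id>/<YYYY-MM-DD>/ as a Path.

    Assumption: `day` is the date of the download run.
    """
    # date.isoformat() yields the YYYY-MM-DD form used in the layout.
    return Path(shared_mount) / "files" / organization_id / day.isoformat()
```

A task would typically call `download_dir(...).mkdir(parents=True, exist_ok=True)` before writing any files into it.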
This has been supplanted by other tickets.
Extend SFTP to support downloading MARC and other files from a vendor.
From the Vendor Management App, retrieves connection details using the Organization's interface Okapi endpoint.
From Vendor data processing details requirement document, we need to be able to do the following: