Closed shelleydoljack closed 1 week ago
I used the Swagger UI to fetch the xcomEntries for the DAG run that was triggered with the configuration that included the STEINMETZ fund: dag_id = digital_bookplate_instances, dag_run_id = manual2024-10-24T00:49:58.080200+00:00, task_id = process-new-funds.instances_from_po_lines, limit = 236. The JSON response is available from the request URL https://sul-libsys-airflow-dev.stanford.edu/api/v1/dags/digital_bookplate_instances/dagRuns/manual2024-10-24T00%3A49%3A58.080200%2B00%3A00/taskInstances/process-new-funds.instances_from_po_lines/xcomEntries?limit=236
I searched the response for the instance ID b6e80385-3be3-4e84-a530-d89d072af46b to find the mapped task number: 18. For some reason, a GET to dags/{dag_id}/dagRuns/{dag_run_id}/taskInstances/{task_id}/xcomEntries/{xcom_key}, where xcom_key is the instance ID "b6e80385-3be3-4e84-a530-d89d072af46b", returns a 404. No matter, I looked in the UI for mapped task 18. It shows up there, and the XCOM object looks like this:
{'b6e80385-3be3-4e84-a530-d89d072af46b': [{'fund_name': 'STEINMETZ', 'druid': 'nc092rd1979', 'image_filename': 'nc092rd1979_00_0001.jp2', 'title': 'Verna Pace Steinmetz Endowed Book Fund in History'}]}
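A possible reason for the 404: Airflow stores a task's return value under the XCom key return_value, so the instance ID is a key inside the stored value rather than the xcom_key itself. Below is a minimal sketch of building the request URL for mapped task 18. It assumes the deployed Airflow version accepts a map_index query parameter on this endpoint (worth confirming in the Swagger UI). The helper name xcom_entry_url is hypothetical, and no request is actually sent.

```python
from urllib.parse import quote, urlencode

# Base URL taken from the request URL quoted above.
BASE_URL = "https://sul-libsys-airflow-dev.stanford.edu/api/v1"


def xcom_entry_url(dag_id, dag_run_id, task_id,
                   xcom_key="return_value", map_index=None):
    """Hypothetical helper: build the URL for a single XCom entry."""
    path = "/".join([
        "dags", quote(dag_id, safe=""),
        "dagRuns", quote(dag_run_id, safe=""),
        "taskInstances", quote(task_id, safe=""),
        "xcomEntries", quote(xcom_key, safe=""),
    ])
    url = f"{BASE_URL}/{path}"
    if map_index is not None:
        # Assumption: the endpoint supports selecting one mapped task
        # instance through a map_index query parameter.
        url += "?" + urlencode({"map_index": map_index})
    return url


url = xcom_entry_url(
    dag_id="digital_bookplate_instances",
    dag_run_id="manual2024-10-24T00:49:58.080200+00:00",
    task_id="process-new-funds.instances_from_po_lines",
    map_index=18,
)
print(url)
```

If the parameter is supported, the instance ID would then be looked up inside the returned value instead of being passed as the key.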
I realized that when the digital_bookplate_instances DAG is triggered with a fund config, the bookplate_funds_polines tasks only look for the fund they were given, so in this case the output of instances_from_po_lines only includes the bookplate metadata for that one fund.
However, when the digital_bookplate_instances DAG runs on its schedule, it should look for both the STEINMETZ and WHITEHEAD funds. I will try triggering the DAG with a fund config that includes these two funds to see what happens. I could try to trigger it with a data interval that includes the date on which the paid invoice line 39ba5b63-de98-4843-88ca-cc0f4adfcd23 would turn up, but I'm not sure how to set a DAG run's data interval start and end when it is manually triggered.
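On the data interval question: recent Airflow 2.x versions of the stable REST API appear to accept data_interval_start and data_interval_end in the body of POST /dags/{dag_id}/dagRuns, provided both are given together. This is an assumption to check against the Swagger UI for this deployment. Below is a sketch of building such a body. The dates are placeholders, not the real payment date of the invoice line, and the helper name build_trigger_body is hypothetical.

```python
import json
from datetime import datetime, timezone


def build_trigger_body(conf, interval_start=None, interval_end=None):
    """Hypothetical helper: build a JSON body for triggering a DAG run."""
    # Assumption: the API wants both interval bounds or neither.
    if (interval_start is None) != (interval_end is None):
        raise ValueError("give both interval_start and interval_end, or neither")
    body = {"conf": conf}
    if interval_start is not None:
        if interval_start >= interval_end:
            raise ValueError("interval_start must be before interval_end")
        body["data_interval_start"] = interval_start.isoformat()
        body["data_interval_end"] = interval_end.isoformat()
    return body


# Placeholder interval; conf is left empty in this sketch.
body = build_trigger_body(
    conf={},
    interval_start=datetime(2024, 10, 1, tzinfo=timezone.utc),
    interval_end=datetime(2024, 10, 2, tzinfo=timezone.utc),
)
print(json.dumps(body, indent=2))
```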
When I ran this locally with the funds config:
{
  "funds": [
    {
      "druid": "nc092rd1979",
      "fund_name": "STEINMETZ",
      "fund_uuid": "fced6c53-61fb-48c6-821f-14cff3c39b59",
      "image_filename": "nc092rd1979_00_0001.jp2",
      "title": "Verna Pace Steinmetz Endowed Book Fund in History"
    },
    {
      "druid": "ph944pq1002",
      "fund_name": "WHITEHEAD",
      "fund_uuid": "694c5236-ada2-4d5f-8e12-ad61ddb4a426",
      "image_filename": "ph944pq1002_00_0001.jp2",
      "title": "Barry Whitehead Memorial Book Fund"
    }
  ]
}
The process-funds-group went from 139 bookplate_funds_polines mapped tasks to 7 instances_from_po_lines mapped tasks.
The XCOMs for bookplate_funds_polines show the expected return values: some with just STEINMETZ and some with just WHITEHEAD. I found the one for po line ID ccee2e77-6c14-40b6-9e77-192b54576e34 and it looks like this:
[
  {
    "bookplate_metadata": {
      "fund_name": "STEINMETZ",
      "druid": "nc092rd1979",
      "image_filename": "nc092rd1979_00_0001.jp2",
      "title": "Verna Pace Steinmetz Endowed Book Fund in History"
    },
    "poline_id": "ccee2e77-6c14-40b6-9e77-192b54576e34"
  },
  {
    "bookplate_metadata": {
      "fund_name": "WHITEHEAD",
      "druid": "ph944pq1002",
      "image_filename": "ph944pq1002_00_0001.jp2",
      "title": "Barry Whitehead Memorial Book Fund"
    },
    "poline_id": "ccee2e77-6c14-40b6-9e77-192b54576e34"
  }
]
I'm not sure why the graph view shows only 7 instances_from_po_lines mapped tasks. When I click on the Mapped Tasks view, there are 139. Mapped task index 18 is the one for the po line I'm interested in. The XCOM output is:
{
  "b6e80385-3be3-4e84-a530-d89d072af46b": [
    {
      "fund_name": "STEINMETZ",
      "druid": "nc092rd1979",
      "image_filename": "nc092rd1979_00_0001.jp2",
      "title": "Verna Pace Steinmetz Endowed Book Fund in History"
    },
    {
      "fund_name": "WHITEHEAD",
      "druid": "ph944pq1002",
      "image_filename": "ph944pq1002_00_0001.jp2",
      "title": "Barry Whitehead Memorial Book Fund"
    }
  ]
}
😌 I'm glad to discover that this is as it should be (an instance ID with a list of bookplate metadata).
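The step from the per-po-line entries to the per-instance dict can be sketched as below. This is only an illustration of the transformation seen in the XCOMs, not the actual instances_from_po_lines code. The helper name group_metadata_by_instance is hypothetical, and the poline-to-instance mapping is passed in as a plain dict where the real task looks the instance up in FOLIO.

```python
from collections import defaultdict


def group_metadata_by_instance(poline_entries, instance_for_poline):
    """Hypothetical helper: collect bookplate metadata per instance ID."""
    grouped = defaultdict(list)
    for entry in poline_entries:
        instance_id = instance_for_poline.get(entry["poline_id"])
        if instance_id is None:
            # No instance linked to this po line; nothing to bookplate.
            continue
        grouped[instance_id].append(entry["bookplate_metadata"])
    return dict(grouped)


# Entries copied from the bookplate_funds_polines XCOM shown above.
poline_entries = [
    {
        "bookplate_metadata": {
            "fund_name": "STEINMETZ",
            "druid": "nc092rd1979",
            "image_filename": "nc092rd1979_00_0001.jp2",
            "title": "Verna Pace Steinmetz Endowed Book Fund in History",
        },
        "poline_id": "ccee2e77-6c14-40b6-9e77-192b54576e34",
    },
    {
        "bookplate_metadata": {
            "fund_name": "WHITEHEAD",
            "druid": "ph944pq1002",
            "image_filename": "ph944pq1002_00_0001.jp2",
            "title": "Barry Whitehead Memorial Book Fund",
        },
        "poline_id": "ccee2e77-6c14-40b6-9e77-192b54576e34",
    },
]

# Stand-in for the order-lines lookup.
instance_for_poline = {
    "ccee2e77-6c14-40b6-9e77-192b54576e34": "b6e80385-3be3-4e84-a530-d89d072af46b",
}

result = group_metadata_by_instance(poline_entries, instance_for_poline)
```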
I'm not sure if this function will return a data structure that can be used for the launch_add_979_fields task. An example invoice line UUID 39ba5b63-de98-4843-88ca-cc0f4adfcd23 has this data for its fundDistribution:
The bookplate_funds_polines task will presumably turn this invoice line into a data structure that looks like this:
The next task in the digital_bookplate_instances DAG, instances_from_polines, will get the instance ID by looking up the poline_id at the /orders/order-lines endpoint. The instance ID is b6e80385-3be3-4e84-a530-d89d072af46b. Will the instances_from_polines task turn the input data into the correct data structure for downstream tasks and DAGs? instances_from_polines says it returns a dict. Is it the same dict that is in the comment for launch_add_979_fields_task? It should look like this, I think:
Need to add a test to https://github.com/sul-dlss/libsys-airflow/blob/main/tests/digital_bookplates/test_instances_from_po_lines.py to check that the data passed between functions is as expected.
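One way such a test might check the data is with a small shape validator, sketched below. The required keys are taken from the XCOM output shown in this thread. Whether launch_add_979_fields expects exactly this shape is the open question, so treat the key set as an assumption. The function name is_expected_shape is hypothetical and not part of libsys-airflow.

```python
import uuid

# Assumption: these four keys, as seen in the XCOM output in this thread.
REQUIRED_KEYS = {"fund_name", "druid", "image_filename", "title"}


def is_expected_shape(payload):
    """Hypothetical check: {instance UUID: [bookplate metadata dict, ...]}."""
    if not isinstance(payload, dict) or not payload:
        return False
    for instance_id, bookplates in payload.items():
        try:
            uuid.UUID(instance_id)
        except (ValueError, AttributeError, TypeError):
            return False
        if not isinstance(bookplates, list) or not bookplates:
            return False
        for bookplate in bookplates:
            if not isinstance(bookplate, dict) or set(bookplate) != REQUIRED_KEYS:
                return False
    return True


good = {
    "b6e80385-3be3-4e84-a530-d89d072af46b": [
        {
            "fund_name": "STEINMETZ",
            "druid": "nc092rd1979",
            "image_filename": "nc092rd1979_00_0001.jp2",
            "title": "Verna Pace Steinmetz Endowed Book Fund in History",
        }
    ]
}

# The per-po-line list form should not pass as the per-instance dict.
bad = [{"bookplate_metadata": {}, "poline_id": "ccee2e77-6c14-40b6-9e77-192b54576e34"}]
```

In a pytest test, the validator could be asserted against the task's return value in place of the `good` sample.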