sul-dlss / libsys-airflow

Airflow DAGS for migrating and managing ILS data into FOLIO along with other LibSys workflows
Apache License 2.0

data structures to hand off to digital_bookplates_979 DAG when same po line has multiple bookplate funds #1322

Closed: shelleydoljack closed 1 week ago

shelleydoljack commented 1 week ago

I'm not sure if this function will return a data structure that can be used for the launch_add_979_fields task. An example invoice line, UUID 39ba5b63-de98-4843-88ca-cc0f4adfcd23, has this data for its fundDistributions:

"fundDistributions": [
    {
        "code": "STEINMETZ-SUL",
        "encumbrance": "2e0765a0-b137-4d58-9e88-cc7394e1e5d7",
        "fundId": "fced6c53-61fb-48c6-821f-14cff3c39b59",
        "distributionType": "percentage",
        "value": 50.0
    },
    {
        "code": "WHITEHEAD-SUL",
        "encumbrance": "df477c4c-a324-4ffd-a999-8aff94743b29",
        "fundId": "694c5236-ada2-4d5f-8e12-ad61ddb4a426",
        "distributionType": "percentage",
        "value": 50.0
    }
],
"invoiceId": "cf11f295-2363-4754-bccf-a614b9fc46c0",
"invoiceLineNumber": "1",
"invoiceLineStatus": "Paid",
"poLineId": "ccee2e77-6c14-40b6-9e77-192b54576e34",

The bookplate_funds_polines task will presumably turn this invoice line into a data structure that looks like this:

[
      {
        "bookplate_metadata": { "druid": "abc123", "fund_name": "STEINMETZ", "fund_uuid": "fced6c53-61fb-48c6-821f-14cff3c39b59", "image_filename": "abc123.jpeg", "title": "" },
        "poline_id": "ccee2e77-6c14-40b6-9e77-192b54576e34"
      },
      {
        "bookplate_metadata": { "druid": "def456", "fund_name": "WHITEHEAD", "fund_uuid": "694c5236-ada2-4d5f-8e12-ad61ddb4a426", "image_filename": "def456.jpeg", "title": "" },
        "poline_id": "ccee2e77-6c14-40b6-9e77-192b54576e34"
      }
]
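
As a rough illustration of the transformation described above (not the repository's actual bookplate_funds_polines code), the sketch below matches each entry in fundDistributions to a bookplate record by fundId. The function name bookplates_for_invoice_line and the funds_by_uuid lookup are hypothetical, and the druid and image values are the placeholder ones from this comment.

```python
def bookplates_for_invoice_line(invoice_line: dict, funds_by_uuid: dict) -> list:
    """Return one {bookplate_metadata, poline_id} record per matching fund.

    Hypothetical helper: funds_by_uuid maps a fund UUID to its bookplate
    metadata. Funds without bookplate metadata are skipped.
    """
    poline_id = invoice_line.get("poLineId")
    if poline_id is None:
        return []
    records = []
    for distribution in invoice_line.get("fundDistributions", []):
        metadata = funds_by_uuid.get(distribution.get("fundId"))
        if metadata is None:
            continue
        # Copy the metadata so records do not share one mutable dict.
        records.append({"bookplate_metadata": dict(metadata), "poline_id": poline_id})
    return records


# Trimmed copy of the invoice line quoted earlier in this comment.
invoice_line = {
    "fundDistributions": [
        {"code": "STEINMETZ-SUL", "fundId": "fced6c53-61fb-48c6-821f-14cff3c39b59"},
        {"code": "WHITEHEAD-SUL", "fundId": "694c5236-ada2-4d5f-8e12-ad61ddb4a426"},
    ],
    "poLineId": "ccee2e77-6c14-40b6-9e77-192b54576e34",
}

# Placeholder bookplate metadata (druids are the made-up ones used above).
funds_by_uuid = {
    "fced6c53-61fb-48c6-821f-14cff3c39b59": {
        "druid": "abc123", "fund_name": "STEINMETZ",
        "fund_uuid": "fced6c53-61fb-48c6-821f-14cff3c39b59",
        "image_filename": "abc123.jpeg", "title": "",
    },
    "694c5236-ada2-4d5f-8e12-ad61ddb4a426": {
        "druid": "def456", "fund_name": "WHITEHEAD",
        "fund_uuid": "694c5236-ada2-4d5f-8e12-ad61ddb4a426",
        "image_filename": "def456.jpeg", "title": "",
    },
}

records = bookplates_for_invoice_line(invoice_line, funds_by_uuid)
```

Both records carry the same poline_id, which is the case this issue is asking about.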

The next task in the digital_bookplate_instances DAG, instances_from_po_lines, will get the instance ID by looking up the poline_id at the /orders/order-lines endpoint. The instance ID is b6e80385-3be3-4e84-a530-d89d072af46b. Will the instances_from_po_lines task turn the input data into the correct data structure for downstream tasks and DAGs? instances_from_po_lines says it returns a dict. Is it the same dict that is in the comment for launch_add_979_fields_task? It should look like this, I think:

{ "b6e80385-3be3-4e84-a530-d89d072af46b": [
        { "druid": "abc123", "fund_name": "STEINMETZ", "fund_uuid": "fced6c53-61fb-48c6-821f-14cff3c39b59", "image_filename": "abc123.jpeg", "title": "" },
        { "druid": "def456", "fund_name": "WHITEHEAD", "fund_uuid": "694c5236-ada2-4d5f-8e12-ad61ddb4a426", "image_filename": "def456.jpeg", "title": "" }
    ]
}
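
The regrouping step can be sketched as follows, under the assumption that every po line resolves to a single instance ID. The name group_bookplates_by_instance is hypothetical, and the instance_for_poline dict stands in for the /orders/order-lines lookup; this is not the code in instances_from_po_lines.

```python
from collections import defaultdict


def group_bookplates_by_instance(records: list, instance_for_poline: dict) -> dict:
    """Collect bookplate metadata under the instance ID of each po line.

    instance_for_poline is a hypothetical stand-in for the FOLIO lookup;
    po lines without a known instance ID are skipped.
    """
    grouped = defaultdict(list)
    for record in records:
        instance_id = instance_for_poline.get(record["poline_id"])
        if instance_id is None:
            continue
        grouped[instance_id].append(record["bookplate_metadata"])
    return dict(grouped)


poline_id = "ccee2e77-6c14-40b6-9e77-192b54576e34"
records = [
    {"bookplate_metadata": {"druid": "abc123", "fund_name": "STEINMETZ"}, "poline_id": poline_id},
    {"bookplate_metadata": {"druid": "def456", "fund_name": "WHITEHEAD"}, "poline_id": poline_id},
]
instance_for_poline = {poline_id: "b6e80385-3be3-4e84-a530-d89d072af46b"}

grouped = group_bookplates_by_instance(records, instance_for_poline)
```

Because the two records share a poline_id, they end up in one list under one instance ID, which is the shape hoped for above.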

Need to add a test to https://github.com/sul-dlss/libsys-airflow/blob/main/tests/digital_bookplates/test_instances_from_po_lines.py to check that the data passed between functions is as expected.
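
A shape check that such a test could lean on might look like the sketch below. The helper name payload_problems is hypothetical, and the set of expected keys is an assumption taken from the examples in this issue (fund_uuid is treated as optional).

```python
# Assumed minimum keys for each bookplate entry, based on this issue's examples.
EXPECTED_KEYS = {"druid", "fund_name", "image_filename", "title"}


def payload_problems(payload) -> list:
    """Return a list of shape problems; an empty list means the shape is fine.

    Expected shape: {instance_id: [bookplate_metadata_dict, ...]}.
    """
    if not isinstance(payload, dict):
        return ["payload is not a dict"]
    problems = []
    for instance_id, bookplates in payload.items():
        if not isinstance(bookplates, list) or not bookplates:
            problems.append(f"{instance_id}: expected a non-empty list")
            continue
        for index, bookplate in enumerate(bookplates):
            if not isinstance(bookplate, dict):
                problems.append(f"{instance_id}[{index}]: expected a dict")
                continue
            missing = EXPECTED_KEYS - bookplate.keys()
            if missing:
                problems.append(f"{instance_id}[{index}]: missing {sorted(missing)}")
    return problems


good = {
    "b6e80385-3be3-4e84-a530-d89d072af46b": [
        {"druid": "abc123", "fund_name": "STEINMETZ", "image_filename": "abc123.jpeg", "title": ""},
        {"druid": "def456", "fund_name": "WHITEHEAD", "image_filename": "def456.jpeg", "title": ""},
    ]
}
bad = {"b6e80385-3be3-4e84-a530-d89d072af46b": [{"druid": "abc123", "fund_name": "STEINMETZ"}]}

good_problems = payload_problems(good)
bad_problems = payload_problems(bad)
```

Returning a list of problems instead of raising on the first one keeps a failing test's output readable when several entries are malformed.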

shelleydoljack commented 1 week ago

I used the Swagger UI to get a response of the xcomEntries for the dag run that was triggered with the configuration that included the STEINMETZ fund. dag_id = digital_bookplate_instances, dag_run_id = manual2024-10-24T00:49:58.080200+00:00, task_id = process-new-funds.instances_from_po_lines, limit = 236. The JSON response is available from the request URL https://sul-libsys-airflow-dev.stanford.edu/api/v1/dags/digital_bookplate_instances/dagRuns/manual2024-10-24T00%3A49%3A58.080200%2B00%3A00/taskInstances/process-new-funds.instances_from_po_lines/xcomEntries?limit=236

I searched the response for the instance ID b6e80385-3be3-4e84-a530-d89d072af46b to find the mapped task number: 18. For some reason, a GET to dags/{dag_id}/dagRuns/{dag_run_id}/taskInstances/{task_id}/xcomEntries/{xcom_key}, where xcom_key is the instance ID "b6e80385-3be3-4e84-a530-d89d072af46b", returns a 404. No matter, I looked in the UI for mapped task 18. It shows up there, and the XCOM object looks like this:

{'b6e80385-3be3-4e84-a530-d89d072af46b': [{'fund_name': 'STEINMETZ', 'druid': 'nc092rd1979', 'image_filename': 'nc092rd1979_00_0001.jp2', 'title': 'Verna Pace Steinmetz Endowed Book Fund in History'}]}
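
A possible explanation for the 404, offered as an assumption and not verified against this deployment: in Airflow's stable REST API, {xcom_key} is the name the XCom was stored under, and a task's return value is stored under the key return_value. The instance ID is a key inside the returned dict, not an XCom key. For a mapped task, the map_index query parameter should select the mapped instance, if the deployed Airflow version supports it. The helper name xcom_entry_url below is hypothetical; it only builds the URL and makes no request.

```python
from typing import Optional
from urllib.parse import quote, urlencode


def xcom_entry_url(base_url: str, dag_id: str, dag_run_id: str, task_id: str,
                   xcom_key: str = "return_value",
                   map_index: Optional[int] = None) -> str:
    """Build the URL for a single XCom entry (hypothetical helper).

    Each path segment is percent-encoded, so ':' and '+' in the run ID
    become %3A and %2B.
    """
    path = "/".join([
        "dags", quote(dag_id, safe=""),
        "dagRuns", quote(dag_run_id, safe=""),
        "taskInstances", quote(task_id, safe=""),
        "xcomEntries", quote(xcom_key, safe=""),
    ])
    url = f"{base_url.rstrip('/')}/{path}"
    if map_index is not None:
        # Assumption: the API accepts map_index for mapped task instances.
        url += "?" + urlencode({"map_index": map_index})
    return url


url = xcom_entry_url(
    "https://sul-libsys-airflow-dev.stanford.edu/api/v1",
    "digital_bookplate_instances",
    "manual2024-10-24T00:49:58.080200+00:00",  # run ID as written in this comment
    "process-new-funds.instances_from_po_lines",
    map_index=18,
)
```

The entry fetched this way would be the whole return value of mapped task 18, and the instance ID would then be looked up inside that dict.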

I realized that when the digital_bookplate_instances DAG is triggered by a fund config, the tasks that get bookplate_funds_polines will only look for the fund they were given. So in this case, the output of instances_from_po_lines will only include the bookplate metadata for the one fund.

However, when the digital_bookplate_instances DAG is run on its schedule, it should look for both STEINMETZ and WHITEHEAD funds. I will try triggering the DAG with a fund config that includes these two funds to see what happens. I could try to trigger it with a data interval that includes the date on which the paid invoice line 39ba5b63-de98-4843-88ca-cc0f4adfcd23 would turn up, but I'm not sure how to set a DAG run's data interval start and end when it is manually triggered.
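
On setting the data interval for a manual run: one option that may work, assuming the deployed Airflow version accepts data_interval_start and data_interval_end in the body of POST /dags/{dag_id}/dagRuns, is to send both values together when triggering through the REST API. The sketch below only builds the request body. The function name manual_run_body is hypothetical, and the dates are placeholders, not the actual payment date of the invoice line.

```python
from datetime import datetime, timezone
from typing import Optional


def manual_run_body(interval_start: datetime, interval_end: datetime,
                    conf: Optional[dict] = None) -> dict:
    """Build a JSON-serializable body for triggering a DAG run (hypothetical helper).

    Both bounds must be timezone-aware, and the end must come after the start.
    """
    for value in (interval_start, interval_end):
        if value.tzinfo is None:
            raise ValueError("data interval bounds must be timezone-aware")
    if interval_end <= interval_start:
        raise ValueError("data_interval_end must be after data_interval_start")
    return {
        "data_interval_start": interval_start.isoformat(),
        "data_interval_end": interval_end.isoformat(),
        # An empty conf mimics a scheduled run that has no funds config.
        "conf": conf or {},
    }


# Placeholder interval; replace with one that covers the invoice line's paid date.
body = manual_run_body(
    datetime(2024, 10, 1, tzinfo=timezone.utc),
    datetime(2024, 10, 2, tzinfo=timezone.utc),
)
```

Whether the DAG's tasks then read the interval from the run context is a separate question that depends on how the DAG computes its date range.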

shelleydoljack commented 1 week ago

When I ran this locally with the funds config:

{
    "funds": [
        {
            "druid": "nc092rd1979",
            "fund_name": "STEINMETZ",
            "fund_uuid": "fced6c53-61fb-48c6-821f-14cff3c39b59",
            "image_filename": "nc092rd1979_00_0001.jp2",
            "title": "Verna Pace Steinmetz Endowed Book Fund in History"
        },
        {
            "druid": "ph944pq1002",
            "fund_name": "WHITEHEAD",
            "fund_uuid": "694c5236-ada2-4d5f-8e12-ad61ddb4a426",
            "image_filename": "ph944pq1002_00_0001.jp2",
            "title": "Barry Whitehead Memorial Book Fund"
        }
    ]
}

The process-funds-group went from 139 bookplate_funds_polines mapped tasks to 7 instances_from_po_lines:

[Screenshot 2024-10-24 at 3 18 34 PM]

The XCOMs for bookplate_funds_polines show the expected return values: some with just STEINMETZ and some with just WHITEHEAD. I found the one for po line ID ccee2e77-6c14-40b6-9e77-192b54576e34, and it looks like this:

[
  {
    "bookplate_metadata": {
      "fund_name": "STEINMETZ",
      "druid": "nc092rd1979",
      "image_filename": "nc092rd1979_00_0001.jp2",
      "title": "Verna Pace Steinmetz Endowed Book Fund in History"
    },
    "poline_id": "ccee2e77-6c14-40b6-9e77-192b54576e34"
  },
  {
    "bookplate_metadata": {
      "fund_name": "WHITEHEAD",
      "druid": "ph944pq1002",
      "image_filename": "ph944pq1002_00_0001.jp2",
      "title": "Barry Whitehead Memorial Book Fund"
    },
    "poline_id": "ccee2e77-6c14-40b6-9e77-192b54576e34"
  }
]

I'm not sure why the graph view shows only 7 instances_from_po_lines mapped tasks. When I click on the Mapped Tasks view, there are 139. Mapped task index 18 is the one for the po line I'm interested in. The XCOM output is:

{
  "b6e80385-3be3-4e84-a530-d89d072af46b": [
    {
      "fund_name": "STEINMETZ",
      "druid": "nc092rd1979",
      "image_filename": "nc092rd1979_00_0001.jp2",
      "title": "Verna Pace Steinmetz Endowed Book Fund in History"
    },
    {
      "fund_name": "WHITEHEAD",
      "druid": "ph944pq1002",
      "image_filename": "ph944pq1002_00_0001.jp2",
      "title": "Barry Whitehead Memorial Book Fund"
    }
  ]
}

[Screenshot 2024-10-24 at 3 36 08 PM]

😌 I'm glad to discover that this is as it should be (an instance ID with a list of bookplate metadata).