mjordan / islandora_workbench

A command-line tool for managing content in an Islandora 2 repository
MIT License

Running Workbench when using a Google sheet as input CSV fails unless --check has been run #496

Closed mjordan closed 1 year ago

mjordan commented 1 year ago

More detail in this Slack thread. Thanks to @ruebot for finding this.

mjordan commented 1 year ago

@ruebot can you pull in updates to the main branch and try again?

ruebot commented 1 year ago

I get this now:

Traceback (most recent call last):
  File "/home/nruest/Projects/york/islandora_workbench/workbench", line 1266, in <module>
    csv_data_to_count = get_csv_data(config)
  File "/home/nruest/Projects/york/islandora_workbench/workbench_utils.py", line 2997, in get_csv_data
    for item in csv_reader_fieldnames:
TypeError: 'NoneType' object is not iterable

If I run --check first and then run the import, it works fine.
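For reference, here is one way this exact TypeError can arise in plain Python: csv.DictReader.fieldnames is None when the input has no rows at all, so looping over it fails. This is a minimal sketch, not Workbench's code. It assumes (unconfirmed from workbench_utils.py) that csv_reader_fieldnames comes from a DictReader over an empty input; read_fieldnames is a hypothetical helper.

```python
import csv
import io


def read_fieldnames(csv_text):
    """Return the header row of csv_text, or [] if there is no header.

    Hypothetical helper for illustration; not part of Workbench.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    # DictReader.fieldnames is None when the input contains no rows.
    fieldnames = reader.fieldnames
    if fieldnames is None:
        return []
    return list(fieldnames)


# Reproduce the failure mode: iterate over the fieldnames of an empty input.
empty_reader = csv.DictReader(io.StringIO(""))
try:
    for item in empty_reader.fieldnames:
        pass
except TypeError as error:
    print(error)

# The guarded helper returns an empty list instead of raising.
print(read_fieldnames(""))
print(read_fieldnames("id,title\n1,Example\n"))
```

Guarding against None like this only hides the symptom. The more useful question is why the CSV is empty or missing in the first place.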

mjordan commented 1 year ago

When it doesn't work (i.e. when you don't run --check first), 1) does /tmp/google_sheet.csv exist and 2) if so, what is in it? Can you share /tmp/google_sheet.csv here, or in an email to me if you'd rather not post it? Also, is there a /tmp/google_sheet.csv.preprocessed?
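If it helps, those checks can be run in one go with something like the sketch below. describe_file is a hypothetical helper, not part of Workbench. The two paths are the ones mentioned above.

```python
import os


def describe_file(path, max_lines=5):
    """Report whether path exists, its size in bytes, and its first few lines."""
    if not os.path.exists(path):
        return {"exists": False, "size": 0, "head": []}
    with open(path, encoding="utf-8", errors="replace") as handle:
        # zip() stops after max_lines lines without reading the rest of the file.
        head = [line.rstrip("\n") for _, line in zip(range(max_lines), handle)]
    return {"exists": True, "size": os.path.getsize(path), "head": head}


for candidate in ("/tmp/google_sheet.csv", "/tmp/google_sheet.csv.preprocessed"):
    print(candidate, describe_file(candidate))
```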

ruebot commented 1 year ago

It's the first option, /tmp/google_sheet.csv doesn't exist. It looks like the --check option must create it at some point. On my production machine, which is still sitting at commit 51c0f79d1683104aa262e0431ebb411f535f0243, it works. So, I guess at some point between that commit and now, that functionality changed.

mjordan commented 1 year ago

Also, can you confirm that your Google Sheet is publicly accessible (i.e. not requiring a logged in Google user to view it)?

ruebot commented 1 year ago

yep, it's public.

Here's my config:

task: create
host: "http://localhost:8000"
username: admin
password: islandora
input_dir: /tmp
input_csv: 'https://docs.google.com/spreadsheets/d/14X-rQ-Hz3lN6l_KTkBwnzxVICkFbgLV3XQoIO1r6JM8/edit?usp=sharing'
google_sheets_gid: 0
log_file_path: /tmp/collection.log
drupal_filesystem: "fedora://"
id_field: id
nodes_only: true

mjordan commented 1 year ago

Yes, --check does dump the Sheet's CSV to check for required fields, etc. I'll try to track down what changed between that commit and now.
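To illustrate what "dumping the Sheet's CSV" involves, here is a rough sketch of turning a Sheet's browser URL plus a gid into a CSV export URL. It uses the commonly used export?gid=...&format=csv pattern, which should work for publicly viewable Sheets. sheet_csv_export_url is a hypothetical name, and this is not necessarily how Workbench builds the URL internally.

```python
import re


def sheet_csv_export_url(sheet_url, gid=0):
    """Build a CSV export URL from a Google Sheets browser URL.

    Hypothetical helper for illustration; not taken from Workbench.
    """
    match = re.search(r"/spreadsheets/d/([A-Za-z0-9_-]+)", sheet_url)
    if match is None:
        raise ValueError(f"Not a Google Sheets URL: {sheet_url}")
    sheet_id = match.group(1)
    return f"https://docs.google.com/spreadsheets/d/{sheet_id}/export?gid={gid}&format=csv"


input_csv = "https://docs.google.com/spreadsheets/d/14X-rQ-Hz3lN6l_KTkBwnzxVICkFbgLV3XQoIO1r6JM8/edit?usp=sharing"
print(sheet_csv_export_url(input_csv, gid=0))
```

Fetching that URL in a browser or with curl is a quick way to confirm that the Sheet really is readable without a Google login.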

mjordan commented 1 year ago

The plot thickens. Using this config, I can ingest your nodes. I'm running workbench in the most recent commit in main:

task: create
host: http://localhost:8000
username: admin
password: islandora
input_dir: /tmp
input_csv: 'https://docs.google.com/spreadsheets/d/14X-rQ-Hz3lN6l_KTkBwnzxVICkFbgLV3XQoIO1r6JM8/edit?usp=sharing'
google_sheets_gid: 0
nodes_only: true
# Settings below are necessary to accommodate differences in configuration
# between your Drupal and mine but should have no impact on this issue.
ignore_csv_columns: ['field_model', 'field_pid', 'uid', 'field_note', 'field_resource_type']
csv_field_templates:
 - field_model: 24

Before running workbench, I made sure /tmp/google_sheets.csv doesn't exist. This works with and without --check. Here's the output:

mark@user-ThinkPad-X1-Carbon-6th:~/hacking/islandora_workbench$ ./workbench --config issue-496.yml
OK, connection to Drupal at http://localhost:8000 verified.
"Create" task started using config file issue-496.yml.
"nodes_only" option in effect. No media will be created.
Node for "Academic Innovation Fund - Digital Humanities and Social Sciences" (record yul:oru) created at http://localhost:8000/node/541.
Node for "Adapting Canadian Work and Workplaces to Respond to Climate Change" (record yul:336565) created at http://localhost:8000/node/542.
Node for "Buddhism Across Boundaries" (record yul:buddhism-across-boundaries) created at http://localhost:8000/node/543.
Node for "Clara Thomas Archives & Special Collections" (record yul:asc) created at http://localhost:8000/node/544.
Node for "Dworin Collection" (record yul:600882) created at http://localhost:8000/node/545.
Node for "John Holmes Library Collection" (record yul:894110) created at http://localhost:8000/node/546.
Node for "Map Library" (record yul:1120416) created at http://localhost:8000/node/547.
Node for "Open Scholarship" (record yul:os) created at http://localhost:8000/node/548.
Node for "Ojibwe Cultural Foundation (OCF)" (record yul:ocf) created at http://localhost:8000/node/549.
Node for "Rare classic Chinese literature" (record yul:573282) created at http://localhost:8000/node/550.
Node for "Sheila Thibaudeau Lambrinos Collection" (record yul:573280) created at http://localhost:8000/node/551.
Node for "Sound and Moving Image Library (SMIL)" (record yul:smil) created at http://localhost:8000/node/552.
Node for "Yolton Library Rare Book Collection" (record yul:308601) created at http://localhost:8000/node/553.
Node for "York Digital Journals" (record yul:ydj) created at http://localhost:8000/node/554.
Node for "York University Libraries Monographs" (record yul:ia) created at http://localhost:8000/node/555.
Node for "YorkSpace Streaming" (record yul:770320) created at http://localhost:8000/node/556.
Node for "Las Nubes" (record yul:lasnubes) created at http://localhost:8000/node/557.

I'll continue to dig into what has changed since https://github.com/mjordan/islandora_workbench/commit/51c0f79d1683104aa262e0431ebb411f535f0243 that might cause the error on your dev/test box.

mjordan commented 1 year ago

Or another test worth trying would be to use my config, but replace - field_model: 24 with - field_model: 11. If it creates nodes, they wouldn't have values in 'field_pid', 'uid', 'field_note', 'field_resource_type'.

ruebot commented 1 year ago

Cool. Just this works fine:

task: create
host: http://localhost:8000
username: admin
password: islandora
input_dir: /tmp
input_csv: 'https://docs.google.com/spreadsheets/d/14X-rQ-Hz3lN6l_KTkBwnzxVICkFbgLV3XQoIO1r6JM8/edit?usp=sharing'
google_sheets_gid: 0
nodes_only: true

Looks like the real issue is that I have a bunch of legacy cruft I need to delete from my config files. We can mark this as closed then. Thanks for taking the time to troubleshoot this @mjordan. I really appreciate it!
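For anyone hitting something similar, a quick way to narrow it down is to list the settings that appear in the failing config but not in the working one. The sketch below does that for the two configs posted in this thread. It uses a deliberately naive line-based parser (top_level_keys is a hypothetical helper, not a real YAML parser) and assumes flat "key: value" files. It only shows which keys differ, not which one was responsible.

```python
def top_level_keys(config_text):
    """Collect top-level keys from a flat 'key: value' config (naive, not real YAML)."""
    keys = set()
    for line in config_text.splitlines():
        # Skip blank lines, comments, indented lines, and list items.
        if not line.strip() or line.lstrip().startswith("#") or line[0] in " \t-":
            continue
        key, separator, _ = line.partition(":")
        if separator:
            keys.add(key.strip())
    return keys


failing_config = """\
task: create
host: "http://localhost:8000"
username: admin
password: islandora
input_dir: /tmp
input_csv: 'https://docs.google.com/spreadsheets/d/14X-rQ-Hz3lN6l_KTkBwnzxVICkFbgLV3XQoIO1r6JM8/edit?usp=sharing'
google_sheets_gid: 0
log_file_path: /tmp/collection.log
drupal_filesystem: "fedora://"
id_field: id
nodes_only: true
"""

working_config = """\
task: create
host: http://localhost:8000
username: admin
password: islandora
input_dir: /tmp
input_csv: 'https://docs.google.com/spreadsheets/d/14X-rQ-Hz3lN6l_KTkBwnzxVICkFbgLV3XQoIO1r6JM8/edit?usp=sharing'
google_sheets_gid: 0
nodes_only: true
"""

extra_keys = sorted(top_level_keys(failing_config) - top_level_keys(working_config))
print(extra_keys)
```

Removing the extra settings one at a time would then show which of them triggers the failure.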

mjordan commented 1 year ago

Super, I'm glad it's working. Was turning into a head-scratcher! Closing but if anything else doesn't work the way you expect, let me know.