mjordan / islandora_workbench

A command-line tool for managing content in an Islandora 2 repository
MIT License
24 stars, 41 forks

Error during ingest on a multi-domain site: KeyError:'' #830

Closed (dara2 closed 5 days ago)

dara2 commented 5 days ago

I’m running WB on the latest main (on a site with multiple domains). It passes --check, but I get this error as soon as I run the ingest:

islandora@pitt-staging:/mnt/ingest/islandora_workbench$ time ./workbench --config create_citydirectory_staging.yml
OK, connection to Drupal at https://i2-staging.digital.library.pitt.edu/ verified.
"Create" task started using config file create_citydirectory_staging.yml.
Only node IDs for parents created during this session will be used (not using the CSV ID to node ID map).
╭───────────────────── Traceback (most recent call last) ──────────────────────╮
│ /mnt/ingest/islandora_workbench/./workbench:3645 in <module>                 │
│                                                                              │
│   3642                                                                       │
│   3643 try:                                                                  │
│   3644 │   if config["task"] == "create":                                    │
│ ❱ 3645 │   │   create()                                                      │
│   3646 │   if config["task"] == "update":                                    │
│   3647 │   │   update()                                                      │
│   3648 │   if config["task"] == "delete":                                    │
│                                                                              │
│ /mnt/ingest/islandora_workbench/./workbench:300 in create                    │
│                                                                              │
│    297 │   │   │   # workbench_fields.py, they need to be registered in the  │
│    298 │   │   │                                                             │
│    299 │   │   │   # Entity reference fields (taxonomy_term and node).       │
│ ❱  300 │   │   │   if field_definitions[custom_field]["field_type"] == "enti │
│    301 │   │   │   │   entity_reference_field = workbench_fields.EntityRefer │
│    302 │   │   │   │   node = entity_reference_field.create(                 │
│    303 │   │   │   │   │   config, field_definitions, node, row, custom_fiel │
╰──────────────────────────────────────────────────────────────────────────────╯
KeyError: ''

(This is the site with multiple domains where ingest was working fine when I manually made the fix from https://github.com/mjordan/islandora_workbench/issues/796 in my local version of workbench_fields.py.)

Config:

task: create
host: "xxx"
username: xxx
password: xxx
input_dir: /mnt/ingest/31735056286804
input_csv: 31735056286804_cityDirectory.csv
output_csv: 31735056286804_cityDirectory-output.csv
output_csv_include_input_csv: true
allow_missing_files: true
allow_adding_terms: true
# validate_title_length: false
perform_soft_checks: true
standalone_media_url: true
text_format_id: full_html
# field_for_media_title: field_pid
# field_for_remote_filename: field_pid
delete_tmp_upload: true
adaptive_pause: 2
adaptive_pause_threshold: 2.5 
log_term_creation: false
http_cache_storage: memory
http_cache_storage_expire_after: 600
# query_csv_id_to_node_id_map_for_parents: false
# additional_files:
# - extracted_text: 2
# - thumbnail: 8
# - transcript: 9
# - hocr: 41
# - PDF: 57

Part of the CSV:

id,field_pid,parent_id,field_weight,field_model,field_resource_type,field_genre,field_related_title,field_related_title_part_of,field_related_title_preceding,field_related_title_succeeding,field_related_title_constituent,field_related_title_original,field_related_title_other_format,field_related_title_other_versio,field_related_title_referenced,field_full_title,title,field_linked_agent,field_publisher_pitt,field_place_production,field_distributor,field_local_identifier,field_music_publisher_number,field_place_manufacture,field_manufacturer,field_table_of_contents,field_source_collection,field_physical_form,field_uniform_title,field_extent,field_statement_of_resp,field_subject_genre,field_subject,field_language,field_temporal_subject,field_muscomp_genre,field_place_publication,field_producer,field_edition,field_description_long,field_depositor,field_mode_of_issuance,field_geographic_subject,field_copyright_date,field_alternative_title,,field_oclc_number,field_subjects_name,field_place_distribution,field_record_source_id,field_note,field_place_published_pitt,field_edtf_date,field_coordinates_text,field_isbn,field_copyright_holder,field_rights_notes,field_place_published_country,field_frequency,field_issn,field_preservica_date,field_preservica_id,field_rights_statement,field_scale,field_source_citation,field_source_collection_id,field_source_location,field_source_repository,field_subject_title,field_access_terms,field_member_of,field_display_hints,file,field_domain_access
pitt:31735056286804,pitt:31735056286804,,,Paged Content,Collection,,Polk's Pittsburgh city directory. OCLCN: (OCoLC)7150436|R.L. Polk & Co.'s Pittsburgh city directory.|Polk's Pittsburgh (Pennsylvania) city directory. OCLCN: (OCoLC)7150527,"R.L. Polk & Co. Polk's Pittsburgh city directory. Pittsburgh, Pa: R.L. Polk & Co., 1922-1926.",,,,,,,,Polk's Pittsburgh city directory: 1924,Polk's Pittsburgh city directory:1924,relators:att:corporate_body:Pittsburgh History & Landmarks Foundation,,,,,,,,,,print,,5 v. ; 27 cm.,,Directories,,eng,,,,,,,Pittsburgh History & Landmarks Foundation,continuing,Pittsburgh (Pa.),,,No Copyright - United States,,,,548271,,pau,1924,,,,,pau,,,,,,,,,,,,,1198,Mirador,,digital_library_pitt_edu|historicpittsburgh_org
pitt:31735056286804-0001,pitt:31735056286804-0001,pitt:31735056286804,1,Page,Text,,,,,,,,,,,,"Polk's Pittsburgh city directory:1924, page 1",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Mirador,pitt_31735056286804-0001.JP2.jp2,digital_library_pitt_edu|historicpittsburgh_org
pitt:31735056286804-0002,pitt:31735056286804-0002,pitt:31735056286804,2,Page,Text,,,,,,,,,,,,"Polk's Pittsburgh city directory:1924, page 2",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Mirador,pitt_31735056286804-0002.JP2.jp2,digital_library_pitt_edu|historicpittsburgh_org
pitt:31735056286804-0003,pitt:31735056286804-0003,pitt:31735056286804,3,Page,Text,,,,,,,,,,,,"Polk's Pittsburgh city directory:1924, page 3",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Mirador,pitt_31735056286804-0003.JP2.jp2,digital_library_pitt_edu|historicpittsburgh_org
pitt:31735056286804-0004,pitt:31735056286804-0004,pitt:31735056286804,4,Page,Text,,,,,,,,,,,,"Polk's Pittsburgh city directory:1924, page 4",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Mirador,pitt_31735056286804-0004.JP2.jp2,digital_library_pitt_edu|historicpittsburgh_org
pitt:31735056286804-0005,pitt:31735056286804-0005,pitt:31735056286804,5,Page,Text,,,,,,,,,,,,"Polk's Pittsburgh city directory:1924, page 5",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Mirador,pitt_31735056286804-0005.JP2.jp2,digital_library_pitt_edu|historicpittsburgh_org

dara2 commented 5 days ago

This seemed to be caused by a column without a header in my CSV. Odd that --check didn't flag that.
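
For anyone hitting the same thing: the header row pasted above contains `field_alternative_title,,field_oclc_number`, i.e. one column with an empty name. If that empty name is used as the field name when looking up `field_definitions[custom_field]`, it would be consistent with the `KeyError: ''` in the traceback. Below is a minimal sketch of a standalone pre-flight check for blank headers. It is not part of Workbench; `find_blank_headers` is a hypothetical helper, and it only uses Python's standard `csv` module.

```python
import csv
import io


def find_blank_headers(csv_text, delimiter=","):
    """Return 1-based positions of header cells that are empty or whitespace.

    Hypothetical helper for illustration; not a Workbench function.
    """
    reader = csv.reader(io.StringIO(csv_text), delimiter=delimiter)
    # next() with a default avoids StopIteration on completely empty input.
    header = next(reader, [])
    return [pos for pos, name in enumerate(header, start=1) if not name.strip()]


# Shortened stand-in for the real header row, keeping the unnamed column.
sample = "id,field_alternative_title,,field_oclc_number\n1,a,b,c\n"
print(find_blank_headers(sample))  # [3]
```

To check a real input file, read it first (for example `open(path, newline="", encoding="utf-8").read()`) and pass the text in. A non-empty result means at least one column has no header and should be named or removed before running the ingest.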