cwulfman opened 2 years ago
Approach:
Method:
e6080f72-4ba5-4e35-a87d-9a4eb826d0e3
import json
import csv
import re
import os

# Spreadsheet export (input), import file (output), and staged scans on the Isilon share
tsv_path = "/home/deploy/pppl/records.tsv"
json_path = "/home/deploy/pppl/records.json"
isilon_path = "/mnt/hydra_sources/ingest_scratch/pppl_technical_reports/staged"
collection_id = "e6080f72-4ba5-4e35-a87d-9a4eb826d0e3"

records = []
with open(tsv_path, newline="", encoding="utf-8") as f:
    reader = csv.DictReader(f, delimiter="\t")
    for row in reader:
        call_number = row['call number'].strip()
        record = {}
        record['title'] = row['title'].strip()
        record['local_identifier'] = call_number
        # Each report is staged on the Isilon share under its call number
        record['path'] = os.path.join(isilon_path, call_number)
        # Split "A, B, and C" into ["A", "B", "C"], skipping empty entries
        authors = []
        for author in row['authors'].split(','):
            author = re.sub(r"^and ", "", author.strip())
            if author:
                authors.append(author)
        record['creator'] = authors
        record['member_of_collection_ids'] = [collection_id]
        records.append(record)

output = {'records': records}
with open(json_path, 'w') as fp:
    json.dump(output, fp)
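Before running the import task on the file, a quick sanity check can catch rows the spreadsheet export left incomplete. This is a minimal sketch, not part of Figgy: `check_records` and `REQUIRED_KEYS` are hypothetical names, and the required keys are just the fields the script above writes, not a list confirmed against the import task.

```python
import json
import os
from collections import Counter

# Hypothetical: the fields the conversion script writes, treated here as required
REQUIRED_KEYS = ("title", "local_identifier", "path", "creator", "member_of_collection_ids")


def check_records(json_path, check_paths=True):
    """Return a list of human-readable problems found in an import file."""
    with open(json_path, encoding="utf-8") as fp:
        records = json.load(fp)["records"]

    problems = []
    for i, record in enumerate(records):
        # Flag keys that are absent or empty (e.g. a blank title or no authors)
        for key in REQUIRED_KEYS:
            if not record.get(key):
                problems.append(f"record {i}: missing or empty {key!r}")
        # Optionally confirm the staged files are where the record says they are
        path = record.get("path")
        if check_paths and path and not os.path.exists(path):
            problems.append(f"record {i}: path does not exist: {path}")

    # Call numbers double as local identifiers, so duplicates would be ambiguous
    counts = Counter(r.get("local_identifier") for r in records)
    for ident, n in counts.items():
        if ident and n > 1:
            problems.append(f"local_identifier {ident!r} appears {n} times")
    return problems
```

For example, `for p in check_records(json_path): print(p)` prints nothing when the file looks clean. Pass `check_paths=False` when running it on a machine without the Isilon mount.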
bundle exec rake figgy:import:json FILE=/home/deploy/pppl/records.json
I see these in Figgy as SimpleResources — but I thought the plan was to create MARC records for the metadata, since the expectation was that these would be available in Orangelight. Was there a change of plan? Was Anya Bartelmann OK with the change?
I have ingested the items in https://docs.google.com/spreadsheets/d/1lApC21IrsX3BeXyyx0nT8LcW7SQEawyCf4tCsIbQ7co/edit#gid=1262367616 as SimpleResources into https://figgy.princeton.edu/catalog/e6080f72-4ba5-4e35-a87d-9a4eb826d0e3 using the attached JSON file, which comprises data extracted from the spreadsheet tab “new records”.
What I need from @pmgreen is a mapping of local_identifier -> source_identifier.
When I have that, I’ll re-ingest the reports as ScannedResources with the updated metadata, and then send you a report with a mapping of source_identifier->ark, so you can update the MARC records.
records.json.zip
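The two mapping steps described above could look roughly like this. It is a sketch under stated assumptions: the mapping is assumed to arrive as a TSV with `local_identifier` and `source_identifier` columns, the identifier is assumed to go under Figgy's `source_metadata_identifier` field (not confirmed that the JSON import task reads that key), and `read_mapping`, `apply_source_ids`, and `write_ark_report` are hypothetical helpers. The ARKs themselves would come from Figgy after the re-ingest; this only formats the report.

```python
import csv


def read_mapping(path):
    """Read a TSV with (assumed) headers local_identifier and source_identifier."""
    with open(path, newline="", encoding="utf-8") as fp:
        return {
            row["local_identifier"].strip(): row["source_identifier"].strip()
            for row in csv.DictReader(fp, delimiter="\t")
        }


def apply_source_ids(records, mapping, key="source_metadata_identifier"):
    """Copy each record, adding the catalog ID looked up by local_identifier.

    The default key is an assumption. Returns (updated_records, unmatched_ids);
    the input records are left unmodified.
    """
    updated, unmatched = [], []
    for record in records:
        local_id = record["local_identifier"]
        new_record = dict(record)
        if local_id in mapping:
            new_record[key] = mapping[local_id]
        else:
            unmatched.append(local_id)
        updated.append(new_record)
    return updated, unmatched


def write_ark_report(rows, report_path):
    """Write (source_identifier, ark) pairs as a two-column TSV with a header row."""
    with open(report_path, "w", newline="", encoding="utf-8") as fp:
        writer = csv.writer(fp, delimiter="\t")
        writer.writerow(["source_identifier", "ark"])
        writer.writerows(rows)
```

Returning the unmatched identifiers instead of raising means the matched reports can be re-ingested right away while the remaining call numbers are followed up separately.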
From @escowles