umd-coding-workshop / taming-the-beast

ArchivesSpace Ingest Scripts

Sample - work in progress #5

Open alifabeta opened 10 years ago

alifabeta commented 10 years ago

For everyone watching this project, I've gotten an import into ArchivesSpace to work... sort of.

http://sandbox.archivesspace.org:8081/repositories/2/resources/28 (note that the ArchivesSpace sandbox gets reset periodically, so this link might be short-lived)

Compare to: http://hdl.handle.net/1903.1/2994

It imported once the <extent> tags were added inside the <physdesc> tags. But there are still lots of kinks to be ironed out. Short list of problems:

alifabeta commented 10 years ago

Okay, you can look at this: http://sandbox.archivesspace.org:8081/repositories/2/resources/32. Notice that it's much cleaner than the Bencriscutto FA linked above. I cleaned up the code manually to make sure that we're aiming for the correct results. I also created a pseudocode document and the beginnings of the Python script. So now all we need to do is code.

ghost commented 10 years ago

@alifabeta, I see two potential areas of coding:

How much of the work above do you think is for the Python program and how much for the EAD converter? It looks like you performed manual EAD modifications for your ArchivesSpace sandbox work. Is that correct?

jennielevineknies commented 10 years ago

@wallberg-umd - I am intrigued by the idea of modifying the Java extract code. I hadn't really thought of that approach before. However, since I am relatively familiar with it, I could do a little testing, playing around with requirements that Amanda works up. Ultimately, that might be a faster way to go. I'd need help making sure that the Java code is set up correctly on my machine for testing. For example, @alifabeta - I can't remember exactly what the extent issue is, but I think it's along these lines: the current EAD is coded as:

<physdesc label="Size of the Collection" encodinganalog="300$a">140 items</physdesc>

and in ArchivesSpace it needs to be something like:

<physdesc><extent>140 items</extent></physdesc>

(I would ask whether we need to retain these labels and MARC encoding tags for ArchivesSpace. If not, we could easily remove them from the converter.)

Anyway, a change like that involves modifying this file:

https://github.com/umd-lib/ead-db-convert/blob/develop/src/org/mith/ead/data/DataConvertor.java

And changing

didString = didString + "<physdesc label=\"Size of the Collection\" encodinganalog=\"300$a\">" + rsArch.getString("physdesc") + "</physdesc>";

to

didString = didString + "<physdesc><extent>" + rsArch.getString("physdesc") + "</extent></physdesc>";


Voila! Any EAD you convert using the converter will be ArchivesSpace compliant.

So, those are the kinds of changes that are easy to make in the converter, and could probably save us a lot of time... I would need to sit with Ben in order to make the first official changes and overcome my fear of screwing everything up :)
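For EAD files that have already been exported, the same fix could in principle be applied after the fact in the Python script instead. Here is a minimal sketch, not the project's actual script: the function name `wrap_physdesc_in_extent` and the `keep_attrs` flag are hypothetical, and it assumes the EAD has no XML namespace (a namespaced, schema-based EAD would need qualified tag names). Whether to keep the label and encodinganalog attributes is still an open question, so the sketch makes that a switch.

```python
import xml.etree.ElementTree as ET


def wrap_physdesc_in_extent(ead_xml: str, keep_attrs: bool = False) -> str:
    """Wrap the bare text of each <physdesc> in an <extent> child.

    Hypothetical helper; assumes un-namespaced EAD element names.
    """
    root = ET.fromstring(ead_xml)
    # Materialize the list first so the tree is not modified mid-iteration.
    for physdesc in list(root.iter("physdesc")):
        # Leave elements alone if they already have an <extent>.
        if physdesc.find("extent") is not None:
            continue
        text = (physdesc.text or "").strip()
        if not text:
            continue
        extent = ET.Element("extent")
        extent.text = text
        physdesc.text = None
        physdesc.insert(0, extent)
        if not keep_attrs:
            # Drop label / encodinganalog to match the target form above.
            physdesc.attrib.clear()
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    sample = (
        '<did><physdesc label="Size of the Collection" '
        'encodinganalog="300$a">140 items</physdesc></did>'
    )
    print(wrap_physdesc_in_extent(sample))
    # prints: <did><physdesc><extent>140 items</extent></physdesc></did>
```

This only covers the extent issue; the other import problems would each need their own handling, whether here or in the converter.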

ghost commented 10 years ago

I created a new feature/ArchivesSpace branch for ead-db-convert; see https://github.com/umd-lib/ead-db-convert/tree/feature/ArchivesSpace