Open alifabeta opened 10 years ago
Okay, you can look at this: http://sandbox.archivesspace.org:8081/repositories/2/resources/32. Notice that it's much cleaner than the Bencriscutto FA linked above. I cleaned up the code manually to make sure that we're aiming for the correct results. I also created a pseudocode document and the beginnings of the Python script. So now all we need to do is code.
@alifabeta, I see two potential areas of coding: the Python program and the EAD converter.
How much of the work above do you think is for the Python program and how much for the EAD converter? It looks like you performed manual EAD modifications for your ArchivesSpace sandbox work. Is that correct?
@wallberg-umd - I am intrigued by the idea of modifying the Java extract code. I hadn't really thought of that approach before. However, since I am relatively familiar with it, I could do a little testing and play around with the requirements that Amanda works up. Ultimately, that might be a faster way to go. I'd need help making sure that the Java code is set up correctly on my machine for testing. For example, @alifabeta, I can't remember exactly what the extent issue is, but I think it's along the lines of the current EAD being coded as:
<physdesc label="Size of the Collection" encodinganalog="300$a">140 items</physdesc>
and in ArchivesSpace it needs to be something like:
<physdesc><extent>140 items</extent></physdesc>
(I would ask if we need to retain these labels and MARC encoding tags for ArchivesSpace; if not, we could easily remove them from the converter.)
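For EAD files that have already been exported, the same change could in principle be made by the Python script rather than the converter. The following is only a minimal sketch under a few assumptions: the function name `wrap_physdesc_in_extent` and the `keep_attributes` switch are made up for illustration, the EAD is assumed to have no XML namespace, and each `<physdesc>` is assumed to contain plain text only.

```python
import xml.etree.ElementTree as ET


def wrap_physdesc_in_extent(root, keep_attributes=False):
    """Wrap the plain text of each <physdesc> in an <extent> child.

    Hypothetical helper: assumes un-namespaced EAD and text-only
    <physdesc> elements. Returns the number of elements changed.
    """
    changed = 0
    for physdesc in root.iter("physdesc"):
        # Leave elements alone if they already have an <extent>.
        if physdesc.find("extent") is not None:
            continue
        text = (physdesc.text or "").strip()
        if not text:
            continue
        extent = ET.Element("extent")
        extent.text = text
        physdesc.text = None
        physdesc.insert(0, extent)
        if not keep_attributes:
            # Drop label / encodinganalog, as in the target markup above.
            physdesc.attrib.clear()
        changed += 1
    return changed


sample = (
    '<did><physdesc label="Size of the Collection" '
    'encodinganalog="300$a">140 items</physdesc></did>'
)
root = ET.fromstring(sample)
wrap_physdesc_in_extent(root)
print(ET.tostring(root, encoding="unicode"))
# prints: <did><physdesc><extent>140 items</extent></physdesc></did>
```

Passing `keep_attributes=True` would retain the label and encodinganalog attributes, if it turns out ArchivesSpace needs them.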
Anyway, a change like that involves modifying this file:
https://github.com/umd-lib/ead-db-convert/blob/develop/src/org/mith/ead/data/DataConvertor.java
And changing
didString = didString +"<physdesc label=\"Size of the Collection\" encodinganalog=\"300$a\">"+ rsArch.getString("physdesc")+"</physdesc>";
to
didString = didString +"<physdesc><extent>"+ rsArch.getString("physdesc")+"</extent></physdesc>";
Voila! Any EAD you convert using the converter will be ArchivesSpace compliant.
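Before attempting an import, it may be worth checking that the converter output really has the new shape. Here is a rough sketch of such a check; the function name `physdesc_missing_extent` is hypothetical, and it assumes the converted EAD is un-namespaced and already loaded as a string.

```python
import xml.etree.ElementTree as ET


def physdesc_missing_extent(xml_text):
    """Return the text of every <physdesc> that has no <extent> child.

    Hypothetical check: an empty list means every <physdesc> in the
    document wraps its content in <extent>.
    """
    root = ET.fromstring(xml_text)
    return [
        (physdesc.text or "").strip()
        for physdesc in root.iter("physdesc")
        if physdesc.find("extent") is None
    ]


old_style = (
    '<did><physdesc label="Size of the Collection" '
    'encodinganalog="300$a">140 items</physdesc></did>'
)
new_style = "<did><physdesc><extent>140 items</extent></physdesc></did>"

print(physdesc_missing_extent(old_style))  # prints: ['140 items']
print(physdesc_missing_extent(new_style))  # prints: []
```

This only looks at the physdesc/extent issue; it says nothing about the other import problems.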
So, those are the kinds of changes that are easy to make in the converter, and could probably save us a lot of time... I would need to sit with Ben in order to make the first official changes and overcome my fear of screwing everything up :)
I created a new feature/ArchivesSpace branch for ead-db-convert; see https://github.com/umd-lib/ead-db-convert/tree/feature/ArchivesSpace
For everyone watching this project, I've gotten an import into ArchivesSpace to work... sort of.
http://sandbox.archivesspace.org:8081/repositories/2/resources/28 (note that the ArchivesSpace sandbox gets reset periodically, so this link might be short-lived)
Compare to: http://hdl.handle.net/1903.1/2994
It imported once the <extent> tags were added inside the <physdesc> tags. But there are still lots of kinks to be ironed out. Short list of problems: