Status: Open. Opened by dorevabelfiore-temple 8 years ago.
Doreva & Leanne will assess the XML first and then schedule a meeting with Chad once we have some test data.
Doreva is working with several institutions: UPITT (Historic Pittsburgh), the Presbyterian Historical Society, and possibly Lafayette College. APS: not working. Drexel DUCOM is not ready yet, and Drexel's main campus may not have in-scope content.
We have OAI data from UPITT and PHS, and @lfinnigan and I are reviewing it.
Our team reviewed the data and created a draft mapping; see the second tab, "Field Mapping", here: https://docs.google.com/spreadsheets/d/1pDDoayXkO71UrvmG9czlR01qDau678IyYicmWqAZJ5c/edit#gid=205346128
Let us know if there are any questions! Thanks!
Chad adjusted the thumbnail path to the master path for Islandora instances.
Started work on this issue in the Islandora harvests branch. Harvests work as expected, but thumbnails are not being created.
To reproduce, create a new seed with these details and run a harvest:
Name: Historic Pittsburgh
Endpoint url: http://historicpittsburgh.org/oai2
Metadata prefix: oai_dc
Set: pitt_collection.33
Collection Name: Aerial Photographs of Pittsburgh
Contributing Institution: Pitt
Intermediate Provider:
Common Repository Type (if applicable): Islandora
Thumbnail Pattern (if applicable):
Thumbnail Token 1 (if applicable):
Thumbnail Token 2 (if applicable):
Provider ID Prefix: PITT
@lfinnigan @dorevabelfiore-temple @skng5
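For readers less familiar with OAI-PMH: the endpoint URL, metadata prefix, and set in a seed like the one above correspond to the standard arguments of an OAI-PMH ListRecords request. The following is a minimal Python sketch, assuming the harvester issues plain ListRecords requests; `list_records_url` is a hypothetical helper for illustration, not the aggregator's actual code.

```python
from urllib.parse import urlencode


def list_records_url(endpoint_url, metadata_prefix, oai_set=None, resumption_token=None):
    """Hypothetical helper: build an OAI-PMH ListRecords request URL from seed fields."""
    if resumption_token:
        # In OAI-PMH, resumptionToken is an exclusive argument: only verb may accompany it.
        params = {"verb": "ListRecords", "resumptionToken": resumption_token}
    else:
        params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
        if oai_set:
            params["set"] = oai_set
    return endpoint_url + "?" + urlencode(params)


print(list_records_url("http://historicpittsburgh.org/oai2", "oai_dc", "pitt_collection.33"))
```

Opening that URL in a browser is a quick way to inspect the raw oai_dc records a harvest of this seed would see.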
Some records in the original OAI feed have identifiers like the following (note the space after the colon in the second):
<dc:identifier>pitt:886.18159.AP</dc:identifier>
<dc:identifier>pitt: 886.18159.AP</dc:identifier>
I've encountered this issue in pitt_collection.15 as well, so I'm assuming it is a recurrent issue throughout the collection.
Do you want me to try to compensate for this, or have them fix their data?
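Rather than silently compensating, malformed identifiers could be flagged and reported back to the provider. A minimal Python sketch, assuming a well-formed identifier is a namespace, a colon, and a local ID made only of printable ASCII with no spaces; that rule and the `is_well_formed` / `describe_odd_chars` names are hypothetical, not the aggregator's actual validation.

```python
import re
import unicodedata

# Assumption: a well-formed identifier is "<namespace>:<local id>", where the
# namespace is ASCII letters/digits/_/- and the local id is printable ASCII
# with no spaces ([!-~] is the range U+0021..U+007E).
IDENTIFIER_RE = re.compile(r"[A-Za-z0-9_-]+:[!-~]+")


def is_well_formed(identifier):
    """Hypothetical check: True only if the whole identifier matches the assumed shape."""
    return IDENTIFIER_RE.fullmatch(identifier) is not None


def describe_odd_chars(identifier):
    """List whitespace or non-ASCII characters along with their Unicode names."""
    return [
        (ch, unicodedata.name(ch, "UNKNOWN"))
        for ch in identifier
        if ch.isspace() or ord(ch) > 127
    ]


for ident in ["pitt:886.18159.AP", "pitt: 886.18159.AP", "pitt:\u00a0886.18159.AP"]:
    print(repr(ident), is_well_formed(ident), describe_odd_chars(ident))
```

Reporting the Unicode name matters because a character that looks like an ordinary space (for example, a no-break space) will not be obvious from the raw XML.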
This is a test case for why #130 is important. I would tell them to fix their data, since this is clearly wrong. Chad, we tested 20 collections in our first pass; they are the ones marked with names on the first tab of the sheet. Can you focus on those collections first in your testing? If you need more, I can assess a few more on Monday.
Confirming that this will be our primary harvesting work for spring (LSTA Q2 grant: January-March 2017).
Sent email to Historic Pittsburgh to investigate.
As of 1/13, UPITT is working on this. More to come.
Chad says some kind of Unicode characters are causing the issue.
Code pushed to DEV & can be tested.
Tested 5 collections in DEV: 3 worked fine, and 2 stopped due to UTF-8 encoding problems. Here are the Resque errors:
id: 322
name: Frick Collection Frick Business Records
description:
endpoint_url: http://historicpittsburgh.org/oai2
metadata_prefix: oai_dc
set: pitt_collection.156
contributing_institution: Frick Collection
collection_name: Henry Clay Frick Business Records
created_at: '2017-03-09T15:05:03.237Z'
updated_at: '2017-03-09T15:05:03.237Z'
set_spec:
in_production: 'No'
new_contributing_institution: Frick Collection
email: ''
provider_id_prefix: FRICK
new_provider_id_prefix: FRICK
new_endpoint_url: ''
common_repository_type: Islandora
thumbnail_pattern: ''
thumbnail_token_1: ''
thumbnail_token_2: ''
thumbnail_explanation:
common_transformation:
intermediate_provider: Historic Pittsburgh
new_intermediate_provider: ''
new_email: ''
rights_statement: ''
identifier_pattern: ''
identifier_token: ''
types_mapping:
type_image: ''
type_text: ''
type_moving_image: ''
type_sound: ''
type_physical_object: ''
contributing_institution_dc_field: ''
last_harvested: ''
Exception: Encoding::CompatibilityError
Error: incompatible character encodings: ASCII-8BIT and UTF-8
We pushed a change that should trap and quarantine these incompatible character encoding files. So please try ingesting these collections again.
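In Ruby, this CompatibilityError is typically raised when a binary (ASCII-8BIT) string containing non-ASCII bytes is combined with a UTF-8 string that also contains non-ASCII characters. The trap-and-quarantine idea can be sketched in Python, assuming each harvested record arrives as raw bytes keyed by its identifier. `partition_records` and the dict input shape are hypothetical; this illustrates the approach and is not the change that was pushed.

```python
def partition_records(raw_records):
    """Hypothetical helper: split raw records into decoded and quarantined sets.

    raw_records: dict mapping identifier -> bytes (assumed input shape).
    Returns (accepted, quarantined): accepted maps identifier -> decoded str,
    quarantined maps identifier -> a short description of the decoding error.
    """
    accepted, quarantined = {}, {}
    for identifier, payload in raw_records.items():
        try:
            accepted[identifier] = payload.decode("utf-8")
        except UnicodeDecodeError as err:
            # Keep the bad record out of the ingest, but record why it was set aside.
            quarantined[identifier] = f"invalid UTF-8 at byte {err.start}"
    return accepted, quarantined


records = {"good": "caf\u00e9".encode("utf-8"), "bad": b"caf\xe9"}
accepted, quarantined = partition_records(records)
print(sorted(accepted), quarantined)
```

The design choice is that one bad record no longer stops the whole collection; quarantined records can be reviewed and reported to the provider separately.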
Tested 2 and they looked good. Running another 2 tests today.
(Reply to Chad Nelson's comment of Mon, Mar 20, 2017, 2:42 PM.)
-- Doreva Belfiore
Digital Projects Librarian Co-Project Manager Digital Library Initiatives PA Digital Temple University Libraries www.padigital.org 215-204-4942 (P) info@padigital.org 215-204-3681 (F)
I am not seeing any more encoding errors in the Resque. Thanks!
What we are seeing now are unrelated problems, which are being reported as separate issues:
Thumbnails: the thumbnail had been working, but it no longer appears when I reingest.
Unfortunately, we are also finding some identifiers that are duplicated across collections. :-(
Gabe & Rachel have more info.
Thanks!
--Doreva
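Duplicated identifiers across collections can be listed by indexing every identifier against the collections it appears in. A minimal Python sketch, assuming each collection's identifiers are available as a list keyed by set name; `find_cross_collection_duplicates` and the input shape are hypothetical.

```python
from collections import defaultdict


def find_cross_collection_duplicates(collections):
    """Hypothetical helper: map each identifier found in 2+ collections to those collections.

    collections: dict mapping collection/set name -> iterable of identifiers.
    """
    seen = defaultdict(set)
    for name, identifiers in collections.items():
        for identifier in identifiers:
            seen[identifier].add(name)
    # Keep only identifiers shared by more than one collection; sort names for stable output.
    return {ident: sorted(names) for ident, names in seen.items() if len(names) > 1}


sample = {
    "pitt_collection.15": ["pitt:1", "pitt:2"],
    "pitt_collection.33": ["pitt:2", "pitt:3"],
}
print(find_cross_collection_duplicates(sample))
```

Using a set per identifier means a record repeated within a single collection is not reported as a cross-collection duplicate.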
We will need to configure the aggregator to work with Islandora instances. I realize that not all Islandora instances are set up alike, so this may take longer. We are currently considering work with 2 local partners (TBD). Please see me if you need specific details.