pulibrary / pudl-migration

A repository for managing the migration of remaining PUDL collections

Redirect PUDL URLs #76

Closed escowles closed 2 years ago

escowles commented 3 years ago

Patterns to redirect:

Hosts to redirect:

cwulfman commented 3 years ago

There are 22,208 ARKs pointing to http://pudl.princeton.edu/*. Each of those web pages must be scraped for metadata that can be used to determine which Figgy object it corresponds to.

escowles commented 3 years ago

@cwulfman Scraping the PUDL pages seems like the most exhaustive way to tackle this — but wouldn't looking in the Figgy database or PUDL METS files be easier (and probably faster)?

cwulfman commented 3 years ago

I don't think either Figgy or the PUDL METS have the necessary metadata -- I may be wrong. The object URIs look like http://pudl.princeton.edu/objects/00000017b. The final segment (00000017b) is sometimes, but not always, a METS OBJID, so that's probably the right place to start -- you're right. Triage is the name of this game.

cwulfman commented 3 years ago

New Strategy

Based on a meeting between @escowles and @cwulfman, we will pursue an agile strategy: in the first pass, simply redirect all the items in a PUDL collection to the corresponding DPUL collection or finding aid (item -> collection); a second pass can then redirect item -> item. This lets us sunset the PUDL platform more quickly (the top priority).

Begin with GNIB and Latin American Posters, which have the largest number of item-level ARKs in PUDL.

Discussion

There are probably some PUDL collections and/or items that don't map directly to DPUL collections or finding aids, though likely only a few.
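The two-pass strategy above can be sketched as a lookup that prefers an item-level destination when one is known and otherwise falls back to the collection-level destination. This is only an illustration: the function name `resolve_destination`, the two mapping tables, the item ID `item-a`, and the `example.org` destinations are all hypothetical placeholders, not data from the migration.

```python
from typing import Dict, Optional


def resolve_destination(
    item_id: str,
    collection_id: str,
    item_map: Dict[str, str],
    collection_map: Dict[str, str],
) -> Optional[str]:
    """Return a redirect destination for a PUDL item.

    Second pass (item -> item): use the item-level destination if known.
    First pass (item -> collection): otherwise fall back to the destination
    for the item's collection. Returns None if neither is known.
    """
    if item_id in item_map:
        return item_map[item_id]
    return collection_map.get(collection_id)


# Hypothetical placeholder data; real destinations would come from the
# migration spreadsheet, not from this sketch.
collection_map = {"pudl0125": "https://example.org/collections/placeholder"}
item_map = {"item-a": "https://example.org/items/placeholder-a"}
```

With only `collection_map` populated, every item in a collection resolves to the same collection page; filling in `item_map` later upgrades individual items without changing the lookup.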

kayiwa commented 3 years ago

heya @escowles and @cwulfman, what is the FQDN to redirect?

 /collections/pudl0125 (about 100 items with custom destinations)
 /objects/ng451j89v -> arks.princeton.edu/ark:/88435/[ARK id]
 /sheetreader.php?obj=6h440w05w -> arks.princeton.edu/ark:/88435/[ARK id]

it is not clear to me.

escowles commented 3 years ago

@kayiwa Both pudl.princeton.edu and pudltest.princeton.edu, so:

pudl.princeton.edu/objects/[ARK id] -> arks.princeton.edu/ark:/88435/[ARK id]
pudl.princeton.edu/sheetreader.php?obj=[ARK id] -> arks.princeton.edu/ark:/88435/[ARK id]
pudltest.princeton.edu/objects/[ARK id] -> arks.princeton.edu/ark:/88435/[ARK id]
pudltest.princeton.edu/sheetreader.php?obj=[ARK id] -> arks.princeton.edu/ark:/88435/[ARK id]
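For anyone checking the rules before they go into the web server config, here is a minimal Python sketch of the mapping described above. It is not the configuration that was deployed; the function name `redirect_target` is made up for illustration, and the `https://` scheme on the target is an assumption, since the patterns above give the target without a scheme.

```python
from typing import Optional
from urllib.parse import parse_qs, urlsplit

# Hosts named in this thread.
REDIRECT_HOSTS = {"pudl.princeton.edu", "pudltest.princeton.edu"}

# Assumption: https scheme; the thread writes the target without one.
ARK_BASE = "https://arks.princeton.edu/ark:/88435/"


def redirect_target(url: str) -> Optional[str]:
    """Map a PUDL object or sheetreader URL to its ARK resolver URL.

    Returns None for URLs that match neither pattern.
    """
    # Accept scheme-less input such as "pudl.princeton.edu/objects/abc".
    parts = urlsplit(url if "://" in url else "http://" + url)
    if parts.hostname not in REDIRECT_HOSTS:
        return None

    if parts.path.startswith("/objects/"):
        ark_id = parts.path[len("/objects/"):].strip("/")
    elif parts.path == "/sheetreader.php":
        ark_id = parse_qs(parts.query).get("obj", [""])[0]
    else:
        return None

    # Reject empty IDs and anything with extra path segments.
    if not ark_id or "/" in ark_id:
        return None
    return ARK_BASE + ark_id
```

Collection URLs and the literal URLs with custom destinations are deliberately left out; those would need a separate lookup table.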

escowles commented 2 years ago

@kayiwa We have updated the spreadsheet with a few patterns and a list of about 60 literal URLs to be redirected: https://docs.google.com/spreadsheets/d/124DRio1JZU45wec3dLDt4z1a1rB-gYcPiXAMOP3IVFA/edit#gid=0

escowles commented 2 years ago

This was done, closing this issue.