hbz / digitalisiertedrucke

Implements http://digitalisiertedrucke.de/

Repair URLs for bigger collections #51

Closed acka47 closed 7 years ago

acka47 commented 7 years ago

URLs from some collections are broken. We can probably repair many of them systematically. Examples:

acka47 commented 7 years ago

For the big Compact Memory collection, the chances don't look good.

Example: http://beta.digitalisiertedrucke.de/resources/D34000

Broken fulltext link: http://www.compactmemory.de/index_p.aspx?tzpid=12&ID_0=12&ID_1=215&ID_2=9446&ID_3=29342

Current working fulltext link: http://sammlungen.ub.uni-frankfurt.de/2861615
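To find the affected records, one could look for links that still point at compactmemory.de. The old links carry the location in their query string (`tzpid`, `ID_0` to `ID_3`), so a minimal sketch can pull those parameters out. It only flags the affected records; it cannot compute the new Frankfurt URL, since the new IDs are unrelated to the old ones. The function name is hypothetical.

```python
from typing import Dict, Optional
from urllib.parse import parse_qs, urlsplit


def compact_memory_ids(url: str) -> Optional[Dict[str, str]]:
    """Return the query parameters of an old compactmemory.de link,
    or None if the URL does not point at compactmemory.de."""
    parts = urlsplit(url)
    host = parts.hostname or ""
    if host != "compactmemory.de" and not host.endswith(".compactmemory.de"):
        return None
    # parse_qs returns a list per key; the old links use each parameter once
    return {key: values[0] for key, values in parse_qs(parts.query).items()}
```

For the broken link above this returns `{'tzpid': '12', 'ID_0': '12', 'ID_1': '215', 'ID_2': '9446', 'ID_3': '29342'}`. For the new sammlungen.ub.uni-frankfurt.de link it returns `None`.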

ChristophEwertowski commented 7 years ago

I replaced the URLs where possible for the collections with more than 50 hits (shown after clicking "Enthaltene Titel anzeigen", i.e. "show contained titles"). This wasn't possible for documentation, project, search and home pages etc. that no longer have an equivalent, or for URLs that have no description and aren't self-explanatory. Where it wasn't possible to replace a URL, I left it as it was. Because the edited file is compressed with bzip2 again, a git diff isn't possible, but I have a table on my local computer where I noted the changes in short form.
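One way to still review the edit is to diff the decompressed contents of the two bzip2 files. A minimal sketch using only the standard library (`bz2`, `difflib`), assuming the dump is UTF-8 text; the function name and file names are hypothetical.

```python
import bz2
import difflib
from typing import List


def bz2_diff(old_path: str, new_path: str) -> List[str]:
    """Unified diff of two bzip2-compressed text files,
    computed on their decompressed lines."""
    with bz2.open(old_path, "rt", encoding="utf-8") as f:
        old_lines = f.readlines()
    with bz2.open(new_path, "rt", encoding="utf-8") as f:
        new_lines = f.readlines()
    # an empty list means the decompressed contents are identical
    return list(difflib.unified_diff(old_lines, new_lines,
                                     fromfile=old_path, tofile=new_path))
```

Usage: `print("".join(bz2_diff("dump-old.bz2", "dump-new.bz2")))`. Alternatively, git can show such diffs itself through a textconv filter, e.g. `*.bz2 diff=bzip2` in `.gitattributes` together with `git config diff.bzip2.textconv bzcat`.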

Two bigger collections have changed their system for referencing titles: the above-mentioned compactmemory.de, now part of the Digital Library of the Goethe University Frankfurt, and literatur-des-judentums.de, which has also moved to the Digital Library of the Goethe University Frankfurt (under a different link). Compact Memory formerly used combined IDs (for example http://www.compactmemory.de/index_p.aspx?tzpid=68&ID_0=68&ID_1=982&ID_2=27356&ID_3=78917) and now uses a single, completely new ID (for example http://sammlungen.ub.uni-frankfurt.de/cm/periodical/titleinfo/377570). It would be easier to drop the old datasets and get new ones than to work out how to transfer the links to the bibliographic descriptions. We have OPAC entries for literatur-des-judentums.de, but they don't refer to the HEBIS-Verbundkatalog (for example, http://www.literatur-des-judentums.de/opac/?ppn=013823973). We could, however, take the "HEBIS number" and replace the old URLs with URLs to the Verbundkatalog.
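The HEBIS-number idea could be sketched as a URL rewrite. This assumes that the `ppn` parameter in the literatur-des-judentums.de OPAC links is that HEBIS number. The Verbundkatalog URL template below is a **hypothetical placeholder**: the real permalink pattern would have to be looked up.

```python
from typing import Optional
from urllib.parse import parse_qs, urlsplit

# HYPOTHETICAL placeholder, not the real HEBIS-Verbundkatalog permalink pattern
VERBUNDKATALOG_TEMPLATE = "https://verbundkatalog.example/ppn/{ppn}"


def opac_to_verbundkatalog(url: str,
                           template: str = VERBUNDKATALOG_TEMPLATE) -> Optional[str]:
    """Rewrite a literatur-des-judentums.de OPAC link into a catalogue link
    built from its ppn parameter; return None if it doesn't apply."""
    parts = urlsplit(url)
    if parts.hostname not in ("www.literatur-des-judentums.de",
                              "literatur-des-judentums.de"):
        return None
    ppn = parse_qs(parts.query).get("ppn")
    if not ppn:
        return None
    # keep the number as a string so leading zeros survive
    return template.format(ppn=ppn[0])
```

For the example OPAC link above this yields `https://verbundkatalog.example/ppn/013823973`. Links from other hosts, or links without a `ppn`, are left to be handled manually.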

fsteeg commented 7 years ago

@ChristophEwertowski Perhaps it makes sense to attach the table as a CSV file here.

ChristophEwertowski commented 7 years ago

Since it doesn't belong in the repository, I'm attaching it to this comment as a .txt file. CSV attachments aren't permitted, but anyone who wants to can simply change the file extension and edit it as a CSV file. Because I wasn't planning to publish it, the notes on the changed URLs are in German. 51-ersetzte_Links.txt
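Once renamed to .csv, such a table could be turned into a lookup and applied to a list of URLs. A minimal sketch with the standard `csv` module. The column order (old URL, new URL, note) and the `;` delimiter are **assumptions** about the attached file's layout, and both function names are hypothetical.

```python
import csv
from typing import Dict, Iterable, List, TextIO


def load_replacements(f: TextIO, delimiter: str = ";") -> Dict[str, str]:
    """Read rows of (old URL, new URL, optional note) into a lookup table.
    Column order and delimiter are assumed, not taken from the file."""
    table: Dict[str, str] = {}
    for row in csv.reader(f, delimiter=delimiter):
        if len(row) >= 2 and row[0].strip() and row[1].strip():
            table[row[0].strip()] = row[1].strip()
    return table


def apply_replacements(urls: Iterable[str], table: Dict[str, str]) -> List[str]:
    # URLs without an entry are kept unchanged, as in the manual edit
    return [table.get(url, url) for url in urls]
```

Usage: `with open("51-ersetzte_Links.csv", encoding="utf-8") as f: table = load_replacements(f)`, then `apply_replacements(urls, table)`.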

acka47 commented 7 years ago

I think the improvements @ChristophEwertowski made are a good start. Please go ahead and commit your changes.

Regarding the two collections whose IDs have changed, we might replace the broken link with a search for the document title, e.g. http://sammlungen.ub.uni-frankfurt.de/cm/search/quick?query=%C3%9Cber+das+Wort+avatiga for http://beta.digitalisiertedrucke.de/resources/D34000. It leads directly to the document, at least in this case.
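Such a search link could be built from the title stored in each record. The example link encodes the title form-style: spaces become `+` and "Ü" becomes its UTF-8 percent-encoding `%C3%9C`. The standard library's `urlencode` (which uses `quote_plus`) produces exactly that encoding. The base URL is the Compact Memory endpoint from the example. The endpoint for literatur-des-judentums.de is assumed to differ and is left as a parameter. The function name is hypothetical.

```python
from urllib.parse import urlencode

# Quick-search endpoint taken from the Compact Memory example link
SEARCH_BASE = "http://sammlungen.ub.uni-frankfurt.de/cm/search/quick"


def title_search_url(title: str, base: str = SEARCH_BASE) -> str:
    """Build a quick-search link for a title, as a fallback for a broken fulltext link."""
    # urlencode -> quote_plus: spaces become '+', non-ASCII is UTF-8 percent-encoded
    return base + "?" + urlencode({"query": title})
```

`title_search_url("Über das Wort avatiga")` returns the example link above: `http://sammlungen.ub.uni-frankfurt.de/cm/search/quick?query=%C3%9Cber+das+Wort+avatiga`.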

ChristophEwertowski commented 7 years ago

To add to our offline discussion: the PPNs (PICA production numbers) aren't used in the frontend or in the fulltext links.

ChristophEwertowski commented 7 years ago

For the Compact Memory collection this is easier than for the other collection, because Compact Memory contains only journals, which mostly have distinct titles. Nevertheless, a title search is better than nothing.
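Whether a title search is likely to land on a single document can be estimated by checking how often each title occurs within a collection. A minimal sketch, assuming records are available as (record ID, title) pairs. The light normalisation (trimming, case-folding) is an assumption, and the function name is hypothetical.

```python
from collections import Counter
from typing import Iterable, List, Tuple


def split_by_title_uniqueness(
        records: Iterable[Tuple[str, str]]) -> Tuple[List[str], List[str]]:
    """Split record IDs into those whose title is unique in the collection
    (a title search likely finds the right document) and those whose title
    is shared (a search returns several hits)."""
    records = list(records)
    # normalise lightly so 'Title' and 'title ' count as the same title
    counts = Counter(title.strip().casefold() for _, title in records)
    unique: List[str] = []
    ambiguous: List[str] = []
    for record_id, title in records:
        target = unique if counts[title.strip().casefold()] == 1 else ambiguous
        target.append(record_id)
    return unique, ambiguous
```

Records in the ambiguous group are the ones where a search link is least helpful and a manual check would pay off most.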

ChristophEwertowski commented 7 years ago

A functional review at http://test.digitalisiertedrucke.de/ is necessary.

acka47 commented 7 years ago

The situation is much better than before. +1