hbz / digitalisiertedrucke

Implements http://digitalisiertedrucke.de/

51 repair urls bigger collections #53

Closed ChristophEwertowski closed 7 years ago

ChristophEwertowski commented 7 years ago

Resolves #51

The changes are listed as a CSV file in the discussion of the issue.

acka47 commented 7 years ago

I just talked to @ChristophEwertowski. We obviously had a misunderstanding here. The purpose of this issue is to systematically repair links to the full text of individual resources.

For example, all links in http://beta.digitalisiertedrucke.de/collections/digizeit.digizeitev.goe.de and http://beta.digitalisiertedrucke.de/collections/mathematica.sub.goe.de are broken. Examples:

As one can see, we can easily repair all the SUB resolver URLs by replacing the uppercase "PURL" with the lowercase "purl". We should take a look at how we can fix this for the other larger collections and then adjust the source data accordingly.
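A minimal sketch of that replacement, assuming the broken links carry the path segment "PURL" directly after the resolver host. The function name `repair_sub_url` and the sample identifier in the usage line are made up for illustration, and the real source data may need a different pattern:

```python
import re

# Assumption: "PURL" appears as the first path segment after the SUB resolver host.
SUB_RESOLVER = re.compile(r"(https?://resolver\.sub\.uni-goettingen\.de/)PURL\b")


def repair_sub_url(url: str) -> str:
    """Lowercase the 'PURL' segment of a SUB resolver URL; leave other URLs untouched."""
    return SUB_RESOLVER.sub(r"\g<1>purl", url)


# Hypothetical identifier, only to show the call.
print(repair_sub_url("http://resolver.sub.uni-goettingen.de/PURL?PPN000000000"))
```

Anchoring the pattern to the resolver host keeps the replacement from touching an unrelated "PURL" elsewhere in the data.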

ChristophEwertowski commented 7 years ago

Full-text links are now corrected where possible.

Most collections contain resources that have neither a details view nor a full-text link and lead to error pages. These, of course, could not be repaired (for example http://beta.digitalisiertedrucke.de/?q=isPartOf%3Afaustsammlung.haab.we.de). I expanded the table and uploaded it for later reference (51-ersetzte_Links2.txt). I removed the collections that only state in their description that they consist of more than 50 resources but have no links to full texts. All in all, the full-text links now mostly work for 17 of 26 collections. The other collections now use new IDs that cannot be derived from the former IDs.

acka47 commented 7 years ago

@fsteeg Could you deploy this please so that I can test the changes?

fsteeg commented 7 years ago

When attempting to transform the new data, I get the following exception:

org.xml.sax.SAXParseException; lineNumber: 3164979; columnNumber: 93; The reference to entity "projekt" must end with the ';' delimiter.

@ChristophEwertowski: Did you set this up and test it locally (as documented in the https://github.com/hbz/digitalisiertedrucke README)?

ChristophEwertowski commented 7 years ago

If I transform it with my non-functioning Play application (see https://github.com/hbz/digitalisiertedrucke/issues/54), I get the same message as fsteeg, except that it is wrapped in a Metafacture exception:

org.culturegraph.mf.exceptions.MetafactureException: org.xml.sax.SAXParseException; lineNumber: 3164979; columnNumber: 93; The reference to entity "projekt" must end with the ';' delimiter.

The line is the following: <marc:subfield code="u">https://www.digitale-sammlungen.de/index.html?c=sammlung&projekt=0000000000&l=de</marc:subfield> (It is preceded by four spaces, which matches column 93.) Because it seems to be a Metafacture issue, I am assigning it to @dr0i.
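The failure can be reproduced outside the transformation. The sketch below uses Python's standard library parser rather than the Java parser that produced the message above, so the error text differs, but the cause is the same: a bare "&" starts an entity reference that is never closed. The `xmlns:marc` declaration is added here only so the fragment can be parsed on its own, and the helper names are illustrative.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

URL = "https://www.digitale-sammlungen.de/index.html?c=sammlung&projekt=0000000000&l=de"
# Namespace declaration added so the fragment is parseable in isolation.
NS = 'xmlns:marc="http://www.loc.gov/MARC21/slim"'


def subfield(text: str) -> str:
    """Wrap text in a standalone marc:subfield element."""
    return f'<marc:subfield {NS} code="u">{text}</marc:subfield>'


def parses(xml: str) -> bool:
    """Return True if the string is well-formed XML."""
    try:
        ET.fromstring(xml)
        return True
    except ET.ParseError:
        return False


raw_ok = parses(subfield(URL))              # bare "&" is not well-formed
escaped_ok = parses(subfield(escape(URL)))  # "&" written as "&amp;"

# 1-based column of the character right after "&projekt" in the original line
# (four leading spaces, no namespace declaration).
original_line = '    <marc:subfield code="u">' + URL + "</marc:subfield>"
column = original_line.index("&projekt") + len("&projekt") + 1

print(raw_ok, escaped_ok, column)
```

The computed column is the position where the parser expected the ";" that would close the entity reference.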

ChristophEwertowski commented 7 years ago

I didn't escape the "&". Reassigning it to myself (thanks fsteeg!).

ChristophEwertowski commented 7 years ago

The ampersands are corrected and the transformation of the data works. I ran it locally and searched for http://haab-digital.klassik-stiftung.de to find full-text links with an ampersand and for http://resolver.sub.uni-goettingen.de to find results without one. @fsteeg Can you deploy it to the staging system so that a functional review can be done?
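For reference, one way such a correction could be done on data that is already serialized is sketched below. This is an assumption about the approach, not a record of the actual change; `escape_bare_ampersands` is a hypothetical helper, and it only recognizes the five predefined XML entities and numeric character references.

```python
import re

# "&" that does not begin a predefined entity or a numeric character reference.
BARE_AMP = re.compile(r"&(?!(?:amp|lt|gt|quot|apos|#[0-9]+|#x[0-9A-Fa-f]+);)")


def escape_bare_ampersands(xml_text: str) -> str:
    """Replace unescaped ampersands with '&amp;' without double-escaping existing entities."""
    return BARE_AMP.sub("&amp;", xml_text)


line = '<marc:subfield code="u">https://www.digitale-sammlungen.de/index.html?c=sammlung&projekt=0000000000&l=de</marc:subfield>'
print(escape_bare_ampersands(line))
```

Because already escaped references are skipped, running the function a second time leaves the text unchanged.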

fsteeg commented 7 years ago

Deployed to http://test.digitalisiertedrucke.de