slub / mets-mods2tei

Convert bibliographic meta data in MODS format to TEI headers
Apache License 2.0

Realize an empty publication date if METS header is absent instead of failing with a Python error #46

Closed tboenig closed 2 years ago

tboenig commented 4 years ago

Hi @wrznr,

I use your program with data from SBB. Here is an example: mm2tei -o "https://oai.sbb.berlin/oai/?verb=GetRecord&metadataPrefix=mets&identifier=oai:digital.staatsbibliothek-berlin.de:PPN66438790X" >test.tei.xml

Another example, from SUB Göttingen: mm2tei -o "https://gdz.sub.uni-goettingen.de/mets/PPN228873541.mets.xml" >test.tei.xml

Here we find the same SSL problem.

Is the SSL problem a problem on the SBB side or a problem in your program?

wrznr commented 4 years ago

Hi @tboenig, could you pls. post some kind of error message to make it easier to get an idea of the error?

tboenig commented 4 years ago

Here is the SBB error:

Traceback (most recent call last):
  File "/usr/lib/python3.6/urllib/request.py", line 1318, in do_open
    encode_chunked=req.has_header('Transfer-encoding'))
  File "/usr/lib/python3.6/http/client.py", line 1254, in request
    self._send_request(method, url, body, headers, encode_chunked)
  File "/usr/lib/python3.6/http/client.py", line 1300, in _send_request
    self.endheaders(body, encode_chunked=encode_chunked)
  File "/usr/lib/python3.6/http/client.py", line 1249, in endheaders
    self._send_output(message_body, encode_chunked=encode_chunked)
  File "/usr/lib/python3.6/http/client.py", line 1036, in _send_output
    self.send(msg)
  File "/usr/lib/python3.6/http/client.py", line 974, in send
    self.connect()
  File "/usr/lib/python3.6/http/client.py", line 1415, in connect
    server_hostname=server_hostname)
  File "/usr/lib/python3.6/ssl.py", line 407, in wrap_socket
    _context=self, _session=session)
  File "/usr/lib/python3.6/ssl.py", line 817, in __init__
    self.do_handshake()
  File "/usr/lib/python3.6/ssl.py", line 1077, in do_handshake
    self._sslobj.do_handshake()
  File "/usr/lib/python3.6/ssl.py", line 689, in do_handshake
    self._sslobj.do_handshake()
ssl.SSLError: [SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed (_ssl.c:852)

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "mets-mods2tei/env/lib/python3.6/site-packages/mets_mods2tei/scripts/mets_mods2tei.py", line 27, in cli
    f = urlopen(mets)
  File "/usr/lib/python3.6/urllib/request.py", line 223, in urlopen
    return opener.open(url, data, timeout)
  File "/usr/lib/python3.6/urllib/request.py", line 526, in open
    response = self._open(req, data)
  File "/usr/lib/python3.6/urllib/request.py", line 544, in _open
    '_open', req)
  File "/usr/lib/python3.6/urllib/request.py", line 504, in _call_chain
    result = func(*args)
  File "/usr/lib/python3.6/urllib/request.py", line 1361, in https_open
    context=self._context, check_hostname=self._check_hostname)
  File "/usr/lib/python3.6/urllib/request.py", line 1320, in do_open
    raise URLError(err)
urllib.error.URLError: <urlopen error [SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed (_ssl.c:852)>

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "mets-mods2tei/env/bin/mm2tei", line 8, in <module>
    sys.exit(cli())
  File "mets-mods2tei/env/lib/python3.6/site-packages/click/core.py", line 829, in __call__
    return self.main(*args, **kwargs)
  File "mets-mods2tei/env/lib/python3.6/site-packages/click/core.py", line 782, in main
    rv = self.invoke(ctx)
  File "mets-mods2tei/env/lib/python3.6/site-packages/click/core.py", line 1066, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "mets-mods2tei/env/lib/python3.6/site-packages/click/core.py", line 610, in invoke
    return callback(*args, **kwargs)
  File "mets-mods2tei/env/lib/python3.6/site-packages/mets_mods2tei/scripts/mets_mods2tei.py", line 29, in cli
    f = open(mets, "rb")
FileNotFoundError: [Errno 2] No such file or directory: 'https://oai.sbb.berlin/oai/?verb=GetRecord&metadataPrefix=mets&identifier=oai:digital.staatsbibliothek-berlin.de:PPN66438790X'

tboenig commented 4 years ago

And here is the SUB Göttingen error. Sorry, it is not the same SSL error:

Traceback (most recent call last):
  File "mets-mods2tei/env/bin/mm2tei", line 8, in <module>
    sys.exit(cli())
  File "mets-mods2tei/env/lib/python3.6/site-packages/click/core.py", line 829, in __call__
    return self.main(*args, **kwargs)
  File "mets-mods2tei/env/lib/python3.6/site-packages/click/core.py", line 782, in main
    rv = self.invoke(ctx)
  File "mets-mods2tei/env/lib/python3.6/site-packages/click/core.py", line 1066, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "mets-mods2tei/env/lib/python3.6/site-packages/click/core.py", line 610, in invoke
    return callback(*args, **kwargs)
  File "mets-mods2tei/env/lib/python3.6/site-packages/mets_mods2tei/scripts/mets_mods2tei.py", line 35, in cli
    mets.fromfile(f)
  File "mets-mods2tei/env/lib/python3.6/site-packages/mets_mods2tei/api/mets.py", line 112, in fromfile
    self.__spur()
  File "mets-mods2tei/env/lib/python3.6/site-packages/mets_mods2tei/api/mets.py", line 233, in __spur
    self.encoding_date = header.get_CREATEDATE().isoformat()
AttributeError: 'NoneType' object has no attribute 'get_CREATEDATE'

wrznr commented 4 years ago

The former problem is most likely a problem at the host (SBB) or your own institution. Sorry.
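A note on reading the first traceback: the closing FileNotFoundError is a follow-up error. The script first tries urlopen (line 27) and, after that raises, retries the same string with open (line 29), so the URL ends up being treated as a local path. A minimal sketch of one way to keep the certificate error visible is to choose the loader by URL scheme. The helper names is_remote and open_mets are hypothetical and are not part of mets-mods2tei:

```python
from urllib.parse import urlparse
from urllib.request import urlopen


def is_remote(source):
    """Return True if source looks like an HTTP(S) URL rather than a file path."""
    return urlparse(source).scheme in ("http", "https")


def open_mets(source):
    """Open a METS document from a URL or a local path as a binary stream.

    Hypothetical helper for illustration only.
    """
    if is_remote(source):
        # Network and certificate errors (URLError) propagate unchanged,
        # so they are not replaced by a misleading FileNotFoundError.
        return urlopen(source)
    return open(source, "rb")
```

With this shape, a CERTIFICATE_VERIFY_FAILED from the remote host would be the last error shown, which matches the actual cause.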

The latter problem is caused by the missing metsHdr element in the METS file you want to process (cf. https://digital.slub-dresden.de/oai/?verb=GetRecord&metadataPrefix=mets&identifier=oai:de:slub-dresden:db:id-453779263 for a file that has one). The METS file from Göttingen contains no information about when it was created. But such information is mandatory for valid DTABf. If you have ideas on how to fix this, I will gladly implement them.
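To check whether a given METS file is affected, something like the following sketch could be used. It relies only on the standard library and the METS namespace; the function name find_create_date is hypothetical, and this is not how mets-mods2tei itself parses METS:

```python
import xml.etree.ElementTree as ET

# Official METS namespace.
METS_NS = {"mets": "http://www.loc.gov/METS/"}


def find_create_date(mets_xml):
    """Return the CREATEDATE attribute of metsHdr, or None if it is unavailable.

    None is returned both when metsHdr is missing entirely and when it
    is present without a CREATEDATE attribute.
    """
    root = ET.fromstring(mets_xml)
    header = root.find("mets:metsHdr", METS_NS)
    if header is None:
        return None
    return header.get("CREATEDATE")
```

A file like the one from Göttingen, which has no metsHdr, would yield None here.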

tboenig commented 4 years ago

Hi @wrznr,

If you have ideas on how to fix it, I will be happy to implement them. My suggestion:

  • Ignore the empty or missing metsHdr and emit an empty <date type="publication"/>, or print an error message on the CLI stating that the METS file is not valid. I think a combination would be ideal.
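The combination suggested above (warn, then fall back to an empty date) could look roughly like this. It is a sketch under assumptions: header stands for the object on which the traceback calls get_CREATEDATE(), while extract_encoding_date, publication_date_element and the logger name are hypothetical and not taken from the project:

```python
import logging
import xml.etree.ElementTree as ET

log = logging.getLogger("mm2tei-sketch")  # hypothetical logger name


def extract_encoding_date(header):
    """Return the ISO-formatted creation date, or None if it is unavailable."""
    if header is None:
        # Missing metsHdr: warn instead of raising AttributeError.
        log.warning("METS file has no metsHdr; publication date left empty")
        return None
    created = header.get_CREATEDATE()
    if created is None:
        log.warning("metsHdr has no CREATEDATE; publication date left empty")
        return None
    return created.isoformat()


def publication_date_element(encoding_date):
    """Build <date type="publication">, left empty when no date is known."""
    date = ET.Element("date", {"type": "publication"})
    if encoding_date:
        date.text = encoding_date
    return date
```

Whether an empty date element is acceptable for DTABf validation is a separate question that this sketch does not settle.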

bertsky commented 2 years ago

@tboenig I have difficulty implementing these fallbacks/error signals for missing headers, because I cannot find exact documentation of DTAbf and TEI proper.

For example, one of the dependent elements of metsHdr is the mets:agent, which is used for encodingDesc: https://github.com/slub/mets-mods2tei/blob/fc7b0f7cfb8a58e483bd355a7ae2eaaa7aebc6fe/mets_mods2tei/api/mets.py#L245

(I don't know why we throw away all but the first agent and all but its name, but granted.)

This information usually ends up in simple p elements: https://github.com/slub/mets-mods2tei/blob/fc7b0f7cfb8a58e483bd355a7ae2eaaa7aebc6fe/mets_mods2tei/api/tei.py#L471-L473

Now, according to DTAbf there is supposed to be an intermediate editorialDecl here. But the only reference I can find on that is in the (IIUC) Examples schema.

So what is the correct representation here, and what should I put in as a fallback in case the metsHdr is missing?