sustainability-zhaw / extraction-dspace

MIT License
Proper Department handling during import #9

Closed phish108 closed 1 year ago

phish108 commented 1 year ago

The departmental association is not handled during import. Fixes #8.

phish108 commented 1 year ago

Hey @sgluege, don't review before I'm done :P

sgluege commented 1 year ago

> Hey @sgluege, don't review before I'm done :P

How do I know when you're done?

phish108 commented 1 year ago

Easy, when it no longer says Draft at the top :)

phish108 commented 1 year ago

@sgluege OK, ready for review.

This PR has two parts:

  1. The new get_deptcollection_from_xml_record_entity() extracts the collections from the record header and determines which department a record belongs to. I use a crude lookup table. It would be more elegant to load the collection names via https://digitalcollection.zhaw.ch/oai/request?verb=ListSets and to load the department mapping from a separate configuration file that does not have to be public. For the time being, this should do the trick.

  2. I refactored the GraphQL code to use variables. This simplifies the code significantly and lets us drop the mapping functions, because the data can be passed directly to the gql client as a dictionary.
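A minimal sketch of the lookup in part 1, assuming the collections appear as OAI-PMH `setSpec` elements in the record header. The function name `departments_from_record`, the setSpec values, and the department codes are placeholders for illustration, not the ones used in this PR.

```python
import xml.etree.ElementTree as ET

# Namespace of OAI-PMH 2.0 responses.
OAI_NS = {"oai": "http://www.openarchives.org/OAI/2.0/"}

# Hypothetical lookup table: setSpec value -> department code.
# The real table in the PR uses the repository's own identifiers.
DEPARTMENT_BY_SETSPEC = {
    "com_example_1": "department_A",
    "com_example_2": "department_B",
}


def departments_from_record(record):
    """Return the sorted departments whose setSpec occurs in the record header."""
    specs = [
        el.text.strip()
        for el in record.findall("oai:header/oai:setSpec", OAI_NS)
        if el.text and el.text.strip()
    ]
    # Unknown setSpecs (e.g. plain collections) are skipped, not treated as errors.
    return sorted({DEPARTMENT_BY_SETSPEC[s] for s in specs if s in DEPARTMENT_BY_SETSPEC})


SAMPLE = """<record xmlns="http://www.openarchives.org/OAI/2.0/">
  <header>
    <identifier>oai:example.org:1234</identifier>
    <setSpec>com_example_1</setSpec>
    <setSpec>col_example_99</setSpec>
  </header>
</record>"""

departments = departments_from_record(ET.fromstring(SAMPLE))
print(departments)
```

Moving `DEPARTMENT_BY_SETSPEC` into a configuration file later would only change where the dictionary is loaded from; the lookup itself stays the same.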
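To illustrate part 2, here is a rough sketch of what using GraphQL variables means at the request level. It uses only the standard library and builds the JSON body by hand; the mutation, its type name `ItemInput`, and the field names are invented for the example and do not reflect the actual schema. A client library such as gql takes the same dictionary and does this serialization itself.

```python
import json

# Hypothetical mutation; type and field names are placeholders.
ADD_ITEMS = """
mutation addItems($items: [ItemInput!]!) {
  addItems(input: $items) { numUids }
}
"""


def build_request_body(query, variables):
    """Serialize a GraphQL request: the query text stays fixed, the data travels separately."""
    return json.dumps({"query": query, "variables": variables})


# The record is an ordinary dictionary. No string templating or per-field
# mapping function is needed, and quotes in the data cannot break the query.
record = {"title": 'A "quoted" title', "departments": ["department_A"]}
body = build_request_body(ADD_ITEMS, {"items": [record]})

decoded = json.loads(body)
print(decoded["variables"]["items"][0]["title"])
```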

I found a potential logic error in get_entity_from_xml_record_entity(): it only tests whether the first element has content and then assumes that all elements do. This is not necessarily the case with XML documents. Do you think we should refactor this code, too?
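One possible shape for such a refactoring, as a sketch only: check every element instead of just the first. The helper name `texts_with_content` and the sample document are made up for the example; the existing function may need to keep more than the text.

```python
import xml.etree.ElementTree as ET


def texts_with_content(elements):
    """Return the stripped text of every element that actually has content."""
    return [el.text.strip() for el in elements if el.text and el.text.strip()]


# The first element has content, but the second is empty and the third is
# whitespace only. Testing only the first element would miss both cases.
SAMPLE = (
    "<metadata>"
    "<subject>energy</subject>"
    "<subject/>"
    "<subject>   </subject>"
    "<subject>water</subject>"
    "</metadata>"
)

subjects = ET.fromstring(SAMPLE).findall("subject")
values = texts_with_content(subjects)
print(values)
```

With ElementTree, an empty element has `text` set to `None`, so calling a string method on it without the per-element check raises an `AttributeError`.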