I tried to change the code so that the PICA binary file is read instead of the sigil.xml.
This PR does three things:
- Uses a PICA dump instead of the old XML dump.
- Splits the processing of the PICA data into two transformations: one for the PICA binary dump and one for the OAI-PMH PICA XML updates.
- Updates the tests.
The new process probably still contains an error, since I do not know when to close the streams, if at all.
Could you have a look and adjust the workflows? I have no Java skills.
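To make the stream question concrete, here is a toy model of one common convention in push-style pipelines: the code that feeds the first stage closes it exactly once after the last record, and every stage forwards that close to the next one. This is only a sketch under that assumption. The `Stage` interface and all class names are hypothetical and are not taken from this repository or from any library.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

public class CloseOnceSketch {

    /** Hypothetical stage interface, for illustration only. */
    interface Stage {
        void process(String record);
        void closeStream();
    }

    /** Final stage: collects records and remembers whether it was closed. */
    static class CollectingSink implements Stage {
        final List<String> records = new ArrayList<>();
        boolean closed = false;

        @Override
        public void process(String record) {
            if (closed) {
                throw new IllegalStateException("sink already closed");
            }
            records.add(record);
        }

        @Override
        public void closeStream() {
            closed = true; // a real sink would flush and close its output file here
        }
    }

    /** Middle stage: transforms each record and forwards the close downstream. */
    static class UpperCaseStage implements Stage {
        private final Stage next;

        UpperCaseStage(Stage next) {
            this.next = next;
        }

        @Override
        public void process(String record) {
            next.process(record.toUpperCase(Locale.ROOT));
        }

        @Override
        public void closeStream() {
            next.closeStream();
        }
    }

    /** Feeds all records into the head of the chain, then closes it once. */
    static CollectingSink run(List<String> input) {
        CollectingSink sink = new CollectingSink();
        Stage head = new UpperCaseStage(sink);
        for (String record : input) {
            head.process(record);
        }
        head.closeStream(); // single close, after the last record
        return sink;
    }

    public static void main(String[] args) {
        CollectingSink sink = run(List.of("de-1", "de-2"));
        System.out.println(sink.records + " closed=" + sink.closed);
    }
}
```

If two inputs (such as the dump and the updates) write to the same sink, this convention implies that the shared sink is closed only after both have finished. Whether that matches the pipeline in this PR needs to be checked against the actual code.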
I replaced the test file with a PICA binary file containing 6 entries, using the same ISILs as the old metadata plus a few more. I updated the transformation tests ~but the other play test still need some fixing~. The Play tests are now updated as well. (Edit: 10.08.23)
Old workflow (XML dump and lots of OAI updates) needs about 3 1/2 min to transform and index:
2023-08-08 15:43:05 +0200 [INFO] from application in main - Starting transformation, will write to '/tmp/lobid-organisations/enriched.prod.out.json'
2023-08-08 15:46:39 +0200 [INFO] from play in main - Application started (Prod)
2023-08-08 15:46:39 +0200 [INFO] from play in main - Listening for HTTP on /0:0:0:0:0:0:0:0:9000
New process (PICA binary dump + small number of OAI updates) takes about 1 1/4 min to transform and index:
2023-08-10 13:22:18 +0200 [INFO] from application in main - Starting transformation, will write to '/tmp/lobid-organisations/enriched.prod.out.json'
2023-08-10 13:23:33 +0200 [INFO] from play in main - Application started (Prod)
2023-08-10 13:23:34 +0200 [INFO] from play in main - Listening for HTTP on /0:0:0:0:0:0:0:0:9000
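The durations can be checked directly from the log timestamps. The sketch below parses the "Starting transformation" and "Application started" times of both runs and prints the elapsed seconds. The class and method names are made up for this illustration and are not part of the PR.

```java
import java.time.Duration;
import java.time.OffsetDateTime;
import java.time.format.DateTimeFormatter;

public class StartupTiming {

    // Matches the timestamp prefix of the log lines, e.g. "2023-08-08 15:43:05 +0200"
    private static final DateTimeFormatter LOG_TIME =
            DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss Z");

    /** Returns the time between two log timestamps. */
    static Duration elapsed(String start, String end) {
        return Duration.between(
                OffsetDateTime.parse(start, LOG_TIME),
                OffsetDateTime.parse(end, LOG_TIME));
    }

    public static void main(String[] args) {
        Duration oldRun = elapsed("2023-08-08 15:43:05 +0200", "2023-08-08 15:46:39 +0200");
        Duration newRun = elapsed("2023-08-10 13:22:18 +0200", "2023-08-10 13:23:33 +0200");
        System.out.println("old workflow: " + oldRun.getSeconds() + " s"); // 214 s
        System.out.println("new process:  " + newRun.getSeconds() + " s"); // 75 s
    }
}
```

That is 3 min 34 s for the old workflow versus 1 min 15 s for the new process, measured from the start of the transformation to "Application started".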
Resolves #462