acka47 closed this issue 10 years ago.
The whole process over the test set should run in under 1 h.
I took 5k docs; the transformation results in 53k distinct subject URIs (most of them items).
Time taken:
2 min
hbz01-resources:
hbz01-items:
Clearly the bottleneck is the enrichment, which only takes place for hbz01-resources. It may be worth a closer look, see lobid/lodmill#331. As a workaround, I propose skipping enrichment in this test-data workflow. What do you think, @acka47 @fsteeg?
+1 for no enrichment in the test set for now. When we need it, we can add an enrichment test set.
+1 for starting without enrichment. We can't test the whole UI functionality like this, though.
Running a single script (https://github.com/lobid/lodmill/blob/master/lodmill-ld/doc/scripts/processTestHbz01.sh) is enough to start both transformation AND indexing. Processing 5k hbz01 resource docs takes 5 min.
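To relate this measurement to the < 1 h goal stated at the top, here is a back-of-the-envelope sketch. It assumes, without having measured it, that runtime scales linearly with the number of hbz01 resource docs, and it uses only the figures quoted in this thread (5k docs in 5 min, 60 min budget). The constant and function names are hypothetical and are not part of lodmill.

```python
"""Back-of-the-envelope check of the test-set timing against the < 1 h goal.

Assumption (not measured): processing time scales linearly with the number
of hbz01 resource docs, based on the figure quoted above (5k docs in 5 min).
"""

MEASURED_DOCS = 5_000      # hbz01 resource docs in the measured run
MEASURED_MINUTES = 5.0     # wall-clock time reported for that run
BUDGET_MINUTES = 60.0      # target: whole process in under 1 h


def docs_per_minute() -> float:
    """Throughput of the measured run."""
    return MEASURED_DOCS / MEASURED_MINUTES


def estimated_minutes(n_docs: int) -> float:
    """Linear extrapolation of runtime for n_docs resource docs."""
    return n_docs / docs_per_minute()


def max_docs_within_budget() -> int:
    """Largest doc count whose extrapolated runtime stays within the budget."""
    return int(BUDGET_MINUTES * docs_per_minute())


if __name__ == "__main__":
    print(f"throughput: {docs_per_minute():.0f} docs/min")
    print(f"docs within budget: {max_docs_within_budget()}")
```

Under this linear assumption the measured run corresponds to 1,000 docs/min, i.e. roughly 60k resource docs within the 1 h budget. Real runtimes may deviate, e.g. if indexing or cluster overhead does not scale linearly.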
Closing.
Added a GND test set. Building everything for the test index now takes only a few seconds longer.
Note: to avoid breaking a running transformation, and so that tests can be executed immediately, the tests run on a separate server that is not connected to the production Hadoop cluster. The test Hadoop cluster and everything else needed for testing reside on gaia.
~100k resources + items (including resources from the API documentation page and test files), plus examples from GitHub issues.