snaekobbi / issues

Common issue tracker for the Braille in DAISY Pipeline 2 project

Improve performance #28

Closed: josteinaj closed this issue 7 years ago

josteinaj commented 8 years ago

There didn't seem to be an issue for this already so I'm creating this one.

I ran some tests yesterday on randomly selected books of various sizes from NLB's book archive, using the nlb:dtbook-to-pef script in the Docker image I created (https://github.com/snaekobbi/system/commit/2413f9bf1d40f6469c6ddee80a89bf12e7ec03a3, which means engine version 1.9.10-20160309.135502-7 and mod-nlb version 1.6.0-SNAPSHOT). All books were converted successfully, but conversion times varied from 7 seconds to 2 hours and 44 minutes. Job duration seems to correlate with book size for durations up to roughly 10 minutes, but that doesn't explain why some of the jobs took several hours. I've attached a spreadsheet (CSV) with the results in case it is of interest: log.csv.txt.
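One way to check the size/duration correlation in the attached CSV is to compute a Pearson correlation over the shorter jobs only and list the long-running ones separately. This is a minimal sketch under assumptions: the column names `size_bytes` and `duration_seconds` are hypothetical placeholders (the real headers of log.csv.txt aren't given here), and the 600-second cut-off simply mirrors the "roughly 10 minutes" mentioned above.

```python
import csv
from math import sqrt

def load_jobs(lines, size_col="size_bytes", duration_col="duration_seconds"):
    """Read (size, duration) pairs from CSV lines.

    The default column names are hypothetical; adjust them to the
    headers actually used in log.csv.txt.
    """
    reader = csv.DictReader(lines)
    return [(float(row[size_col]), float(row[duration_col])) for row in reader]

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length (at least 2 items)")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sd_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    sd_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    if sd_x == 0 or sd_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / (sd_x * sd_y)

def size_duration_correlation(jobs, max_seconds=600):
    """Correlate book size with job duration, using only jobs that
    finished within max_seconds (600 s = 10 minutes by default)."""
    short_jobs = [(size, secs) for size, secs in jobs if secs <= max_seconds]
    sizes = [size for size, _ in short_jobs]
    durations = [secs for _, secs in short_jobs]
    return pearson(sizes, durations)

def outliers(jobs, max_seconds=600):
    """Jobs that took longer than max_seconds, slowest first."""
    slow = [job for job in jobs if job[1] > max_seconds]
    return sorted(slow, key=lambda job: job[1], reverse=True)
```

Usage would be something like `size_duration_correlation(load_jobs(open("log.csv.txt", newline="")))`. A value close to 1 for the short jobs, combined with a handful of large outliers, would support the impression that size alone doesn't explain the multi-hour jobs.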

If there's a way to get more debugging output to determine which parts of the script take up so much time for some books, I can probably run more tests.
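The thread doesn't say how such per-step output would be produced inside Pipeline 2. As a generic, hypothetical illustration of the idea only (not an existing Pipeline 2 feature), one can accumulate wall-clock time per named step and then rank the steps by total time. The names `timed_step` and `slowest_steps` are made up for this sketch.

```python
import time
from contextlib import contextmanager

@contextmanager
def timed_step(name, timings):
    """Add the wall-clock duration of the enclosed block to timings[name].

    Durations are accumulated, so a step that runs many times (e.g. once
    per chapter) reports its total time.
    """
    start = time.perf_counter()
    try:
        yield
    finally:
        timings[name] = timings.get(name, 0.0) + (time.perf_counter() - start)

def slowest_steps(timings, n=5):
    """Return the n (name, seconds) pairs with the largest totals, slowest first."""
    return sorted(timings.items(), key=lambda item: item[1], reverse=True)[:n]
```

Wrapping each stage as `with timed_step("some-step", timings): ...` and printing `slowest_steps(timings)` at the end of a job would point at the stages worth investigating for the books that take hours.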

josteinaj commented 7 years ago

So I finally got around to running speed tests (and managed to get them working) comparing the Java and XSLT implementations of css:shift-string-set (see the nlbdev branches speed-test.java and speed-test.xslt).

It turns out that the Java implementation is about 20 times faster. For a 955 KB book, the XSLT implementation takes 15.822 seconds, while the Java implementation takes 0.717 seconds on average. I haven't measured the impact on bigger or smaller books, but I think it's safe to say that we should keep using the Java implementation, and also try reimplementing similar XSLT steps in Java.
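Reading the two reported times as 15.822 s and 0.717 s (the original figures use decimal commas), the speedup for this one book works out as:

$$\frac{15.822\,\text{s}}{0.717\,\text{s}} \approx 22.1$$

which is consistent with "about 20 times faster".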

josteinaj commented 7 years ago

I think we can say that this issue is replaced by https://github.com/daisy/pipeline-tasks/issues/71