internetarchive / epub

For code related to making ePub files

Epub creation takes too much memory #29

Closed mikemccabe closed 13 years ago

mikemccabe commented 13 years ago

http://www.archive.org/details/rpertoireprati07grio - epub creation fails with ulimit 1048576.

command line:

ulimit -v 1048576; PATH=/petabox/sw/bin:$PATH LD_LIBRARY_PATH=/petabox/sw/lib/kakadu python ../epub/convert_iabook.py --epub --document=rpertoireprati07grio rpertoireprati07grio /9/items/rpertoireprati07grio rpertoir.epub
zsh: segmentation fault  PATH=/petabox/sw/bin:$PATH LD_LIBRARY_PATH=/petabox/sw/lib/kakadu python

... but it succeeds with ulimit 2097152.

haase commented 13 years ago

Thanks, that's helpful. I'm wondering how to handle those cases for automatic generation.

-- Ken

mikemccabe commented 13 years ago

It turned out that I was failing to clear() some of the pages returned by the lxml iterparse - critical, as they are VOLUMINOUS. These were only the pages that weren't included in the final book (skipped because they had no addToAccessFormats) - but that was enough to kick memory usage over the edge for some books.

This was tricky to debug, as only occasional pages leaked.
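
For anyone hitting the same pattern, here is a minimal sketch of the fix described above, not the actual convert_iabook.py code. The `page` tag, the `skip` attribute, and the `included_page_texts` helper are hypothetical stand-ins for illustration. The fallback import is only there so the sketch runs without lxml installed; both libraries provide the `iterparse` and `clear()` calls used here.

```python
import io

try:
    from lxml import etree  # the library named in this thread
except ImportError:
    # Assumption: the stdlib parser is a stand-in with the same calls.
    import xml.etree.ElementTree as etree


def included_page_texts(source, wanted):
    """Yield the text of each wanted <page>, clearing every page once seen.

    `wanted` is a hypothetical predicate standing in for the real
    "is this page part of the final book" check.
    """
    for _event, elem in etree.iterparse(source, events=("end",)):
        if elem.tag != "page":
            continue
        # Extract what we need before the element is emptied.
        text = "".join(elem.itertext()) if wanted(elem) else None
        # Clear on every path, including skipped pages; missing this
        # on the skip path is the kind of leak described above.
        elem.clear()
        if text is not None:
            yield text


# Hypothetical sample document; 'skip' is an invented attribute.
sample = (
    b"<document>"
    b"<page n='1' skip='true'><line>cover</line></page>"
    b"<page n='2'><line>Chapter one</line></page>"
    b"<page n='3' skip='true'><line>blank</line></page>"
    b"<page n='4'><line>Chapter two</line></page>"
    b"</document>"
)
texts = list(
    included_page_texts(io.BytesIO(sample), lambda p: p.get("skip") != "true")
)
print(texts)
```

Note that clear() empties an element but leaves it attached to its parent, so the root still accumulates one small empty node per page; that is usually negligible next to the page contents.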