Closed by peetucket 2 weeks ago
See https://argo-qa.stanford.edu/catalog?f%5Btag_ssim%5D%5B%5D=books+%3A+larger
Some of these objects have the split-xml step taking hours, which doesn't seem to make sense.
In particular, see https://argo-qa.stanford.edu/view/druid:xn427rt3998
xn427rt3998 has a single document XML file that is 142 MB
Running the current code on my laptop is also slow. Investigating.