sul-dlss / common-accessioning

Suite of robots that handle the tasks of accessioning digital objects

Figure out why split-xml takes a long time for large objects #1293

Closed peetucket closed 2 weeks ago

peetucket commented 2 weeks ago

See https://argo-qa.stanford.edu/catalog?f%5Btag_ssim%5D%5B%5D=books+%3A+larger

For some of these objects, the split-xml step takes hours, which seems far too long.

In particular, see https://argo-qa.stanford.edu/view/druid:xn427rt3998

peetucket commented 2 weeks ago

xn427rt3998 has a single document XML file that is 142 MB.

peetucket commented 2 weeks ago

The current code also runs slowly on my laptop. Investigating.