Open jarmoza opened 11 months ago
It's possible that this should be extended to line extraction QA as well.
Some books that previously ran out of memory:
slurmstepd: error: Detected 1 oom-kill event(s):
slurm-output-cole_R223278_DNLM_2_sureguide1665_6bff7bd3-cd29-4cf6-ba83-a710b75e7872.out
slurm-output-mclark_R31063_uklw_2_worksambroseparey1691_6bff7bd3-cd29-4cf6-ba83-a710b75e7872.out
slurm-output-jgrismond_R20542_NjPT_4_viewofgovernment1662_6bff7bd3-cd29-4cf6-ba83-a710b75e7872.out
slurm-output-mwhite_R8527_uk_2_grotiusthreebooks1682_6bff7bd3-cd29-4cf6-ba83-a710b75e7872.out
slurm-output-anon_R2930_iur_8_twotreatisesofgov1690_6bff7bd3-cd29-4cf6-ba83-a710b75e7872.out
Limiting the number of pages processed at once via config will likely eliminate the out-of-memory issues we were seeing in previous QA implementations.
Suggestion: a PAGES_PER_THREAD variable in the config YAML for each QA module, with a default/suggested limit of 50 pages.
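A rough sketch of how a QA module might honor that setting, assuming the module's YAML has already been loaded into a plain dict. The helper names `pages_per_thread` and `chunk_pages` are hypothetical and not part of the existing code; only the PAGES_PER_THREAD key and the default of 50 come from the suggestion above.

```python
DEFAULT_PAGES_PER_THREAD = 50  # suggested default from this issue


def pages_per_thread(config):
    """Read PAGES_PER_THREAD from an already-loaded config dict (hypothetical helper)."""
    value = config.get("PAGES_PER_THREAD", DEFAULT_PAGES_PER_THREAD)
    # bool is a subclass of int, so reject it explicitly.
    if isinstance(value, bool) or not isinstance(value, int) or value < 1:
        raise ValueError(f"PAGES_PER_THREAD must be a positive integer, got {value!r}")
    return value


def chunk_pages(pages, size):
    """Split a list of pages into consecutive chunks of at most `size` pages."""
    return [pages[i:i + size] for i in range(0, len(pages), size)]


if __name__ == "__main__":
    pages = [f"page_{n:04d}.png" for n in range(120)]  # made-up file names
    limit = pages_per_thread({})  # key missing, so the default applies
    print([len(chunk) for chunk in chunk_pages(pages, limit)])
```

Each chunk would then be handed to one thread or job, so no single worker holds more than the configured number of pages.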
Blocked until the new line extraction method, eynollah, is integrated into the QA line extraction code.
The current PnP pipeline only crops 50 pages at a time. The autocrop QA script should be adapted to match. This should address the out-of-memory errors still being experienced.
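One possible shape for that adaptation, as a sketch only: walk the pages lazily in batches of 50 and keep just the failures from each batch, so per-page results can be released before the next batch starts. `iter_batches`, `run_autocrop_qa`, and the `check_page` callback are hypothetical names, not the existing script's API.

```python
from itertools import islice


def iter_batches(pages, batch_size=50):
    """Yield lists of at most `batch_size` items from any iterable, lazily."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    iterator = iter(pages)
    while True:
        batch = list(islice(iterator, batch_size))
        if not batch:
            return
        yield batch


def run_autocrop_qa(page_paths, check_page, batch_size=50):
    """Run `check_page` (a stand-in for the real QA check) batch by batch.

    Only the paths that fail are retained between batches.
    """
    failures = []
    for batch in iter_batches(page_paths, batch_size):
        results = [(path, check_page(path)) for path in batch]
        failures.extend(path for path, ok in results if not ok)
    return failures
```

Whether this is enough to avoid the oom-kill events depends on how much memory a single batch of 50 pages needs, which this sketch does not measure.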