I am looking at the JDG sample and there is one article (JDG-1878-05-12-a-i0020) where the text (starting with ‘FAITS DIVERS’) is repeated 3 times in a row. The rebuilt version on S3 shows the same repetition. I did not notice this in any of the other articles, so I am fairly sure it comes from the original OCR and not from a step in canonical ingestion.
from @e-maud
to be checked to see what's going on
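
One possible way to check other articles for the same symptom is sketched below. It is only a sketch: the function names and the sample string are hypothetical, and it assumes the article's full text is already available as a plain Python string. It splits the text at each occurrence of a marker such as 'FAITS DIVERS' and reports whether the resulting segments are identical once whitespace is normalized.

```python
def segments_from_marker(text: str, marker: str) -> list[str]:
    """Return the chunks of `text` that start at each occurrence of `marker`.

    Whitespace inside each chunk is collapsed so that line-break differences
    do not hide an otherwise exact repetition.
    """
    if not marker:
        raise ValueError("marker must be a non-empty string")
    starts = []
    pos = text.find(marker)
    while pos != -1:
        starts.append(pos)
        pos = text.find(marker, pos + len(marker))
    # Each chunk runs up to the next marker, the last one to the end of text.
    ends = starts[1:] + [len(text)]
    return [" ".join(text[s:e].split()) for s, e in zip(starts, ends)]


def is_repeated(text: str, marker: str, times: int = 3) -> bool:
    """True if `text` holds exactly `times` identical chunks starting at `marker`."""
    chunks = segments_from_marker(text, marker)
    return len(chunks) == times and len(set(chunks)) == 1


# Hypothetical sample, not the actual article content.
sample = "FAITS DIVERS. Un texte d'exemple.\n" * 3
print(is_repeated(sample, "FAITS DIVERS"))
```

This only catches exact repetitions; if the OCR differs slightly between copies, a fuzzy comparison would be needed instead.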