siskahumlesjo closed 5 months ago
The main idea: some passages at the end of one page and the start of the next really belong to the same reuse, but the algorithm has split them into two separate reuses. If we can stitch those back together, we can clean away a lot of reuses.
Should we maybe run passim on whole books? Right now we run it on separate pages.
If possible!
I wonder if we would still have easy access to the page numbers then... I will investigate.
Probably not easily, but I presume we would have to stitch all the pages together anyway, and then we could perhaps add some tags that indicate the original page...
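A minimal sketch of how the page numbers could survive stitching, assuming each book is available as a list of `(page_number, text)` pairs. Instead of inline tags it records the character offset at which each page starts, so any offset in the stitched text can be mapped back to its page. The names `stitch_pages` and `page_at` are made up for illustration and are not part of passim.

```python
from bisect import bisect_right

def stitch_pages(pages):
    """Join pages into one book text.

    pages: list of (page_number, text) tuples in reading order (assumed input shape).
    Returns (book_text, starts), where starts[i] is the character offset
    in book_text at which pages[i] begins.
    """
    parts, starts, offset = [], [], 0
    for _, text in pages:
        starts.append(offset)
        parts.append(text)
        offset += len(text) + 1  # +1 for the newline that joins the pages
    return "\n".join(parts), starts

def page_at(offset, pages, starts):
    """Return the original page number for a character offset in the book text."""
    index = bisect_right(starts, offset) - 1
    return pages[index][0]

pages = [(1, "abc"), (2, "defg"), (3, "h")]
book, starts = stitch_pages(pages)
print(page_at(5, pages, starts))
```

The joining newline is counted as part of the page before it, so a reuse that spans the break still maps its start and end offsets to the two pages it came from.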
For a reuse at the end of a page, check whether there is a reuse at the start of the next page (and vice versa), so that the result is clean and we have access to the whole reused text.
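A rough sketch of that check if we keep running on separate pages. Everything here is an assumption for illustration: the `Reuse` record, its fields, the `page_lengths` mapping and the tolerance `tol` are hypothetical and do not reflect passim's actual output format. Two reuses are chained when they match the same source, sit on consecutive pages, and the first ends within `tol` characters of the page end while the second starts within `tol` characters of the page start.

```python
from dataclasses import dataclass

@dataclass
class Reuse:
    page: int    # page the passage was found on
    start: int   # character offset of the passage within that page
    end: int     # exclusive end offset within that page
    source: str  # identifier of the matched text (hypothetical field)
    text: str    # the reused passage itself

def stitch_reuses(reuses, page_lengths, tol=20):
    """Group reuses that continue across a page break.

    page_lengths: dict mapping page number -> number of characters on that page.
    Returns a list of groups; each group is a list of Reuse objects in reading order.
    """
    ordered = sorted(reuses, key=lambda r: (r.source, r.page, r.start))
    groups = []
    for r in ordered:
        if groups:
            prev = groups[-1][-1]
            continues = (
                r.source == prev.source
                and r.page == prev.page + 1
                and page_lengths[prev.page] - prev.end <= tol  # prev reaches the page end
                and r.start <= tol                             # r begins at the page start
            )
            if continues:
                groups[-1].append(r)
                continue
        groups.append([r])
    return groups

page_lengths = {1: 100, 2: 100}
reuses = [
    Reuse(1, 80, 100, "s", "foo "),
    Reuse(2, 0, 30, "s", "bar"),
    Reuse(2, 60, 70, "s", "baz"),
    Reuse(1, 90, 100, "t", "x"),
    Reuse(2, 50, 60, "t", "y"),
]
groups = stitch_reuses(reuses, page_lengths)
whole_texts = ["".join(r.text for r in g) for g in groups]
```

This only looks at positions on the page. A stricter version would also check that the matched span in the source continues from one reuse to the next, which would avoid merging two unrelated reuses of the same source.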