scaife-viewer / beyond-translation-site

Site used to iterate on translation alignments within the Scaife Viewer ecosystem

Document tokenization limits #155

Open jacobwegner opened 1 year ago

jacobwegner commented 1 year ago

Refs https://github.com/scaife-viewer/backend/blob/35f792914d04152cecce7426a061a9824ae5c45c/core/scaife_viewer/core/indexer.py#L140

New URNs mean these will fail:

Refs https://github.com/scaife-viewer/scaife-viewer/releases/tag/v2023-06-14-001

jacobwegner commented 1 year ago

I'm working on this for Brill too and will likely have a scaife-viewer-core update to address it.

The limit appears to be about 99,380 tokens.
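
Until that update lands, a rough pre-check could flag documents likely to hit the limit. This is only a sketch: `APPROX_TOKEN_LIMIT`, `count_tokens`, and `exceeds_limit` are hypothetical names, and whitespace splitting is an assumption that may not match how the indexer actually tokenizes.

```python
# Approximate limit observed in this issue; not an exact value
# taken from scaife-viewer-core.
APPROX_TOKEN_LIMIT = 99_380


def count_tokens(text: str) -> int:
    """Count whitespace-delimited tokens.

    Assumption: whitespace splitting stands in for the indexer's
    real tokenization, so counts may differ somewhat.
    """
    return len(text.split())


def exceeds_limit(text: str, limit: int = APPROX_TOKEN_LIMIT) -> bool:
    """Return True when the text has more tokens than the limit."""
    return count_tokens(text) > limit
```

Because the limit is approximate, documents that come close to it are worth checking by hand rather than trusting the boolean alone.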