viking-sudo-rm / rusty-dawg

Rust library for indexing and quickly searching large pretraining corpora
https://arxiv.org/abs/2406.13069
MIT License
17 stars, 2 forks

Refactor inference operations into `Infinigram` class #100

Open viking-sudo-rm opened 5 months ago

viking-sudo-rm commented 5 months ago

The Infinigram interface can be abstracted away from the CDAWG, leaving the CDAWG as one specific implementation. Inference-time operations should be moved to a separate object/trait that wraps a `Cdawg`:

Most of these operations would take a `CdawgState` as input (and potentially the next token).
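
A rough sketch of what this split could look like, to make the shape of the proposal concrete. Everything here is an assumption for illustration: `ToyCdawg` and `ToyState` are naive stand-ins for the real `Cdawg` and `CdawgState`, and the trait methods (`initial_state`, `advance`, `match_length`, `suffix_count`) are hypothetical names, not the existing rusty-dawg API. The toy index does a linear scan rather than a real CDAWG traversal; only the interface boundary is the point.

```rust
// Hypothetical sketch: none of these names come from rusty-dawg's actual API.

/// Stand-in for `CdawgState`: records the longest suffix of the tokens
/// consumed so far that occurs somewhere in the corpus.
#[derive(Clone, Debug, Default, PartialEq)]
pub struct ToyState {
    matched: Vec<u16>,
}

/// Stand-in for `Cdawg`: a plain token buffer searched by linear scan.
pub struct ToyCdawg {
    tokens: Vec<u16>,
}

impl ToyCdawg {
    pub fn new(tokens: Vec<u16>) -> Self {
        Self { tokens }
    }

    /// Number of (possibly overlapping) occurrences of `pattern`.
    /// By convention in this sketch, the empty pattern counts every token.
    fn count(&self, pattern: &[u16]) -> usize {
        if pattern.is_empty() {
            return self.tokens.len();
        }
        self.tokens
            .windows(pattern.len())
            .filter(|w| *w == pattern)
            .count()
    }
}

/// Inference-time operations, independent of the backing index.
/// Each one takes a state as input (and, for `advance`, the next token).
pub trait Infinigram {
    type State: Clone;

    fn initial_state(&self) -> Self::State;
    fn advance(&self, state: &Self::State, token: u16) -> Self::State;
    fn match_length(&self, state: &Self::State) -> usize;
    fn suffix_count(&self, state: &Self::State) -> usize;
}

/// One specific implementation: a wrapper around the (toy) CDAWG.
pub struct CdawgInfinigram<'a> {
    index: &'a ToyCdawg,
}

impl<'a> CdawgInfinigram<'a> {
    pub fn new(index: &'a ToyCdawg) -> Self {
        Self { index }
    }
}

impl<'a> Infinigram for CdawgInfinigram<'a> {
    type State = ToyState;

    fn initial_state(&self) -> ToyState {
        ToyState::default()
    }

    fn advance(&self, state: &ToyState, token: u16) -> ToyState {
        let mut matched = state.matched.clone();
        matched.push(token);
        // Drop tokens from the front until the remaining suffix occurs
        // in the corpus (or nothing is left).
        while !matched.is_empty() && self.index.count(&matched) == 0 {
            matched.remove(0);
        }
        ToyState { matched }
    }

    fn match_length(&self, state: &ToyState) -> usize {
        state.matched.len()
    }

    fn suffix_count(&self, state: &ToyState) -> usize {
        self.index.count(&state.matched)
    }
}
```

With this layout, callers would only depend on the `Infinigram` trait, so another index could be swapped in behind it without touching inference code. Whether the state should be an associated type (as here) or a concrete `CdawgState` is an open design choice.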