mariusa opened 3 years ago
Hello! Apologies for the late answer.
Tantivy does not have any notion of a primary key, but you can add such a field and enforce uniqueness on the application side. Concretely, that means always deleting your primary key term before adding a new document. It is not cheap.
The API is called delete_term.
Common Crawl is a pre-crawled dataset. If you have a business need for an index over Common Crawl, I'd be happy to discuss it.
Hi, the "getting started with the CLI" doc relies on already having Wikipedia pages crawled in the right format. To crawl other sites, what crawler do you recommend? I've found this, but I'm not sure how to use it: https://github.com/tantivy-search/tantivy-ccrawl
Also, when indexing a .json file (assuming data is stored in multiple json files), does tantivy know when to
Thanks