Hi @vassudanagunta, thanks for the kind words :)
In principle, if the `pagePath` uniquely identifies a document, you could follow approach 1. That said, approach 2 is also good, and the difference in performance between the two approaches should be negligible.
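To make the comparison concrete, here is a minimal sketch of a toy in-memory index. It is not the library's actual code, and `ToyIndex`, `Doc` and the `storeFields` constructor argument are hypothetical names. It shows that both approaches return the page path with every result. Approach 1 returns the path as the result ID. Approach 2 reads it from a per-document stored-fields record. In both cases, getting the path back is a single array lookup by internal integer ID, which fits the claim that the performance difference is negligible.

```typescript
// Hypothetical toy index for illustration only; not the library's implementation.
type Doc = { id: string; pagePath: string; text: string };

class ToyIndex {
  // term -> set of internal integer IDs
  private postings = new Map<string, Set<number>>();
  // internal integer ID -> external (user-supplied) ID
  private externalIds: string[] = [];
  // internal integer ID -> stored fields returned with each result
  private stored: Array<Record<string, string>> = [];
  private storeFields: string[];

  constructor(storeFields: string[] = []) {
    this.storeFields = storeFields;
  }

  // Index a document, using the given field as its external ID.
  add(doc: Doc, idField: keyof Doc): void {
    const shortId = this.externalIds.length;
    this.externalIds.push(doc[idField]);
    const fields: Record<string, string> = {};
    for (const f of this.storeFields) fields[f] = String(doc[f as keyof Doc]);
    this.stored.push(fields);
    for (const term of doc.text.toLowerCase().split(/\W+/).filter(Boolean)) {
      if (!this.postings.has(term)) this.postings.set(term, new Set());
      this.postings.get(term)!.add(shortId);
    }
  }

  // Exact single-term lookup; each hit carries its external ID plus stored fields.
  search(term: string): Array<{ id: string } & Record<string, string>> {
    const hits = this.postings.get(term.toLowerCase()) ?? new Set<number>();
    return Array.from(hits).map((shortId) => ({
      ...this.stored[shortId],
      id: this.externalIds[shortId],
    }));
  }
}

const pages: Doc[] = [
  { id: "p1", pagePath: "/docs/install", text: "Install the package" },
  { id: "p2", pagePath: "/docs/usage", text: "Usage after install" },
];

// Approach 1: pagePath itself is the document ID.
const byPath = new ToyIndex();
for (const p of pages) byPath.add(p, "pagePath");

// Approach 2: a separate ID, with pagePath kept as a stored field.
const byStored = new ToyIndex(["pagePath"]);
for (const p of pages) byStored.add(p, "id");

console.log(byPath.search("install").map((r) => r.id));
console.log(byStored.search("install").map((r) => r.pagePath));
```

Both `console.log` calls list the same two paths. Only where the path lives on the result object differs.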
As you noticed, IDs are mapped to integers internally: this mapping is necessary to provide features like the `discard` method, and it also enables some optimizations. Therefore, it ultimately does not matter much what you use as an ID, as long as it uniquely identifies a document.
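The following is a minimal sketch of why an integer mapping helps with an operation like `discard`. It assumes a design where discarding a document only forgets its entry in the ID map, and lookups then skip that integer, instead of rewriting every posting list right away. `IdMapper`, `add`, `discard` and `resolve` are hypothetical names, not the library's API.

```typescript
// Hypothetical sketch: external IDs of any type mapped to dense integer IDs.
class IdMapper<Id> {
  private toShort = new Map<Id, number>();
  private toExternal: Array<Id | undefined> = [];
  private nextId = 0;

  // Assign the next integer to an external ID. Duplicates are rejected,
  // because the ID must uniquely identify a document.
  add(id: Id): number {
    if (this.toShort.has(id)) throw new Error(`duplicate ID: ${String(id)}`);
    const shortId = this.nextId++;
    this.toShort.set(id, shortId);
    this.toExternal[shortId] = id;
    return shortId;
  }

  // Discard: forget the mapping only. Posting lists that still contain the
  // integer could be cleaned up later; resolve() skips it in the meantime.
  discard(id: Id): void {
    const shortId = this.toShort.get(id);
    if (shortId === undefined) throw new Error(`unknown ID: ${String(id)}`);
    this.toShort.delete(id);
    this.toExternal[shortId] = undefined;
  }

  // Translate integer IDs (e.g. from a posting list) back to external IDs,
  // dropping any that were discarded.
  resolve(shortIds: number[]): Id[] {
    const out: Id[] = [];
    for (const s of shortIds) {
      const id = this.toExternal[s];
      if (id !== undefined) out.push(id);
    }
    return out;
  }
}

const ids = new IdMapper<string>();
const posting = ["/docs/install", "/docs/usage", "/blog/hello"].map((p) => ids.add(p));
// posting holds [0, 1, 2], regardless of how long the path strings are
ids.discard("/docs/usage");
console.log(ids.resolve(posting)); // → ["/docs/install", "/blog/hello"]
```

In this sketch the index structures only ever hold small integers. A long path used as the ID is stored once, in the mapping, which is one reason the choice of external ID type matters little.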
@vassudanagunta I will go ahead and close the issue, but feel free to comment further if necessary.
I am indexing pages on a static website with about a thousand pages and 25k terms. Search results obviously need to include the page path. Which is likely to be more efficient?
1. `pagePath` as the doc ID
2. `pagePath` as a `storedField`
I would normally think that approach 2 might result in a more efficient internal index structure, but I noticed that you end up mapping an internal zero-based ID to the user-specified ID of type `any`, which makes me think approach 1 will be more efficient.
I already said this in another issue, but I want to repeat it: