Closed hastebrot closed 9 months ago
As a work-around I can use string
s instead with createdAt: new Date(doc.createdAt).toISOString(),
.
when I do a search and sort the ISO dates it will return the same result as with unixtime:
const result3 = await search(docs, {
sortBy: {
property: "createdAt",
order: "DESC",
},
limit: 100,
});
I believe I am the one to blame for this. I opted for implementing AVL trees to improve performance at search time, but slowing down index time too much.
I will take some time to test better alternatives (maybe red-black trees?) to see if we can also improve indexing time.
A practical, performant, and easy-to-implement alternative to AVL and RB trees are Zip Trees [1] by Robert Tarjan. Maybe they are useful.
[1] Tarjan, R. E., Levy, C., & Timmel, S. (2021). Zip trees, https://arxiv.org/abs/1806.06726
Thanks a lot, @hastebrot! I'll look into this next week.
Started working on it: https://github.com/oramasearch/orama/pull/430
With v1.1.1
, I see the same issue if one indexed document field is typed as number
.
I have about 80k documents. Each document is pretty small and has a popularity
field indicating a popularity score.
number
, the indexing time via insertMultiple
would be about 25s. string
padded with 0
on the left, the indexing time would just be about 3 to 4 seconds.As per my experiment, it looks like the issue is not related to how big the number is though.
As I intended to do indexing in browser, this issue was almost a deal-breaker for me until I found the workaround in this page. I really wish #430 would fix it.
I found out that what slows down indexing is the distance between numbers, in my tests 20k objects with timestamps spanning 10 years indexes properly whereas 2 years makes orama unusable. Another example is indexing autoincremented ids.
Hi @micheleriva, any progress on this? 🙏
We discovered an issue that slows the insertion when the numbers are ascendingly distributed. We fixed it, and the last release includes that fix. Could you try with the latest version?
@allevo works like a charm, thanks!
Perfect! Happy to hear!
I'm closing this issue; thanks for your patience.
I have 2000 documents with text that each are not larger than 65536 bytes. When I add just a single field with a number, e.g. using
createdAt: new Date(doc.createdAt).getTime(),
it slows downinsert()
considerably. interestingly, low numbers are fine.indexing of these documents with number: 14 seconds, without numbers: 3 seconds.