khonsulabs / bonsaidb

A developer-friendly document database that grows with you, written in Rust
https://bonsaidb.io/
Apache License 2.0

View is never computed #256

Closed: FlixCoder closed this issue 2 years ago

FlixCoder commented 2 years ago

Hi! I found another thing, this time a bug:

I am starting an example with an empty database and regularly poll a view on my collection. Initially it is empty of course, so no need to compute the view.

However, when I add just a single entry, the view is only updated some of the time; at other times the map function is not called at all. This is not a problem when adding many entries, because the view is updated often enough then. But when adding just a single entry, the view can stay empty forever, even though it should contain an entry (again, map is never called, despite an entry being added to the collection).

I hope this was understandable, please tell me whether you need any more information.

Thank you again! :)

ecton commented 2 years ago

Views are lazy by default, which means until a query is executed, the map function will not be called. If a view is marked as Unique, this no longer applies and the view is updated during the transaction.
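To make the lazy versus in-transaction distinction concrete, here is a minimal sketch of a toy model. It does not use BonsaiDb's API: `ToyCollection`, its fields, and the `eager` flag are all hypothetical names invented for illustration. The flag stands in for a view that is updated during the transaction, as described above for unique views.

```rust
use std::collections::BTreeMap;

/// Toy stand-in for a collection with a single view. Not a BonsaiDb type.
struct ToyCollection {
    docs: Vec<String>,
    /// Number of documents the view has already mapped.
    indexed_through: usize,
    /// View entries: key = document length, value = number of documents.
    view: BTreeMap<usize, usize>,
    /// If true, the view is updated inside `insert` instead of on query.
    eager: bool,
    /// Counts how often the map function has run.
    map_calls: usize,
}

impl ToyCollection {
    fn new(eager: bool) -> Self {
        Self {
            docs: Vec::new(),
            indexed_through: 0,
            view: BTreeMap::new(),
            eager,
            map_calls: 0,
        }
    }

    /// Hypothetical map function: keys each document by its length.
    fn map(doc: &str) -> usize {
        doc.len()
    }

    /// Maps every document that has not been indexed yet.
    fn update_view(&mut self) {
        for doc in &self.docs[self.indexed_through..] {
            let key = Self::map(doc);
            *self.view.entry(key).or_insert(0) += 1;
            self.map_calls += 1;
        }
        self.indexed_through = self.docs.len();
    }

    fn insert(&mut self, doc: &str) {
        self.docs.push(doc.to_string());
        if self.eager {
            // Updated as part of the write, so queries never see a stale view.
            self.update_view();
        }
    }

    /// A lazy view catches up here, right before answering the query.
    fn query(&mut self, key: usize) -> usize {
        self.update_view();
        self.view.get(&key).copied().unwrap_or(0)
    }
}
```

In this model, a lazy collection (`ToyCollection::new(false)`) leaves `map_calls` at 0 after an insert and only runs the map function on the first `query`, while `ToyCollection::new(true)` runs it during `insert`.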

In changes that haven't been released in main, views can be marked as "eager" to force them to be updated always without requiring the unique constraint.

Does this match your experience? If not, I'll want to see some code to try to understand what's different between your usage and what I've tested so far.

FlixCoder commented 2 years ago

I have tried making a minimal example, but I failed to replicate what is happening here: https://github.com/FlixCoder/bonsaimq/blob/main/examples/simple.rs

I am very sure that the map function is indeed never called, even when I query the view. Again, it is stochastic, so for the same code, sometimes it updates the view, sometimes it does not.

I will try again to find a smaller example than my whole project, but it might take some time until I find the time.

FlixCoder commented 2 years ago

But to save you at least a little time searching around my code:

ecton commented 2 years ago

From looking through your code, it seems pretty straightforward, and I can't see how your code would be at fault. It seems like there's an edge case in detecting new changes.

I hesitate to ask you to narrow it down too much now because the entire view indexing system has been rewritten in #250 (not yet merged to main). I don't remember discovering any edge cases like this when doing the rewrite, but the new system reduces the amount of state needed to keep track of what has been and hasn't been indexed, and it improves MVCC guarantees.

Due to my refactor in #250 not solving the performance issues I was seeing, I'm not sure how quickly v0.5.0 will be released -- optimistically it might be the end of June.

I'm currently focusing on how to get BonsaiDb's performance back to where I want it to be, but I'll try to switch in the next few days to see about a fix for v0.4.

FlixCoder commented 2 years ago

Alright, thank you!

ecton commented 2 years ago

My mind couldn't let this puzzle go. I tracked it down to a bug I had already fixed but haven't released in Nebari: https://github.com/khonsulabs/nebari/commit/32691e510138e5490900925ece9344a0b134002e#diff-00154e62b20bc4b1c2638af3eb876665458523253415bf762974217d56e52c87

There was an edge case where one thread/task causes a mapping job to happen while a transaction is occurring on another thread/task. Nebari's implementation of current_transaction_id was just flat out wrong -- dating back to the early days of the library. Instead of behaving as documented, it would return the most recently allocated transaction id. The view indexer would note the value of current_transaction_id before it looked for invalidated documents, and then would not find any invalidated documents because the transaction had not completed yet.

The next time the query happens, even though the transaction is now complete, the internal state of BonsaiDb thinks that the view is already on the current transaction, so it never calls the mapping function. When the next transaction completes, it allows the mapper to run again, which correctly indexes the documents that hadn't been indexed yet as well as the most recently changed ones.
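The race described above can be sketched with a small single-threaded toy model. It is not Nebari's or BonsaiDb's actual code: `ToyLog`, `ToyView`, `mapped_counts`, and the `buggy` flag are hypothetical names, and the sketch assumes the intended behavior of current_transaction_id is to return the last committed id rather than the last allocated one.

```rust
/// Toy transaction log. A stand-in, not Nebari's real API.
#[derive(Default)]
struct ToyLog {
    /// Highest transaction id handed out so far.
    last_allocated: u64,
    /// Highest transaction id whose changes are visible.
    last_committed: u64,
    /// Committed documents that the view has not mapped yet.
    invalidated: Vec<String>,
}

impl ToyLog {
    fn begin(&mut self) -> u64 {
        self.last_allocated += 1;
        self.last_allocated
    }

    fn commit(&mut self, id: u64, doc: &str) {
        self.invalidated.push(doc.to_string());
        self.last_committed = id;
    }

    /// `buggy = true` models the old behavior (last allocated id);
    /// `buggy = false` models the assumed intended behavior (last committed id).
    fn current_transaction_id(&self, buggy: bool) -> u64 {
        if buggy { self.last_allocated } else { self.last_committed }
    }
}

#[derive(Default)]
struct ToyView {
    /// Transaction id the view believes it is up to date with.
    indexed_through: u64,
    mapped: Vec<String>,
}

impl ToyView {
    fn update(&mut self, log: &mut ToyLog, buggy: bool) {
        // The id is noted first, before looking for invalidated documents.
        let current = log.current_transaction_id(buggy);
        if current == self.indexed_through {
            return; // The view believes it is already current.
        }
        // "Map" every invalidated document, then record the noted id.
        self.mapped.append(&mut log.invalidated);
        self.indexed_through = current;
    }
}

/// Returns the number of mapped documents after the first and second commit.
fn mapped_counts(buggy: bool) -> (usize, usize) {
    let mut log = ToyLog::default();
    let mut view = ToyView::default();

    let tx1 = log.begin(); // A transaction starts on another task.
    view.update(&mut log, buggy); // A query runs while it is in flight.
    log.commit(tx1, "first");
    view.update(&mut log, buggy); // A query runs after the commit.
    let after_first = view.mapped.len();

    let tx2 = log.begin();
    log.commit(tx2, "second");
    view.update(&mut log, buggy);
    (after_first, view.mapped.len())
}
```

With `buggy = true`, the query after the first commit maps nothing, and both documents are only picked up after the second transaction, which mirrors the behavior reported in this issue. With `buggy = false`, the first document is mapped as soon as its transaction has committed.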

I just released Nebari v0.5.4. Hopefully after you cargo update you will not be able to reproduce this issue anymore.

FlixCoder commented 2 years ago

It is fixed indeed, thank you! It was the transaction that was the difference :D
So everything seems to work now :)