Closed juuu-jiii closed 10 months ago
Great PR. Thanks for working on this issue. There are a few things I think might need to be modified.
I notice in one of your tests the index initialized in the init fork is non-empty. This appears to be because it's initialized from an external index file. Since the underlying table is unlogged in the event that the database crashes this will result in the table being empty but the index on it remaining populated which is not the expected behavior. More generally ambuildempty
should probably always build an empty index.
It's probably worth factoring out the code to build an empty index instead of calling BuildIndex
. BuildIndex
is doing work that isn't strictly necessary e.g. initializing a usearch index, plus modifying the function to support this code path adds some complexity. At a minimum I would probably call StoreExternalIndex
instead, though you can probably just copy some of the code from this since it also requires constructing a usearch index. Thinking more about this you may also need to initialize a blockmapgroup, which isn't happening right now.
It would also be nice to test that when the database crashes, the index in the init fork is loaded and can have vectors inserted into it. There isn't a great way to induce this within postgres at the moment. Maybe adding a function that does what pg_terminate_backend
does but sends a SIGKILL
instead of a SIGTERM
. This function should probably not get compiled into release builds since crashing the backend is very much not desirable behavior. It may not be worth doing all this, but it would be a good sanity check to at least kill -9
the backend and make sure things work as expected.
Finally, it seems like we shouldn't be writing to the WAL if an index's underlying table in unlogged. I'm not sure what would happen in a crash. I think the fork would just get overwritten with the init fork so it wouldn't cause issues, but since it's not expected it's probably best to avoid. It would also marginally improve performance for this use case which is nice
@ezra-varady Thank you for the feedback! I'll factor the unlogged index code out of BuildIndex
and place it into its own function, and I'll also look into testing index loading after database crashes.
Re:WAL, nothing is saved there if we are dealing with an unlogged table. Postgres performs checks under the hood to see whether the table in question is unlogged, and if it is, it skips xlog-related logic even if a WAL command is issued. I discovered this when looking at documentation as well as how other implementations of the aminsert
index access method within Postgres code deal with unlogged tables.
Addressed by #253
This PR adds support for unlogged tables, ensuring that Lantern functionality works correctly even if the current table is not saved to WAL. A corresponding test (and associated util testing scripts) have been added to make test to verify that unlogged table functionality behaves as expected.
NOTE: Currently the test checks inserts and index creation on an empty unlogged table as well as one retrieved from file. It also verifies that the created index works and that distance functions work on unlogged tables. Let me know if there are other cases that should be covered, and I'll be happy to add them!