There were no changes in the file format, but the number of checks and assertions grew. Apparently, one of those checks is hurting us here.
Does it also fail if you create an arbitrary index and then call .load, reinitializing it with a different file?
It fails with a different error:
RuntimeError: Key type doesn't match, consider rebuilding
triggered by the following code:
idx = usearch.index.Index(ndim=1024, metric='ip')
idx.load(idx_path)
which runs fine if downgraded to 2.9.2.
Interesting. Any chance the file was corrupted somewhere in between?
No. Here is a minimal example that you can test:
# run with usearch-2.9.2 installed
import usearch.index
idx = usearch.index.Index(ndim=1024, metric='ip')
idx.save('index')
# run with usearch-2.12.0 installed
import usearch.index
# will throw an error in usearch-2.12.0
idx = usearch.index.Index.restore('index', view=True)
There's no need to insert anything into the database to trigger the error. It seems the metadata written by the old version is being misread by the new one.
Update: this breaks in both directions. The old version cannot open databases created by the new version either.
> There were no changes in the file format, but the number of checks and assertions grew. Apparently, one of those checks is hurting us here.
In fact the file format changed due to a subtle change in code.
Compare:
with:
so different versions interpret enums in the metadata differently.
As the metadata stored on disk also has version information, we can make the new version of the library open old databases by mapping the old enum values to the new ones. There seems to be no easy fix for the reverse direction, however.
As this definitely breaks compatibility between versions (affecting all f16, f32, f64 indices and all languages), this should be marked as a breaking change.
We can localize the damage by changing what is returned by this function:
Since the result is returned in various places inside the function, maybe it is best to add a method on index_dense_metadata_result_t to "upgrade" its version to the new enum by mutating its headers appropriately.
I can make a pull request for it if that's OK.
Good catch @zh217! I think a good solution would be a custom function to convert enum to integer and vice-versa, with respect to the file version. Can you add it in index_plugins?
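To make the suggestion concrete, here is a minimal sketch of a version-aware conversion between enum members and on-disk integers, written in Python for brevity rather than in the library's C++. Everything in it is an assumption for illustration: the `ScalarKind` names, both sets of numeric codes, and the helper names `kind_from_code` and `code_from_kind` are hypothetical and are not taken from the USearch sources. The only detail borrowed from this thread is that 2.10.0 is the first release that reads and writes the new layout.

```python
from enum import IntEnum


# Hypothetical enum: names and numeric codes are placeholders,
# NOT the real values used by USearch.
class ScalarKind(IntEnum):
    UNKNOWN = 0
    F64 = 1
    F32 = 2
    F16 = 3


# Hypothetical legacy layout: the same kinds stored under different integers.
LEGACY_CODES = {
    ScalarKind.UNKNOWN: 0,
    ScalarKind.F64: 4,
    ScalarKind.F32: 5,
    ScalarKind.F16: 6,
}
LEGACY_KINDS = {code: kind for kind, code in LEGACY_CODES.items()}

# First version assumed to write the current codes (per this thread).
LAYOUT_CHANGE = (2, 10, 0)


def kind_from_code(code, file_version):
    """Decode an on-disk integer, honoring the version that wrote the file."""
    if file_version < LAYOUT_CHANGE:
        if code not in LEGACY_KINDS:
            raise ValueError(f"unknown legacy scalar code: {code}")
        return LEGACY_KINDS[code]
    # Raises ValueError on its own if the code is not a known member.
    return ScalarKind(code)


def code_from_kind(kind, file_version):
    """Encode an enum member as the integer expected by the given file version."""
    if file_version < LAYOUT_CHANGE:
        return LEGACY_CODES[kind]
    return int(kind)
```

With helpers of this shape, the loader would call `kind_from_code` once, right after reading the header, so the rest of the code only ever sees current enum members. The encoding direction only matters if the library should also be able to write files for older readers.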
@ashvardanian any update on introducing the backwards compatibility for pre-2.10 indexes in #438?
This would be valuable to me, as it would avoid recomputing all my old indexes, but if it is not expected to be introduced, I will go ahead and recompute them.
Working on it today.
Describe the bug
Newer versions of the library cannot open databases created with older versions.
Steps to reproduce
With the python client version 2.12.0:
results in
where the datafile was created with version 2.9.2 with:
Version 2.9.2 can open the datafile without problems.
On further testing, all versions from 2.10.0 onwards fail to open the database.
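Until a fix lands, a caller can at least detect the mismatch before attempting to open a file. The sketch below assumes only what was observed in this issue: files written before 2.10.0 and files written from 2.10.0 onwards are mutually unreadable. The helper names `parse_version` and `can_open` are hypothetical and not part of USearch, and the caller is assumed to know which library version wrote the file.

```python
# Boundary observed in this issue: 2.10.0 is the first version that fails
# to open files written by 2.9.2.
BREAKING_VERSION = (2, 10, 0)


def parse_version(text):
    """Turn '2.12.0' or 'v2.12.0' into a comparable tuple such as (2, 12, 0)."""
    return tuple(int(part) for part in text.lstrip("v").split(".")[:3])


def can_open(writer_version, reader_version):
    """True when writer and reader sit on the same side of the boundary."""
    wrote_new_layout = parse_version(writer_version) >= BREAKING_VERSION
    reads_new_layout = parse_version(reader_version) >= BREAKING_VERSION
    return wrote_new_layout == reads_new_layout
```

For the case reported here, `can_open("2.9.2", "2.12.0")` evaluates to `False`, which is a cue to rebuild the index or pin the older library version.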
Expected behavior
Version 2.12.0 should be able to open databases created with version 2.9.2, as the version numbers do not indicate any breaking changes.
USearch version
v2.12.0
Operating System
Ubuntu 22.04
Hardware architecture
x86
Which interface are you using?
Python bindings