kuzudb / kuzu

Embeddable property graph database management system built for query speed and scalability. Implements Cypher.
https://kuzudb.com/
MIT License

Extension version maintenance #3353

Open semihsalihoglu-uw opened 6 months ago

semihsalihoglu-uw commented 6 months ago

We need to think through how we will maintain extension versions and require users to reinstall extensions as they upgrade or downgrade their Kuzu version. Specifically, we need to make the following decisions:

  1. Extension version releases: We release our official extensions here: https://extension.kuzudb.com/. However, this repo contains some extension releases that are purely for testing purposes. Although users do not need to know or care about this repo, I think this does not look good.
  2. The extension version the binaries know about: When building from source, our CMakeLists.txt file contains a single line for the version of the extensions to look for: add_definitions(-DKUZU_EXTENSION_VERSION="0.2.6"). So our binary knows about a single "global extension version". An alternative would be to have "extension-specific versions", that is, different extension versions for different extensions. For example, httpfs could require v0.1.0 while postgres scanner could require v0.2.6. Both options have pros and cons. A single global extension version means that as users change their Kuzu version, they would be forced to reinstall every extension, even if the extension binary they already had is identical to the new one. That may be OK. It also means that we rebuild and re-release every extension binary in each of our non-minor releases. Extension-specific versions would require users to reinstall an extension only if it's strictly necessary, i.e., when the previous extension version will not work with the new Kuzu version.
  3. Mechanism to actually validate that the right extension version is installed: More importantly, we currently do not have any mechanism to check whether an installed extension version is valid for the running Kuzu version. The -DKUZU_EXTENSION_VERSION flag seems to be used only when installing an extension and not when checking whether an extension already exists. I say this because it looks like there is a single directory ${xyz}/extension under which we store all the extension binaries. So the directory in which we store extensions does not have a version number in its name. Further, the extension binaries do not have version numbers on them (e.g., regardless of the extension version number, all httpfs extension binaries have the name libhttpfs.kuzu_extension). So we'll get unclear error messages like "symbol not found" when blindly dynamically linking against these. Chang tested this and can expand on this.
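
To make the trade-off in point 2 concrete, here is a minimal sketch of the two reinstall policies. The function names and the version table are hypothetical and only reuse the example numbers from the text above; this is not how Kuzu currently decides anything.

```python
# Illustrative only: 0.2.6 is the value quoted from CMakeLists.txt above,
# everything else is a made-up example.
GLOBAL_EXTENSION_VERSION = "0.2.6"

# Extension-specific alternative: each extension pins the version it needs.
REQUIRED_BY_EXTENSION = {"httpfs": "0.1.0", "postgres_scanner": "0.2.6"}


def needs_reinstall_global(installed_version: str) -> bool:
    """Global scheme: any mismatch with the single version forces a reinstall."""
    return installed_version != GLOBAL_EXTENSION_VERSION


def needs_reinstall_specific(name: str, installed_version: str) -> bool:
    """Per-extension scheme: reinstall only if this extension's own pin differs."""
    return installed_version != REQUIRED_BY_EXTENSION[name]
```

With these example values, an httpfs binary installed at 0.1.0 would have to be reinstalled under the global scheme but could be kept under the extension-specific scheme.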

We need to make decisions on these points. We should do our research and look into what other systems are doing and try to make a more informed decision.

mewim commented 6 months ago

I looked into DuckDB a bit. They install extensions to ~/.duckdb/extensions/{DuckDB Version}/{System Architecture}/{Extension Name}.duckdb_extension.

When the user tries to load an extension, the released DuckDB binary looks for the extension based on {DuckDB Version} and {System Architecture}. If a matching one is found, it can be loaded. When users bump their DuckDB version, they always have to reinstall all their extensions.
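
A rough sketch of that layout, just to illustrate why a version bump forces a reinstall. The helper name and the version and architecture strings in the usage note are placeholders I made up, not values taken from DuckDB.

```python
from pathlib import Path


def duckdb_extension_path(root: Path, version: str, arch: str, name: str) -> Path:
    """Build {root}/{DuckDB Version}/{System Architecture}/{Extension Name}.duckdb_extension."""
    return root / version / arch / f"{name}.duckdb_extension"
```

For example, with `root = Path.home() / ".duckdb" / "extensions"`, the paths for placeholder versions "vA" and "vB" end up in different directories, so an extension installed for "vA" is simply not visible to a "vB" binary.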

mewim commented 6 months ago

I think we do need to keep track of the version of locally-installed extensions. Instead of installing to ~/.kuzu/extension/, we should install to ~/.kuzu/extension/{Extension Version}/{System Architecture}.

But I am not sure whether we should make the extension version the same as the Kuzu version or keep the current approach of having a separate version string. The advantage of our current approach is that for minor releases, users may not need to reinstall the extensions.

mewim commented 6 months ago

For extensions released for internal testing, let's make a rule of having a dev-x prefix. I'll periodically purge them (other than the latest one) from the repo.
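
A possible sketch of the purge rule, assuming dev releases are named exactly dev-<integer> (e.g. dev-3) and that "latest" means the highest number. Both are my assumptions, and the function name is hypothetical.

```python
import re

# Assumed naming scheme: "dev-" followed by an integer, nothing else.
DEV_PATTERN = re.compile(r"^dev-(\d+)$")


def dev_releases_to_purge(release_names):
    """Return every dev-x release except the one with the highest x."""
    dev = []
    for name in release_names:
        match = DEV_PATTERN.match(name)
        if match:
            dev.append((int(match.group(1)), name))
    dev.sort()  # numeric order, so dev-10 sorts after dev-2
    return [name for _, name in dev[:-1]]
```

Official releases such as 0.2.6 do not match the pattern and are never selected for purging.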

mewim commented 6 months ago

Added extension version to local path in #3354

acquamarin commented 6 months ago

> For extensions released for internal testing, let's make a rule of having a dev-x prefix. I'll periodically purge them (other than the latest one) from the repo.

I checked DuckDB; they don't have a repo to host extensions for dev builds.

acquamarin commented 6 months ago

> Mechanism to actually validate that the right extension version is installed: More importantly, we currently do not have any mechanism to check whether an installed extension version is valid for the running Kuzu version. The -DKUZU_EXTENSION_VERSION flag seems to be used only when installing an extension and not when checking whether an extension already exists. I say this because it looks like there is a single directory ${xyz}/extension under which we store all the extension binaries. So the directory in which we store extensions does not have a version number in its name. Further, the extension binaries do not have version numbers on them (e.g., regardless of the extension version number, all httpfs extension binaries have the name libhttpfs.kuzu_extension). So we'll get unclear error messages like "symbol not found" when blindly dynamically linking against these. Chang tested this and can expand on this.

I think this would be pretty hard. I don't think there is a mechanism that allows us to check the library version before actually loading the lib. @mewim Do you have any ideas on how to check the version of the lib?

mewim commented 6 months ago

> > Mechanism to actually validate that the right extension version is installed: More importantly, we currently do not have any mechanism to check whether an installed extension version is valid for the running Kuzu version. The -DKUZU_EXTENSION_VERSION flag seems to be used only when installing an extension and not when checking whether an extension already exists. I say this because it looks like there is a single directory ${xyz}/extension under which we store all the extension binaries. So the directory in which we store extensions does not have a version number in its name. Further, the extension binaries do not have version numbers on them (e.g., regardless of the extension version number, all httpfs extension binaries have the name libhttpfs.kuzu_extension). So we'll get unclear error messages like "symbol not found" when blindly dynamically linking against these. Chang tested this and can expand on this.
>
> I think this would be pretty hard. I don't think there is a mechanism that allows us to check the library version before actually loading the lib. @mewim Do you have any ideas on how to check the version of the lib?

After changing the path to ~/.kuzu/extension/{Extension Version}/{System Architecture}, Kuzu should only look for extensions in the directory matching its own extension version, and will throw an "extension not found" error if it cannot find the corresponding version. This behavior is the same as DuckDB's.
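
As a sketch of the lookup described above (not the actual implementation in #3354): the function and exception names are hypothetical, and only the directory layout and the lib{name}.kuzu_extension file name come from this thread.

```python
from pathlib import Path


class ExtensionNotFoundError(Exception):
    """Raised when no binary exists for the requested version and architecture."""


def resolve_extension(root: Path, ext_version: str, arch: str, name: str) -> Path:
    """Look only under {root}/{Extension Version}/{System Architecture}."""
    candidate = root / ext_version / arch / f"lib{name}.kuzu_extension"
    if not candidate.is_file():
        # Fail with a clear message instead of loading a mismatched binary.
        raise ExtensionNotFoundError(
            f"Extension '{name}' is not installed for version {ext_version} ({arch})."
        )
    return candidate
```

Because a binary built for another extension version lives in a different directory, it is never handed to the dynamic loader, which avoids the unclear "symbol not found" errors without having to inspect the library itself.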