khonsulabs / bonsaidb

A developer-friendly document database that grows with you, written in Rust
https://bonsaidb.io/
Apache License 2.0
1.01k stars 37 forks source link

Add support for renaming a collection or view #180

Open ecton opened 2 years ago

ecton commented 2 years ago

We need a way to allow renaming a collection from one name to another. This is one reason that the collection name and view names aren't automatically calculated from the type name.

ModProg commented 2 years ago

I had a bit more thinking about the whole thing.

On the naming, fields are also breaking changes to refactor IIRC so I wouldn't see it too problematic to use the struct name by default. So there should be a way to rename them, but I feel like it is not a definite problem to have default names. As long as well documented.

ModProg commented 2 years ago

Secondly on the whole migration thing, other than liquibases approach with external scripts, I would propose something in code.

Something like

#[derive(Collection)]
// Original definition (implicitly v=0)
#[collection(name="hello", authority="test")]
// First update
#[collection(v=1, name="test", authority="hello")]
// Second update
#[collection(v=2, name="nice")]
struct Hallo {
    #[collection(v=2, name="hello")]
    // Automaticly convert using `From`/`Into`
    #[collection(v=3, type="bool")]
    test: u8,
    // Convertion that could fail, using try_from, should hold original data for failure
    #[collection(v=4, type="String", try_from)]
    // Could technically also be Result<SomeNicerEnum, String>
    test2: Result<SomeNicerEnum, MigrationError<String>>,
    // With a custom function to try_from with, using the default on failure
    #[collection(v=4, type="String", try_from_or_default=FromStr::from_str)]
    test3: SomeNicerEnum2,
}

impl TryFrom<String> for SomeNicerEnum {
    // ...
}

impl Default for SomeNicerEnum2 {
    // ...
}
impl FromStr for SomeNicerEnum2 {
    // ...
}

This would obviously need some representation in the form of a trait.

Versions would need to be synced over an entire project, meaning that two structs can share a version change and e.g. swap names with each other.

There a probably thousands of edge cases to consider and this is just a UI mockup in the end.

(I would also need to improve attribute-derive to support such usage, and you could not split the attribute over multiple calls for collection anymore, as those are now reserved for migration history)

ModProg commented 2 years ago

One immediate flaw I see in my definition, is... what is the current status.

For the struct it appears to be v=2 but for the fields it is an unspecified v=5 that is converted to. This needs some more thought.

ecton commented 2 years ago

On the naming, fields are also breaking changes to refactor IIRC so I wouldn't see it too problematic to use the struct name by default.

This issue is specifically to address the ability to rename a collection without losing data. Currently if you return a new CollectionName from collection_name(), any existing data will still be on disk but will not be able to be found.

Field names changing aren't always a breaking change. For example, serde offers #[serde(rename)] and #[serde(alias)], both which allow simple edits without breaking changes. Logic to support migrating between major versions of stored data is being tracked in #26.

So there should be a way to rename them, but I feel like it is not a definite problem to have default names. As long as well documented.

The default naming problem is only a problem for Collections, as it can lead to unexpected data loss by simply renaming the Rust type and deploying your new code. If your unit tests only seed test databases and don't use an existing database, it will not catch this data-loss bug that renaming the Rust structure introduced.

By keeping collection_name() manually specified, the user will be able to refactor the Rust name without being force to update the internal schema name. This allows the user to decide to not actually rename the collection under the hood, or once this request is implemented, they could choose to rename it either manually or (after #26) during a migration.

I'm a little cautious about making one macro do too many things. I think we will want a field-level annotation macro someday as you described, but to me it's more about powering a query language and/or supporting multiple programming languages.

ModProg commented 2 years ago

Makes sense, didn't think about the serde features, but an actual versioned system at some point would make sense.

And people are probably already used to needing to add such annotations on field renames, but not on structs themselves. And even tho, I would expect things to break when I rename a struct without a specified name, others might miss it, and run into quite severe problems afterwards.

When you have a way how you want to do this, I would take a look how macros could be applied, but they do not need to be the initial way forward!