feat: Session migrations

dcdunkan commented 2 years ago

Currently in sessions, adding or removing a session data property to the bot (initial) will not modify the existing session data of the chats. It would be cool if sessions could update the stored data to a new version by just modifying the initial data structure.

EdJoPaTo commented 2 years ago

Not sure if that's something easy to do as it could end up being error prone.

Assume something like that:

initial: {
  event: "old name";
}

Now you update your bot and have something like this:

initial: {
  event: {name: "new name"; location: "bat cave"};
  people: 42;
}

When merging the initial data with the old data its not fully migrated to the new data. people does not exist yet so it can be added and event will probably override the initial data resulting in a string while the new initial data is an object.

console.log(ctx.session)
// {
//  event: "old name";
//  people: 42;
// }

This might be horrifying as the TypeScript typings will state that the actual data is different than it actually is and creates runtime errors. So there has to be some logic involved and it has to be user provided.

My approach to this problem looks like this (but specific to the given bot):

bot.use(session(…));

// Migrate old data
bot.use((ctx, next) => {

  // Add now required data
  if (typeof ctx.session.people !== 'number') {
    ctx.session.people = 42;
  }

  // Old structure on existing key  
  if (typeof ctx.session.events[0] === 'string') {
    ctx.session.events = ctx.session.events.map(o => …)
  }

  // Old key that isnt used anymore
  delete (ctx.session as any).whatever;

  return next();
});

// Rest of the bot…

I am doing this since before grammY and it looks like a horrible mess sometimes… but it works! :grimacing: (Noted: Dont look at my old code, it looks like a mess to my current me…)

KnorpelSenf commented 2 years ago

I am in support of this.

There are two approaches that come to mind.

The first one would be to let the user define schemata for their data, and to validate the session data against it, dynamically coercing the session as necessary. I think this a pretty bad idea, because it's horribly complicated to implement on our side, error-prone even when done right, and a mess to use.

The second one would be to make the user provide migration functions and version numbers. This is a much cleaner approach. It's not hard to do for us, and it should come with reasonable complexity for library consumers. It could work like this:

Current code base, no migrations

Simply define the data, and use it in a session.

function initial(): SessionData {
  return { pizzaCount: 0 }
}
bot.use(session({ initial }))

grammY will store the data exactly as defined in the sesison.

Upgraded code base after starting to use migrations

You can define the initial data for newly created session, and a migration function that transforms each session in the development lifetime of the bot to the its successor definition. Here is an example with a single migration.

function initial(): SessionData1 {
  return { favoriteDish: { name: 'pizza', count: 0 } };
}

function migration(old: SessionData): SessionData1 {
  // implement migration logic from previous to current session
  return { favoriteDish: { name: 'pizza', count: old.pizzaCount } }
}
const migrations = {
  1: migration
}

bot.use(session({ initial, migrations }))

grammY will now store the session version alongside each session data, so for session data T grammY will store something along the lines of { __v: number, data: T }. If no version is stored in the session, we assume that no migration has been performed, and start executing the chain of migrations from the beginning.

Adding another migration

This is simple, you just define a new version number and pass it along. grammY will take care of the rest.

function initial(): SessionData3 {
  return { favoriteDish: { name: { language: 'en', value: 'pizza' } } }
}

function migration1(old: SessionData): SessionData1 {
  // implement migration logic from previous to current session
  return { favoriteDish: { name: 'pizza', count: old.pizzaCount } }
}
function migration2(old: SessionData1): SessionData2 {
  // implement migration logic from previous to current session
  return { favoriteDish: { name: { language: 'en', value: old.favoriteDish.name }, count: old.favoriteDish.count } }
}
function migration2(old: SessionData2): SessionData3 {
  // implement migration logic from previous to current session
  delete old.favoriteDish.count
  return old
}
const migrations = {
  1: migration1,
  2: migration2,
  3: migration3,
}

bot.use(session({ initial, migrations }))

After loading session data, grammY will check the version stored next to it. If it is not the most recent version, it will execute the chain of migrations, and then proceed from there.

KnorpelSenf commented 2 years ago

When using migrations, the session plugin suddenly has the ability to store its own data in the session, alongside the user data, without risking any clashes. This is very useful. From then on, the session should automatically handle migrations to supergroups. It can store the data behind the new key, and store a pointer in the old session, referencing the new key. As a result, we can use a unified session for both the old and the new chat.

KnorpelSenf commented 2 years ago

Regarding the chat migrations, I have good and bad news.

The bad news are that updates from the old and the new chat are sent concurrently, so we would have race conditions in the session plugin due to concurrent data access. This can lead to data loss.

The good news is that I talked to the Bot API developer and he said that the Bot API server should handle this complexity and send updates consecutively in the future. I expect this to be fixed in the coming days and weeks.

Note that messages may still be received out of order, as there is no natural order implied on updates from unrelated chats, but this should generally not be an issue, since messages can be received in different order anyway.

KnorpelSenf commented 2 years ago

To counter the good news, the plan of adding this to the Bot API server was dropped. It is in fact not possible to implement this. For the exact some reason, it will not be possible to implement this in grammY. Unfortunately, if bot devs care about this, they will have to deal with it themselves. Therefore, the scope of this issue will be limited to the original description again.

KnorpelSenf commented 2 years ago

I was just thinking about the right abstraction for this and it looks like we could actually do this outside of the core library in a storage adapter that wraps another storage adapter.

bot.use(session({ initial, storage: migrationAdapter(freeStorage(bot.token), migrations) }))

Basically, migrationAdapter does the aforementioned transformations upon read, but otherwise delegates all IO to the supplied adapter.

Do you think that this is a good idea? It looks a bit harder to use than native support. The only reason to do this is to keep code out of the core lib.

It looks a bit like we are now starting to have some sort of session plugins. I am unsure if that's a good thing, as people already need to decide between lots of plugins and storage adapters, and starting to write plugins for storage adapters will contribute to more mess.

@Satont I'm especially interested in your opinion here.

Satont commented 2 years ago

In fact, I noticed a long time ago that we have too much stuff outside the main library. It really creates chaos. Here's an example: we have 3 or more ways to ask for user input. How does the user decide which one to use?

But if we were to take it out as you suggest, I would also take the sessions themselves out of the main library. So that the sessions and the plugin for migrations would be in one plugin.

In general, the adapter above the adapters looks okay so you don't have to think about the implementation inside the db adapters.

KnorpelSenf commented 2 years ago

So that the sessions and the plugin for migrations would be in one plugin.

I was trying to find a way to extract the migrations, but this convinced me. The migrations are a part of the sessions plugin, so it makes much more sense to keep things together.

In general, the adapter above the adapters looks okay so you don't have to think about the implementation inside the db adapters.

Great. I also noticed that the original suggestion does not work well. Adding migrations means that the storage adapters will have to store meta-data such as a version. However, if users pick primitive values as their session data, we have no way of storing this data. In other words, using migrations will affect which data the storage adapter will have to store, so the best solution would probably indeed be to wrap the storage adapter.

Given the opportunity to store custom data with each session, we could also add TTL (the same way how the memory sessions do it, but for arbitrary storage adapters). This means that we will end up with a function that takes a storage adapter, and enriches it with at least two other features (migrations and timeouts). Do you have any suggestions for a name?

KnorpelSenf commented 2 years ago

This will be shipped as an enhanceStorage function from the session plugin.

came commented 1 year ago

Is there a reason the migration functions are synchronous only? I was trying to have an async migration function and it fails. :/

Referring to this code here https://github.com/grammyjs/grammY/blob/449fc55081e6352aa3179a69cc1dae799f59cbd9/src/convenience/session.ts#L607

BR, Carsten

KnorpelSenf commented 1 year ago

What would be your use case to perform async operations inside migration functions? I'm certainly open to permit them in the future

came commented 1 year ago

I thought about using migration to update remote config data in the session. Probably not the best idea. Now I save a json file with the config settings.

KnorpelSenf commented 1 year ago

I'm not sure I can follow. Inside a migration, you want to read some config data from a JSON file and inject it into the session, correct?

grammyjs / grammY