oliver-oloughlin / kvdex

A high-level abstraction layer for Deno KV with zero third-party dependencies by default 🦕🗝️
https://jsr.io/@olli/kvdex
MIT License
192 stars 6 forks source link

duplicate entries with same primary index value #153

Closed waptik closed 9 months ago

waptik commented 9 months ago

I've been noticing this bug happening and thought it was my code but after looking at the keys are defined in both local development and deno deploy, i think that's the issue This is how the keys are defined locally(GUI using kv-insights) image image image

And this is how it's seen on deno deploy image image

As you can see, KvId is an array of numbers locally while it's a string on deno deploy. I'm not using history because i don't know it's usage.

Here's the function that deals with creating new users if they weren't identified by their primary key

import { NextFunction } from "grammy";
import { GrammyContext } from "$grammy/context.ts";
import { logging } from "$utils/logging.ts";
import { getProfileLink } from "$utils/grammy.ts";
import { sendMessageToGroup } from "$grammy/helpers/sendMessage.ts";
import { db } from "$db/mod.ts";

export default async function attachUser(
  ctx: GrammyContext,
  next: NextFunction,
) {
  if (!ctx.chat?.id || !ctx.from?.id) {
    return;
  }

  const dbUser = await db.users.findByPrimaryIndex(
    "telegram_user_id",
    ctx.from.id,
  );

  if (!dbUser) {
    const cr = await db.users.add({
      telegram_user_id: ctx.from.id,
      metadata: {
        summary_language: ctx.from.language_code,
      },
    });

    if (!cr.ok) {
      throw new Error(`Failed to create user id ${ctx.from.id}`);
    }

    if (!IS_DEV) {
      await sendMessageToGroup(
        `👋 New user #joined: ${getProfileLink(ctx.from)} [id${ctx.from.id}]`,
      );
    }

    const user = await db.users.findByPrimaryIndex(
      "telegram_user_id",
      ctx.from.id,
    ); // using this because `add` doesn't return `value`

    logging.info("attachUser >> user", user?.id);

    if (!user) {
      throw new Error("User not found");
    }

    ctx.user = {
      id: user.id,
      ...user.value,
    };
    ctx.session = {
      ...ctx.session,
      user: {
        id: user.id,
        isPro: user.value.is_pro,
        isUser: true,
        accepted: true,
      },
    };
    logging.info(
      "attachUser >> user info attached to ctx after adding to db",
      user.id,
    );
  } else {
    ctx.user = {
      id: dbUser.id,
      ...dbUser.value,
    };
    ctx.session = {
      ...ctx.session,
      user: {
        id: dbUser.id,
        isPro: dbUser.value.is_pro,
        isUser: true,
        accepted: true,
      },
    };
    logging.info("attachUser >> existing user info attached to ctx", dbUser.id);
  }

  return next();
}
oliver-oloughlin commented 9 months ago

The actual document id seems to be a ulid string both on deploy and locally, based on your screenshots.

When it comes to the primary index (and all indices for that matter), this is a little more complicated: Both locally and on deploy, the index value is serialized to a Uint8Array. I'm pretty sure your screenshots show that this is the case as well, as the Deploy KV view does not just display it as an array, but explicitly as a Uint8Array (check the HTML to see the entire key). So the last key entry should be a Uint8Array for both.

If you are having issues with the primary (or secondary) indices not matching between deploy and running locally, I may know why. Since the index values are serialized, it automatically chooses the best serialization strategy for the runtime that you are on. Since Deploy does not expose the internal Deno Core serialize methods, it defaults to the custom JSON serializer. Locally however, it uses Deno Core. This difference leads to different Uint8Arrays being created, which would not match. I agree that this is actually a problem. In the next patch, I will definitely make index-serialization default to the JSON serializer, and I'm considering removing the "auto" option for serialization altogether to make it very clear which serialization method is used.

thanks for bringing this issue to my attention.

waptik commented 9 months ago

Alright. Thanks for the clarification and yeah, it's really complicated from the looks of it. So i'm seriously looking forward to the next patch.

In the meantime, how do i solve this issue? I was about to delete the new _id__ by using db.users.delete when ctx.session.user.id and dbUser are available but don't have the same id but then doing it that way will retain the two entries of __index_primary__ and i can't delete by primary index as it will delete both. Existing data in the old is what's important to me(No issue for me in local development as i'm the user but i'm afraid of its likelihood to occur in production on deploy for whatever reason)

oliver-oloughlin commented 9 months ago

Just be explicit and use serialize: "json" for all collections. This should work across from the current version to oncoming versions, and ensure that serialization is done the same locally as on deploy.

waptik commented 9 months ago

Thanks, that did the trick. I also had to wipe out the database both locally and on deploy because i was getting Cannot read properties of undefined (reading 'map') in places used by db.collection.deleteMany and other unknown locations(no biggie).