techfort / LokiJS

javascript embeddable / in-memory database
http://techfort.github.io/LokiJS
MIT License
6.73k stars 482 forks

Corrupted idIndex and $loki value #881

Closed heartmon closed 2 years ago

heartmon commented 3 years ago

Hello

Sometimes the idIndex and $loki values get corrupted, which makes the update fail with "Trying to update a document not in collection".

Has anyone had this issue before? How could I solve this?

stale[bot] commented 3 years ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

1nstinct commented 2 years ago

@heartmon have you solved the issue? There are a lot of unresolved threads about the same issue, but nobody seems to be fixing it.

Miky88 commented 1 year ago

I'm having the same issue here: I got 88 duplicates for no apparent reason, and it happened suddenly. Could this issue be looked into?

Losses commented 1 year ago

@1nstinct @Miky88 This happens because the binary search provided by Loki assumes that $loki values are never duplicated. Replacing the binary search code with

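      // Lower-bound search: narrow down to the first position whose id is >= the target,
      // so the lookup still lands on a valid slot when ids are duplicated.
      while (min <= max) {
        const mid = (min + max) >> 1;

        if (data[mid] >= id) {
          max = mid - 1;
        } else {
          min = mid + 1;
        }
      }

      // min is now the first candidate position; confirm it really holds the id.
      if (min < data.length && data[min] == id) {
        if (returnPosition) {
          return [this.data[min], min];
        }
        return this.data[min];
      }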

will solve the problem.
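
To see why duplicates break the lookup, here is a minimal, self-contained sketch. `findAssumingUnique` is a hypothetical stand-in for a search loop that stops once the values at both ends are equal; whether Loki's actual loop looks exactly like this is an assumption. `findLowerBound` mirrors the replacement above. Both function names are made up for illustration and work on a plain array of ids rather than a real collection.

```typescript
// Hypothetical stand-in for a lookup that assumes ids are unique:
// the loop ends as soon as the values at both ends are equal.
function findAssumingUnique(idIndex: number[], id: number): number {
  let min = 0;
  let max = idIndex.length - 1;
  while (idIndex[min] < idIndex[max]) {
    const mid = (min + max) >> 1;
    if (idIndex[mid] < id) {
      min = mid + 1;
    } else {
      max = mid;
    }
  }
  // With duplicates the loop can end while min !== max, so the id is reported missing.
  return max === min && idIndex[min] === id ? min : -1;
}

// Lower-bound search: returns the first position holding `id`, or -1 if absent.
function findLowerBound(idIndex: number[], id: number): number {
  let min = 0;
  let max = idIndex.length - 1;
  while (min <= max) {
    const mid = (min + max) >> 1;
    if (idIndex[mid] >= id) {
      max = mid - 1;
    } else {
      min = mid + 1;
    }
  }
  return min < idIndex.length && idIndex[min] === id ? min : -1;
}

const corrupted = [1, 2, 3, 3]; // id 3 appears twice
console.log(findAssumingUnique(corrupted, 3)); // -1: the loop stops with min !== max
console.log(findLowerBound(corrupted, 3)); // 2: first slot holding id 3
```

The lower-bound variant only makes the corrupted entries reachable again; it does not decide which of the duplicated documents is the right one.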

Miky88 commented 1 year ago

It'd be more useful to fix or prevent the duplicate creation than to force the search to return one of the results, which may not be the right one...

Losses commented 1 year ago

> It'd be more useful to fix or prevent the duplicate creation than to force the search to return one of the results, which may not be the right one...

Yes, you are right, but without this hack we can't even repair a database that is already corrupted, since the remove method also relies on this search algorithm.

Losses commented 1 year ago

OK, let's at least build a function to repair the corrupted database:

const fixCollection = (collection: Loki.Collection) => {
  const deduplicateSet = new Set();
  // Keep only the first document seen for each $loki, then restore id order.
  const data = collection.data.filter((x) => {
    const duplicated = deduplicateSet.has(x.$loki);
    deduplicateSet.add(x.$loki);

    if (duplicated) {
      console.warn('Detected duplicated key, will remove it');
    }
    return !duplicated;
  })
  .sort((a, b) => a.$loki - b.$loki);

  // Rebuild the id index so that index[i] matches data[i].$loki.
  const index = new Array(data.length);
  for (let i = 0; i < data.length; i += 1) {
    index[i] = data[i].$loki;
  }

  // Write the cleaned data back; otherwise the new index would not line up with it.
  collection.data = data;
  collection.idIndex = index;
}

Miky88 commented 1 year ago

Yo, that's a good compromise, but the duplicated documents can still hold different values. You know what would be best? Checking which one was updated most recently, or created first, and assuming that's the valid one, IMO.
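
A rough sketch of the "keep the most recently updated copy" variant of that idea follows. The `Doc` shape, `lastTouched`, and `keepLatest` are hypothetical names for illustration, and the sketch assumes each document carries `meta.created` and, once updated, `meta.updated` as numeric timestamps.

```typescript
// Minimal document shape for this sketch; `meta.updated` is assumed to be
// absent on documents that were never updated.
interface Doc {
  $loki: number;
  meta: { created: number; updated?: number };
}

// Timestamp used to rank duplicates: last update, falling back to creation.
const lastTouched = (doc: Doc): number => doc.meta.updated ?? doc.meta.created;

// Keep one document per $loki, preferring the most recently touched copy,
// and return the survivors in ascending $loki order.
function keepLatest<T extends Doc>(docs: T[]): T[] {
  const survivors = new Map<number, T>();
  for (const doc of docs) {
    const current = survivors.get(doc.$loki);
    if (current === undefined || lastTouched(doc) > lastTouched(current)) {
      survivors.set(doc.$loki, doc);
    }
  }
  return Array.from(survivors.values()).sort((a, b) => a.$loki - b.$loki);
}
```

To keep the first-created copy instead, compare on `meta.created` and flip the comparison to `<`. Either rule is a heuristic: nothing in the data proves which copy is the valid one.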

Losses commented 1 year ago

That's a good idea, let me have a look!

Losses commented 1 year ago

Updated. I still have no clue what caused the duplicated data, but one possible reason is that the max ID is not calculated correctly. Here's an updated implementation that also fixes the bad maxId:

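const fixCollection = (collection: Loki.Collection) => {
  const deduplicateSet = new Set();
  const data = collection.data
    // Oldest first, so the filter below keeps the earliest-created copy.
    // Note that sort() reorders collection.data in place.
    .sort((a, b) => a.meta.created - b.meta.created)
    .filter((x) => {
      const duplicated = deduplicateSet.has(x.$loki);
      deduplicateSet.add(x.$loki);

      if (duplicated) {
        console.warn('Detected duplicated key, will remove it');
      }
      return !duplicated;
    })
    // Restore ascending $loki order, which the binary search relies on.
    .sort((a, b) => a.$loki - b.$loki);

  // Rebuild the id index so that index[i] matches data[i].$loki.
  const index = new Array(data.length);
  for (let i = 0; i < data.length; i += 1) {
    index[i] = data[i].$loki;
  }

  collection.data = data;
  collection.idIndex = index;
  // Recompute maxId from the surviving documents.
  collection.maxId = collection.data?.length
    ? Math.max(...collection.data.map((x) => x.$loki))
    : 0;
  // Flag the collection as changed so the repaired state gets saved.
  collection.dirty = true;
  // Let Loki check and repair its binary indices as well.
  collection.checkAllIndexes({
    randomSampling: true,
    repair: true,
  });
}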

Hope this helps. I'll continue investigating the issue.
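
Before running a repair like the one above, it may help to check whether a collection is affected at all. The sketch below is a hypothetical read-only check, not part of Loki's API: `CollectionLike`, `HealthReport`, and `inspectCollection` are invented names. It assumes `idIndex` is a plain array that mirrors `data` by position, and that new ids are allocated as `maxId + 1`, which is why a `maxId` lower than an existing `$loki` would produce duplicates on the next insert.

```typescript
// Only the fields this check reads; a real collection has many more.
interface CollectionLike {
  data: { $loki: number }[];
  idIndex: number[];
  maxId: number;
}

interface HealthReport {
  duplicatedIds: number[]; // $loki values that occur more than once
  indexMismatch: boolean; // idIndex does not mirror data position by position
  staleMaxId: boolean; // maxId is lower than the highest $loki in use
}

// Read-only inspection: reports problems without modifying the collection.
function inspectCollection(collection: CollectionLike): HealthReport {
  const seen = new Set<number>();
  const duplicated = new Set<number>();
  let highest = 0;

  for (const doc of collection.data) {
    if (seen.has(doc.$loki)) {
      duplicated.add(doc.$loki);
    }
    seen.add(doc.$loki);
    highest = Math.max(highest, doc.$loki);
  }

  const indexMismatch =
    collection.idIndex.length !== collection.data.length ||
    collection.data.some((doc, i) => collection.idIndex[i] !== doc.$loki);

  return {
    duplicatedIds: Array.from(duplicated).sort((a, b) => a - b),
    indexMismatch,
    staleMaxId: collection.maxId < highest,
  };
}
```

Logging this report at startup, before and after the fix, would show whether the repair took effect and whether the corruption comes back.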

Miky88 commented 1 year ago

That would be a great pull request, but something tells me it will be ignored like the other ones...

Losses commented 1 year ago

> That would be a great pull request, but something tells me it will be ignored like the other ones...

I spent my entire nine-day vacation doing a systematic refactoring of the Loki source code, but the unit tests haven't been sorted out yet, so maybe I can make these fixes in my own fork later.

Miky88 commented 1 year ago

I can't thank you enough for your help. This method worked really well and saved me so much time.