ivoleitao / stash

Key-value store abstraction with plain and cache driven semantics and a pluggable backend architecture.
MIT License

Deleting a cache entry requires it to be parseable #43

Closed tjarvstrand closed 1 year ago

tjarvstrand commented 1 year ago

Describe the bug

Currently, when deleting a cache entry, stash's GenericCache will try to read the entry as part of the deletion, which in turn tries to decode its contents. If the decoding fails, the deletion fails.

An important aspect of a local cache is being able to treat it as ephemeral data, i.e. if your schema evolves, you just delete the invalid entries and repopulate them. Currently, this is not possible with stash when using custom decoders.

To Reproduce

  1. Create a cache using stash_hive.
  2. Add an entry to the cache that is not decodable from JSON.
  3. Try to delete the entry.

Expected behavior

The entry is deleted.

Version

stash 4.5.1
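A minimal sketch of the steps above, assuming the stash 4.x API as shown in the README (newHiveDefaultCacheStore, store.cache, fromEncodable); exact names, signatures, and sync/async behavior may differ between versions, and Task here is just an illustrative type:

import 'package:stash/stash_api.dart';
import 'package:stash_hive/stash_hive.dart';

class Task {
  final int id;
  final String title;

  Task({required this.id, required this.title});

  factory Task.fromJson(Map<String, dynamic> json) =>
      Task(id: json['id'] as int, title: json['title'] as String);

  Map<String, dynamic> toJson() => <String, dynamic>{'id': id, 'title': title};
}

Future<void> main() async {
  final store = await newHiveDefaultCacheStore();

  // Write an entry with a working decoder.
  final before = store.cache<Task>(
      name: 'tasks', fromEncodable: (json) => Task.fromJson(json));
  await before.put('task1', Task(id: 1, title: 'Report the bug'));

  // Re-open the same cache with a decoder that rejects the stored payload,
  // simulating a schema change between application versions.
  final after = store.cache<Task>(
      name: 'tasks',
      fromEncodable: (json) => throw const FormatException('schema changed'));

  // In stash 4.5.1 this fails: remove() reads (and therefore decodes) the
  // entry before deleting it, so the decoder error surfaces here.
  await after.remove('task1');
}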

ivoleitao commented 1 year ago

Thanks for the report. This makes sense to me. I need to make sure I only do this in the case of a deserialization error, not for other types of errors. I will take a look over the weekend and hopefully come up with a solution.

tjarvstrand commented 1 year ago

Just had a quick look, and the situation looks to be the same for put, i.e. there's no way to overwrite an entry that is not decodable either.

ivoleitao commented 1 year ago

Just had a quick look, and the situation looks to be the same for put, i.e. there's no way to overwrite an entry that is not decodable either.

Yes, true. I need to do this in other places as well.

ivoleitao commented 1 year ago

Hi,

I did a couple of small prototypes to try to solve this problem, and I've decided to address it in a completely different way, which unfortunately will take much more time: it will be part of ongoing work that will trigger a breaking change.

My big problem with a more straight-to-the-point fix is that I would be hiding errors that shouldn't be abstracted away. One very simple way of simulating your issue is the following. Imagine that I store a Task object in the cache, something like this:

class Task {
  final int id;
  final String title;
  final bool completed;

  Task({required this.id, required this.title, this.completed = false});

  /// Creates a [Task] from json map
  factory Task.fromJson(Map<String, dynamic> json) => Task(
      id: json['id'] as int,
      title: json['title'] as String,
      completed: json['completed'] as bool);

  /// Creates a json map from a [Task]
  Map<String, dynamic> toJson() =>
      <String, dynamic>{'id': id, 'title': title, 'completed': completed};

  @override
  String toString() {
    return 'Task $id, "$title" is ${completed ? "completed" : "not completed"}';
  }
}

Now imagine that I change the fromJson to:

  factory Task.fromJson(Map<String, dynamic> json) => Task(
      id: json['id'] as int,
      title: json['salutation'] as String,
      completed: json['completed'] as bool);

There's no 'salutation', just a 'title', so this simulates, more or less, the class of problems you were highlighting. My problem with this is that I would need to hide a null-reference error in order to absorb the failure and let you do a put.
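Concretely, the failure that the put path would have to swallow is just Dart's null cast error; a tiny standalone sketch:

void main() {
  // The payload that was stored with the old schema.
  final json = <String, dynamic>{'id': 1, 'title': 'Buy milk', 'completed': false};

  // There is no 'salutation' key, so json['salutation'] is null and the cast
  // throws: type 'Null' is not a subtype of type 'String' in type cast.
  final title = json['salutation'] as String;
  print(title);
}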

I think the correct solution for all of this is to support migrations, something that I already had in mind but which requires the stash entries to support one more field, the version, which needs to be serialised. I would then provide a way to set the version of the cache / vault, plus the ability to configure migration functions that are triggered when the version of the retrieved stash entry is lower than the current version.
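A hypothetical sketch of what such a migration chain could look like; none of these names exist in stash, they only illustrate the idea:

typedef Migration = Map<String, dynamic> Function(Map<String, dynamic> json);

class MigratingDecoder<T> {
  final int currentVersion;

  /// Migrations keyed by the version they upgrade *from*.
  final Map<int, Migration> migrations;
  final T Function(Map<String, dynamic>) fromEncodable;

  MigratingDecoder(this.currentVersion, this.migrations, this.fromEncodable);

  T decode(int entryVersion, Map<String, dynamic> json) {
    // Upgrade the payload one version at a time until it is current.
    for (var version = entryVersion; version < currentVersion; version++) {
      final migrate = migrations[version];
      if (migrate == null) {
        throw StateError('No migration from version $version');
      }
      json = migrate(json);
    }
    return fromEncodable(json);
  }
}

For the Task example above, a version-1-to-2 migration could simply copy the old field into the new one, e.g. {1: (json) => {...json, 'salutation': json['title']}}.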

This needs a breaking change, but it's probably a much more elegant way to address your problem. I will make this part of version 5.0, but I can't give you a precise release date.

I will keep this open, as I think this should be "fixed" even if with a different strategy.

tjarvstrand commented 1 year ago

Thanks for looking into this!

To be honest though, I think that solution overly complicates the problem. Caches are all about fast and simple, and migrations are anything but simple. Personally, I use caches for data that I don't really care much about. If I need robust data management, I use something better suited for that.

If you ask me, the problem is that the entry metadata can't be read without deserializing+decoding the entire entry. This, in addition to being a problem in this case, is also a pretty big performance limitation for heavy use.

ivoleitao commented 1 year ago

If you ask me, the problem is that the entry metadata can't be read without deserializing+decoding the entire entry. This, in addition to being a problem in this case, is also a pretty big performance limitation for heavy use.

This is true, I agree. It's also something I need to look into, as I hadn't approached the issue from that angle. For some reason I'm reading the entire entry, but I don't remember why I ended up doing that. I believe I tried to change this in the past but ended up not doing so, though I'm not sure if it was for lack of time or something else. I'll do another round on it to see what I can do.

There's something you need to be aware of, though: whether I can extract only the envelope of the cache entry depends on the backing store. From what I remember, only two backing stores allow that level of optimization, stash_file and stash_sqlite, as I have full control in those cases. In the case of Hive I need to extract the full entry from the Hive store. However, it is also true that for some operations (put is an example) I don't need to deserialise the whole thing; more importantly, I don't need to call the custom fromEncodable.
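To make that tradeoff concrete, here is a hypothetical sketch (not the actual stash internals) of splitting an entry into a metadata envelope plus a lazily decoded value, so that delete and put never run the user-supplied fromEncodable:

class Envelope {
  final String key;
  final DateTime creationTime;
  final DateTime? expiryTime;

  Envelope(this.key, this.creationTime, this.expiryTime);
}

class LazyEntry<T> {
  final Envelope envelope;
  final Map<String, dynamic> _raw;
  final T Function(Map<String, dynamic>) _fromEncodable;

  LazyEntry(this.envelope, this._raw, this._fromEncodable);

  /// The only place decoding happens; delete/put paths never call this,
  /// so a payload the decoder cannot parse no longer blocks them.
  T get value => _fromEncodable(_raw);
}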

I'll try to look into this from that perspective and I'll keep you posted.

ivoleitao commented 1 year ago

Ok, I've managed to fix it, at the expense of a minor feature that I think no one was using (I hope). Stash supports a number of events, and some of them were providing access to the "old" entry. However, now that we've had this discussion, I think that reading the full entry just for that is probably an unacceptable performance tradeoff, so I decided to drop support for it and provide access only to the (Cache/Vault)Info of the old entry. That was not the full solution, though, since, as mentioned above, some of the storages were still reading and decoding the full thing. In the cases where that was happening (not stash_file or stash_sqlite, for example) I've changed the approach.

So, with all this said, the new version is published. Please give it a try, and if you find any error, either re-open this ticket or create a new one.

Thank you

tjarvstrand commented 1 year ago

Awesome! Thanks for looking into this so quickly!

ivoleitao commented 1 year ago

No prob 👍