ivoleitao / stash

Key-value store abstraction with plain and cache driven semantics and a pluggable backend architecture.
MIT License
82 stars 16 forks source link

Reading nested JSON throws `TypeError` #45

Closed tjarvstrand closed 1 year ago

tjarvstrand commented 1 year ago

Describe the bug So, this one took me quite a bit of digging. It turns out that some backends (stash_hive and stash_isar at least, possibly others) are unable to decode nested JSON structures that where written by a different VM instance (as in, the values have left memory).

It seems that this is due to objects' runtime type of Map<String, dynamic> is lost and instead you get Map<dynamic, dynamic>. The top level is fine because stash casts this, but nested objects are not touched.

To Reproduce Here's a test scenario that can be used to recreate the behavior:

import 'package:json_annotation/json_annotation.dart';
import 'package:stash/stash_api.dart';
import 'package:stash_hive/stash_hive.dart';
import 'package:test/test.dart';

part 'events_cache_test.g.dart';

@JsonSerializable()
class User {
  User(this.id);

  factory User.fromJson(Map<String, dynamic> json) => _$UserFromJson(json);

  final String id;

  Map<String, dynamic> toJson() => _$UserToJson(this);
}

@JsonSerializable()
class Event {
  Event(this.id, this.owner);

  factory Event.fromJson(Map<String, dynamic> json) => _$EventFromJson(json);

  final String id;
  final User owner;

  Map<String, dynamic> toJson() => _$EventToJson(this);
}

Future<Cache<Event>> _cache() async {
  final store = await newHiveDefaultCacheStore();
  return store.cache<Event>(name: 'test', fromEncodable: Event.fromJson);
}

void main() {
  final event = Event('123', User('123'));

  test('Step 1, run this first, by itself', () async {
    final cache = await _cache();
    await cache.put('123', event);
    await cache.get('123');
  });

  test('Step 2, run this second, also separately', () async {
    final cache = await _cache();
    await cache.get('123');
  });
}

Expected behavior After writing the entry in step 1, it should be deserializable in step 2, but instead it fails with: type '_Map<dynamic, dynamic>' is not a subtype of type 'Map<String, dynamic>' in type cast

Version 4.5.2

Additional context I have been able to work around the issue by changing PersistenceStore.decodeValue to this:

dynamic decodeValue(
      dynamic value, dynamic Function(Map<String, dynamic>)? fromEncodable,
      {dynamic Function(dynamic)? mapFn, dynamic Function()? defaultFn}) {
    if (value == null) {
      return null;
    }

    if (fromEncodable != null) {
      return fromEncodable(_toEncodable(mapFn != null ? mapFn(value) : value));
    } else if (defaultFn != null) {
      return defaultFn();
    }

    return value;
  }

  dynamic _toEncodable(dynamic source) {
    if(source is Map) {
      return source.cast<String, dynamic>().map((key, value) => MapEntry(key, _toEncodable(value)));
    }
    if(source is Iterable) {
      return source.map(_toEncodable).toList();
    }
    return source;
  }

I'd open a pull request but I've haven't been able to write a testcase which properly sets up the preconditions for the current code to fail. I also have some doubts about the efficiency of the code (Especially since Dart is not tail-recursive :D )

I also just realized that this would probably affect storing of plain maps (without a fromEncodable) as well, meaning that return source; above will return a Map<dynamic, dynamic> when reading a cached value after a restart. So _toEncodable may have to be run on that as well.

tjarvstrand commented 1 year ago

Hm, could this be msgpack-related?

ivoleitao commented 1 year ago

Thanks for the very detailed report, I believe this is a msgpack error. The good news is that I was able to reproduce it so I hope this is an easy fix. Let me look at it

ivoleitao commented 1 year ago

Hi, just to give a bit of feedback on this, the fix for all storage backends that leverage msgpack is already done, as is for hive which is not using it but with the change I did it will start to use it.

This was quite a problem although for the backends supporting msgpack it was quite easy to fix. The same cannot be said of the others as I got mixed scenarios. Some like Hive have this problem internally per my investigations. Other like sembast work perfectly.

What is happening in a nutshell is type erasure. Map<String, dynamic> becomes Map<dynamic, dynamic> for inner structures and that's the source of the problem.

I've tried for quite some time to avoid using msgpack everywhere but I will now move to that as it gives much more control and avoids backend problems.

As I have to change this is more backends I believe it will only be ready on the next weekend

tjarvstrand commented 1 year ago

This was quite a problem although for the backends supporting msgpack it was quite easy to fix. The same cannot be said of the others as I got mixed scenarios. Some like Hive have this problem internally per my investigations. Other like sembast work perfectly.

Yeah, that's the problem with supporting many targets, you end up with the least common denominator of all of them 🙂 I think that maybe the quickest and most honest and/or efficient solution might be to just have the fromEncodable-function take a dynamic-argument. That way it's at least a bit more obvious to users that they might not receive exactly what they put in.

Anyway, thanks for looking into it!

ivoleitao commented 1 year ago

Hi, it's finally fixed on the latest version. I've added unit tests covering this scenario to make sure it does not happen aggain.

Once again, thanks for signalling this. It was quite and important bug and it is now fixed.

tjarvstrand commented 1 year ago

Brilliant!

katjai1 commented 1 year ago

Hello. Probably this change causes issues with web version:

Unsupported operation: Uint64 accessor not supported by dart2js.

 #0   packages/stash/src/api/codec/bytes_writer.dart 145:5                          writeUint64
 #1   packages/stash/src/msgpack/writer.dart 142:7                                  [_writePositiveInt]
 #2   packages/stash/src/msgpack/writer.dart 149:7                                  writeInt
 #3   packages/stash/src/msgpack/writer.dart 306:7                                  [_writeObject]
 #4   packages/stash/src/msgpack/writer.dart 349:10                                 write
 #5   packages/stash/src/msgpack/writer.dart 341:13                                 [_writeEncodable]
 #6   packages/stash/src/msgpack/writer.dart 351:9                                  write
 #7   packages/stash/src/api/store.dart 144:11                                      encodeValue
 #8   packages/stash_hive/src/hive/hive_store.dart 362:16                           [_writeEntry]
 #9   packages/stash_hive/src/hive/hive_store.dart 225:27                           putEntry
 #10   packages/stash/src/api/cache/generic_cache.dart 184:36                        <fn>
ivoleitao commented 1 year ago

That's a bit strange to be honest, do you have values of this magnitude ? I only use writeUint64 on messagepack if the value is greater than 0xFFFFFFFF which is quite large.

Also I was aware of this limitation on Web but not expecting this to be needed due to the size of the values.

Also to confirm:

I separate the unit tests in here explicitly with @TestOn('!js') because of this

ivoleitao commented 1 year ago

Hum I think they use a dirty trick for this, in a nutshell they use double, not the most optimal thing but I can do something similar on web. I can try once you get back to me on the previous questions

https://github.com/hivedb/hive/blob/e87134f80ff4ddf61391a266139940a3c4a4bde9/hive/lib/src/binary/binary_writer_impl.dart#L107

katjai1 commented 1 year ago

When I added print() for value:

  Uint8List encodeValue(dynamic value) {
    final writer = codec.encoder();
    print(value);

with our override:

  @override
  String toString() {
    return 'CacheItem: ${value.toString()}';
  }

I get this:

CacheItem: {email: , id: 0, username: , loginType: regular, isTest: false}

So, nothing unusual, simple json.

ivoleitao commented 1 year ago

Humm, the only place where I write a Uint64 is in here: https://github.com/ivoleitao/stash/blob/a255ef241a2a8720abfac62decedb086d66a4be1/packages/stash/lib/src/msgpack/writer.dart#L128

  void _writePositiveInt(int n) {
    if (n <= 127) {
      writeUint8(n);
    } else if (n <= 0xFF) {
      writeUint8(types.uint8);
      writeUint8(n);
    } else if (n <= 0xFFFF) {
      writeUint8(types.uin16);
      writeUint16(n);
    } else if (n <= 0xFFFFFFFF) {
      writeUint8(types.uint32);
      writeUint32(n);
    } else {
      writeUint8(types.uint64);
      writeUint64(n);
    }
  }

I believe this means you are falling in the else so bigger than 0xFFFFFFFF

And if you ended up here your id is higher than 0 because it forked in: https://github.com/ivoleitao/stash/blob/a255ef241a2a8720abfac62decedb086d66a4be1/packages/stash/lib/src/msgpack/writer.dart#L149 per stacktrace

Are you sure the field id is not bigger than 0xFFFFFFFF in some scenario (overflow for example) ?

ivoleitao commented 1 year ago

Just to make sure this is working, I've tried on web with:

class Task {
  final int id;
  final String title;
  final bool completed;

  Task({required this.id, required this.title, this.completed = false});

  /// Creates a [Task] from json map
  factory Task.fromJson(Map<String, dynamic> json) => Task(
      id: json['id'] as int,
      title: json['title'] as String,
      completed: json['completed'] as bool);

  /// Creates a json map from a [Task]
  Map<String, dynamic> toJson() =>
      <String, dynamic>{'id': id, 'title': title, 'completed': completed};

  @override
  String toString() {
    return 'Task $id, "$title" is ${completed ? "completed" : "not completed"}';
  }
}

void main() async {
  test('bug', () async {
    // Creates a store
    final store = await newHiveDefaultCacheStore();

    // Creates a cache with a capacity of 10 from the previously created store
    final cache = await store.cache<Task>(
        name: 'cache',
        fromEncodable: (json) => Task.fromJson(json),
        maxEntries: 10,
        eventListenerMode: EventListenerMode.synchronous)
      ..on<CacheEntryCreatedEvent<Task>>().listen(
          (event) => print('Key "${event.entry.key}" added to the cache'));

    // Adds a task with key 'task1' to the cache
    await cache.put('task1',
        Task(id: 0, title: 'Run cache store example', completed: true));

    // Retrieves the value from the cache
    print(await cache.get('task1'));
  });
}

And I had no errors with id 0, 1 or -1

katjai1 commented 1 year ago

I've found the source of an issue, we used DateTime.now().millisecondsSinceEpoch for cache eviction policy and this was the part of a hidden package structure. Thanks for explanation!

ivoleitao commented 1 year ago

Ok I will still look at this as I think I can improve this corner case for web.

Closing this issue