Await asynchronous calls in benchmark

AKushWarrior commented 4 years ago

https://github.com/jonataslaw/get_storage/blob/master/storage_benchmark/lib/runners/get_storage.dart#L55

Your writes/deletes are not synchronous. You have to await them, or your benchmark lacks validity. Dart's stopwatch class, right now, is simply measuring the queuing time for a Future + read/write/delete time from an in-memory hashmap.

(Alternatively, you could make writes/deletes synchronous by using File.writeAsStringSync(), but this isn't recommended.)

On another note, I find it vaguely concerning that you awaited every other library's operations (i.e. measured them properly), but exempted your own from proper measurement.

jonataslaw commented 4 years ago

https://github.com/jonataslaw/get_storage/blob/master/storage_benchmark/lib/runners/get_storage.dart#L55

Your writes/deletes are not synchronous. You have to await them, or your benchmark lacks validity. Dart's stopwatch class, right now, is simply measuring the queuing time for a Future + read/write/delete time from an in-memory hashmap.

(Alternatively, you could make writes/deletes synchronous by using File.writeAsStringSync(), but this isn't recommended.)

On another note, I find it vaguely concerning that you awaited every other library's operations (i.e. measured them properly), but exempted your own from proper measurement.

Hi, thanks for that. I just cloned the Hive benchmark, and added GetStorage, but I think I will remove these benchmarks because maybe that will give me a lot of unnecessary work. I understand your point of view on reading and writing, and looking at the data integrity angle, you are right (really, I agree with you), but the purpose of this library is not to store hundreds of data (even because it uses localstorage for web instead of indexedDB), it's just storing key / value persistently on all platforms, since sharedPreferences are not for that. I think I made it clear in the readme that this is not a database and should not be used as one, and I could take thousands of other approaches, like using apend, using the new File package to make a copy of the memory storage drive on the disc, in short, there would be hundreds of paths to track, but the purpose of this package is not and will never be used as a database. It's just having a package that adds less than 10kb to the Flutter web, and that stores data and keeps the synchronous reading in the IO. It is made to store cache of simple http, strings, ints, maps, and list requests, and that's it. If you already have path_provider in another package, it adds less than 60kb in the final Flutter application, and it was written and less than 200 lines of code, if I need to create a database in the future, I will create another package, the purpose of which is not that. Well, thanks for the advice, if you have ideas on how to improve the package I would love to hear, as long as it does not imply a change in scope. I already used your steel_crypt package, and I was going to implement a simple encryption on that lib with it, but I don't intend anything more than that.

davidmartos96 commented 4 years ago

@jonataslaw I think people's concerns with the benchmarks is that you are comparing database solutions with GetStorage. If you say it is not meant as a database, why a benchmark with those packages in the first place? I think it would be very helpful to include in the README the use cases you try to solve with the package, like http caching as you mention. The limitations of the package would be another thing that could be worth pointing out, like one cannot be certain that the data will be persisted if the app is closed. I think all these suggestions could help newcomers and would avoid confusion in the community.

AKushWarrior commented 4 years ago

I fully agree with @davidmartos96. If I had known that the package was a persisted LRU cache or something of that sort, I wouldn't have been as concerned about this package. As it is, I think "key-value store" is a bad name for what this is. That term implies a Redis-like structure, when this is more like a simple cache backed by some limited persistence.

The benchmark can stay btw; just make sure you use await in the appropriate places. If you want I can PR with my recommendations. However, consider that comparing with real databases is probably giving the wrong impression.

On a personal note, @jonataslaw you have some clout in the community because of Get. You should probably be careful about your wording in READMEs and documentation, because it can easily give newcomers the wrong idea.

jonataslaw commented 4 years ago

I fully agree with @davidmartos96. If I had known that the package was a persisted LRU cache or something of that sort, I wouldn't have been as concerned about this package. As it is, I think "key-value store" is a bad name for what this is. That term implies a Redis-like structure, when this is more like a simple cache backed by some limited persistence.

The benchmark can stay btw; just make sure you use await in the appropriate places. If you want I can PR with my recommendations. However, consider that comparing with real databases is probably giving the wrong impression.

On a personal note, @jonataslaw you have some clout in the community because of Get. You should probably be careful about your wording in READMEs and documentation, because it can easily give newcomers the wrong idea.

Well, "key/value" storage in theory means "Map storage '. I think the description is technically right at that point (correct me if I'm wrong). I didn't just compare it with databases, there's the sharedPreferences which is very slow, and doesn’t support most platforms, doesn’t support Maps and Lists, and it’s there SP has exactly the same scope as this package, without the benefit of caching, and many people try to compare it with noSQL databases and SQL, which doesn't make much sense to me, since I made it clear in the readme that if the intention is to use a database, there are much better solutions. Persistence is not "limited", in tests it has 100% coverage and you can see that it will always be recorded. Is it more prone to errors? IS. Is there any certainty that the data was actually recorded on the disc? Just using await, which is not recommended for this library, since it is a memory storage and it does not make sense to search the data on the disk every time. It is a different approach, unorthodox, and even risky if you intend to record millions of data under a risk of power failure, but that is not the scope of the project.

jonataslaw commented 4 years ago

I updated the Readme to make it clearer what the package does, when to use it, and when not to use it.

AKushWarrior commented 4 years ago

That's a lot better. Good job.

davidmartos96 commented 4 years ago

@jonataslaw

I updated the Readme to make it clearer what the package does, when to use it, and when not to use it.

Great!

My concern with the benchmarks is the comparison with Hive and Sqlite. You could leave the comparison with shared preferences as it matches more the use case of the package. On top of that, the write and delete benchmarks for Sqlite (sqflite and moor) do not really correspond with a real app use case. Usually, when performing multiple inserts or deletes at once (50, 1000, you name it), they are run within a transaction which improves the performance quite a lot.

AKushWarrior commented 4 years ago

@davidmartos96 @jonataslaw I think the most prudent thing is to just get rid of Hive and SQLite in the benchmarks. Then, await your write/delete operations, and the comparisons should be fair.

davidmartos96 commented 4 years ago

@jonataslaw I filed #3 as an attempt to solve this issue. In any case, even with the changes in the PR, I agree with @AKushWarrior in that Hive and SQLite benchmarks could be removed.

AKushWarrior commented 4 years ago

The benchmark still feels strange. How is it possible that moor (a wrapper of sqflite) is faster than sqflite?

davidmartos96 commented 4 years ago

@AKushWarrior Sqflite is run via platform channels while moor(_ffi) uses FFI which runs on the main isolate. Platform channels can give some overhead when running in a loop, which is the case for write operations in a database.

AKushWarrior commented 4 years ago

Ah, makes sense. It's really a shame that FFI is so complicated to use for now; it's a substantial time boost over platform channels.

AKushWarrior commented 4 years ago

Also, @jonataslaw this benchmark has the right idea. Maybe you only flush the data when the user tells you to, instead of after EVERY manipulation.

@davidmartos96 I think the issue of benchmark unfairness still remains. If you don't use the full write method every call, then data isn't really persisted. For every other solution, killing it halfway would leave half of the data stored and intact. For get-storage, all that data just vanishes.

AKushWarrior commented 4 years ago

Okay, that previous comment was wrong; using a transaction is comparable to this method.

Remove Hive. It's fundamentally different in methodology. Everything else can stay.

jonataslaw commented 4 years ago

I appreciate the comments, this was very useful for me, the next update will have an automatic backup that will be recovered if at any point the file is truncated. Even with Hive and transactions, there is no security today if an app is closed during a transaction. Flutter does not call dispose when the application is closed. I'm also working on an update on the platform's channels in the onDestroy and applicationWillTerminate method (for the desktop there will be no way to do this, and for the web there is no need, because even after the browser is closed, operations continue). Thanks for the PR @davidmartos96 and thanks for the suggestions @AKushWarrior

I am closing this, because I believe it is already resolved.

AKushWarrior commented 4 years ago

Thanks for taking the time to listen @jonataslaw . Hope that the discussion here was constructive for you.

jonataslaw / get_storage

Await asynchronous calls in benchmark #1