simolus3 / drift

Drift is an easy to use, reactive, typesafe persistence library for Dart & Flutter.
https://drift.simonbinder.eu/
MIT License
2.65k stars 370 forks source link

Run type converters inside drift isolate #1872

Closed clragon closed 2 years ago

clragon commented 2 years ago

Is your feature request related to a problem? Please describe. Our app is making alot of database calls and because quite a bit of the data is stored as json, alot of json decoding is happening. So many json decode calls, in fact, that it can happen to impact the main thread, which the UI runs in.

We are trying to be conservative with how many database calls we make, but even then at certain points, the amount of json decoding happening at once can make our UI quite a bit less responsive.

Describe the solution you'd like I would be interested to know whether it would be possible to have the json decoding run inside the drift isolate.

We have checked that the json decoding is in fact happening in the main thread. I know that it is a common practice to move json decoding into isolates when doing, for example, http calls, when the amount of json is siginifcant. There is also that, copying objects accross isolates is not without cost, but I am unsure what the ratios are between copying the raw data vs the decoded objects.

I have taken a look at the drift docs but not seen a feature that would allow us to offload the type conversion into the isolate. We could probably have our own isolate and send the data there to be decoded, but that would involve alot of boilerplate and we would have to abandon the type converters and we would also have another extra 2 performance hits copying to the decoding isolate and back.

simolus3 commented 2 years ago

So many json decode calls, in fact, that it can happen to impact the main thread, which the UI runs in.

Did you run a profiler on your app to see if that's really a bottleneck? I'm not saying it can't happen, but it's surprising to hear about that amount of json being parsed. Unless you have few rows with huge json columns, perhaps there's also a problem with loading too many rows at once if they're not shown directly?

At the moment, drift's remote / isolate feature only affects the rather low-level database layer, mapping rows happens on the "client" isolate like you said. With the recent engine / isolate group feature in Dart, I think passing decoded Dart objects between isolates is reasonably fast, so there can be a performance increase here. But it would require a bunch of additional generated code so that the background isolate knows how to decode the values. I can take a look at supporting that in the generator, but then I'd love to see some benchmarks backing the necessity of this offloading.

clragon commented 2 years ago

Sure, I can explain the scenario a bit more in detail.

In our app, we do on-device neural network training and we require alot of data. Our columns have data used for querying and then a json encoded item from the backend. When training the network, we make very broad queries.

We have taken a look and it seems to be between (0.0002ms - 0.0329ms) per decode, depending on the size of the json.

Here's a screenshot of the profiler when we are booting the app with a rich database of around 46'000 entries: image

As seen in that screenshot, the UI is stalled for around 3 seconds. The majority of it is json decoding calls. Accessing all rows in the database isnt always necessary, but quite a few times, so this UI lag can happen more than once.

The things we query for are stored in the columns and not the json for the most part, so these json decode calls come only from data we actually want to pull from the database.

kuhnroyal commented 2 years ago

If you have this amount of json data, maybe a Relational database is not the place to store it :)

I have used https://pub.dev/packages/isolate_json before, but not in a type converter.

clragon commented 2 years ago

maybe a Relational database is not the place to store it :)

I am not sure what you mean by this. We have specifically switched to a real database so that we can query our huge amount of data. We previously stored all the json in a key value storage, like hive, and then done all the filtering in dart.

We could store the json outside of the database and then run the json decoding in an isolate, but it would come at the cost of having an extra storage that needs to be "queried" and read and then copied to the special json decoding isolate and back. I feel like storing the json outside of the database away from its attributes would just introduce alot more problems than it solves.

What would your suggestion be for storing those json items?

kuhnroyal commented 2 years ago

It seems you have thought about it, just wanted to encourage that if you had not.

It totally depends on your use-case. Do you actually need all the JSON from all the 46k lines? If not, a separate storage or even just a separate table could avoid a lot of the decoding. You can also try custom queries and result classes which don't contain the JSON field(s).

But if you actually need all the JSON content at startup, then it needs to be decoded, no matter the storage.

I don't think this decoding belongs in the Drift isolate, because it will just block other database operations there. Creating an isolate with isolate_json, keeping it alive and using it in the type converts is probably the way to go. I use this approach for large JSON request bodies.

Just don't create a new isolate every time, that will make it even slower.

clragon commented 2 years ago

We definitely do need all the json, because our other column data is mostly there to query the json data. separating the json into its own storage wouldnt provide any benefits.

keeping it alive and using it in the type converts is probably the way to go.

We have considered that, however, type converter callbacks seem to be synchronous, so we cannot insert a call to an isolate there.

if they were allowed to be asynchronous, this would be a viable solution, even though we would still lose performance copying the data around, which would be one of the advantages of having the decoding run inside the isolate.

I wouldnt mind having it block the queries, thats still by far better than it blocking the entire UI thread.

kuhnroyal commented 2 years ago

type converter callbacks seem to be synchronous

Oh yea, forgot about that one, stumbled upon this a while back as well. Then I only see a workaround in using String fields in the database entity and decoding/mapping it afterwards in an isolate.

clragon commented 2 years ago

That is also the conclusion we have come to, but it would be quite impractical and involve quite a bit of boilerplate. Because now we have to disassemble the objects we have decoded from the database with type converters, decode the json there, make a new object and then assemble it again.

It would be very kind if it could be considered to add either json decoding inside the drift isolate, or having asynchronous type converters (not sure if thats feasible).

knaeckeKami commented 2 years ago

I also ran into this issue.

To be honest, it was more of an optimisation issue issue in the application code - a whole table with potentially thousands of entries was queried at once without paging and displayed in a listview.

But sqlite is fast enough so that this is not an issue - the query took just a few ms - but the entries also had a json column and the decoding of the json caused it to skip frames.

As a workaround, I did not decode the json with a type converter but just queried it as string and decoded the json lazily when actually accessing it - with this workaround the skipped frames did not occur again.

Mike278 commented 2 years ago

Sqlite's json module is what I typically reach for when a situation calls for storing json blobs. Delegating the json decoding to sqlite means it'll happen in the database's isolate and won't block the ui. It's probably faster too - that link gives numbers around 1GB/s.

You can also take advantage of sqlite's generated columns to make querying json more readable and performant - something like add column some_value text as (json_extract(the_json_blob, '$.some_value'));. You can trade off space or time by making the column VIRTUAL or STORED, and even add an index on some_value to make specific use-cases faster.

we definitely do need all the json

Personally this is the spot I'd considering thinking a bit more about. What are the individual pieces of data displayed in the ui, and could those be implemented as (probably multiple) sql queries?

clragon commented 2 years ago

Hi @Mike278, unfortunately we still do need to entire data, we cant shave of any part of it. There are no individual pieces of data being displayed in the UI, the data is loaded to be fed to the neural network and it needs the whole json we store.

simolus3 commented 2 years ago

I don't think drift should offload type converter work onto isolates by default. Sending objects through isolates doesn't really work for stateful objects, and we can't guarantee user-written type converters or custom data classes being stateless. It just seems like a thing that users should opt-in into on a per-case basis.

So I think I'll close this in favor of #1895.