mgolokhov / dodroid

May the knowledge be with you!
MIT License

Data management in the app. #37

Open shiraeeshi opened 8 years ago

shiraeeshi commented 8 years ago

Let's discuss global data in android app, pros and cons, research how others deal with it.

shiraeeshi commented 8 years ago

I think that this is bad because the programmer will not know exactly which portion of the global data each class uses and mutates, or when. If something goes wrong with global data, it will be a pain to find the culprit (because every class can change global data). It will be a lot easier to maintain the app if each activity runs in its own "sandbox": you invoke it with some arguments, it gives you the result, and you are sure that nothing bad happens in the middle of the process. Also think about this scenario: the user enters some settings, clicks, goes to another settings activity, changes something, then goes to a third activity and then realizes that he wants to cancel everything he just changed. I think the main caveat of this approach is a lot of copying: you need to create a copy of an array if you don't want to let another activity change its contents.

mgolokhov commented 8 years ago

In general you are right. We should always ask ourselves whether data really needs to be global. Is the DB a global thing? We get chunks of the file in memory. We can query info from different activities and as a result get different copies, which need to be synchronized and merged. With a global interface we get references to the same data (not a copy). Also, we have different types of data: questions are read-only.

shiraeeshi commented 8 years ago

I agree with this: "We can safely make read-only immutable data global". And if data is not read-only and immutable we can pass it as intent extras, but if it's too big or not parcelable, then we need some kind of what I call "inter-activity data holder". But again, we have to pass copies, not original objects.

mgolokhov commented 8 years ago

To have a real copy you need to implement deep copy or just query DB again for a new collection. No need for "inter-activity data holder" (maybe I didn't get it).

With copies, how do you suggest syncing data?

Also, we should consider how an activity handles info. For now we have a linear style: the topics activity or the questions activity modifies statistics.
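To make the "real copy" point concrete, here is a minimal plain-Java sketch of the difference between a shallow and a deep copy. The `Stat` class and its fields are hypothetical and are not taken from the dodroid code.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical statistics record; the field names are illustrative only.
class Stat {
    String questionId;
    int rightCount;

    Stat(String questionId, int rightCount) {
        this.questionId = questionId;
        this.rightCount = rightCount;
    }

    // Copy constructor: produces an independent object.
    Stat(Stat other) {
        this(other.questionId, other.rightCount);
    }
}

class Copies {
    // Shallow copy: a new list that still points at the same Stat objects.
    static List<Stat> shallowCopy(List<Stat> source) {
        return new ArrayList<>(source);
    }

    // Deep copy: a new list holding new Stat objects.
    static List<Stat> deepCopy(List<Stat> source) {
        List<Stat> result = new ArrayList<>(source.size());
        for (Stat s : source) {
            result.add(new Stat(s));
        }
        return result;
    }
}
```

With the shallow copy, a change to an element is visible through the original list as well; with the deep copy it is not, which is why the copies later need an explicit sync step.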

shiraeeshi commented 8 years ago

What if that data is partially mutated itself or not present in the DB? I'm thinking about the general principle; maybe it's not about this particular case with the dodroid app.

Data gets synced (if it has to be) when the activity returns. Think about activities as functions: each activity has a signature, that is, parameters and a result. But I think not every activity has to be implemented this way. If you imagine the tree of activities, I think this rule will work: the farther an activity is from the main branch, the more function-like it becomes. It's like an organization of people: people organize as a hierarchy, and responsibility decreases as you go from the main branch to the leaves.
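The "activity as a function" idea can be sketched in plain Java without any Android classes. Everything here (`Screen`, `QuizScreen`, `Caller`) is a hypothetical model of the principle, not the app's code: the screen receives parameters, works on its own copy, and hands back a result that the caller may accept or discard.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

// Hypothetical model: a screen takes parameters and returns a result.
interface Screen<P, R> {
    R run(P params);
}

// Stands in for a questions activity that updates per-question counters.
class QuizScreen implements Screen<Map<String, Integer>, Map<String, Integer>> {
    @Override
    public Map<String, Integer> run(Map<String, Integer> stats) {
        // Work on a private copy; the caller's map is never modified.
        Map<String, Integer> working = new HashMap<>(stats);
        working.merge("q1", 1, Integer::sum); // pretend the user answered q1
        return working;
    }
}

class Caller {
    // Syncing happens only here, when the screen "returns".
    static Map<String, Integer> openQuiz(Map<String, Integer> stats, boolean accept) {
        Map<String, Integer> result =
                new QuizScreen().run(Collections.unmodifiableMap(stats));
        return accept ? result : stats;
    }
}
```

Passing an unmodifiable view makes an accidental write inside the screen fail fast instead of silently changing the caller's data.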

mgolokhov commented 8 years ago

> What if that data is partially mutated itself or not present in a DB?

This is a problem with copies; you suggest waiting until the activity returns. How will you handle configuration changes?

shiraeeshi commented 8 years ago

When the configuration changes, the activity will act as usual: it will save the data it works with in "onPause" or wherever, and then get it back in "onCreate".

mgolokhov commented 8 years ago

So you have to pack and unpack all copies?

shiraeeshi commented 8 years ago

You're right, it can become too expensive. The solution is to save the current data in a special fragment (http://developer.android.com/reference/android/app/Fragment.html#setRetainInstance(boolean)). Adam Porter mentioned this method in this lecture (starting from 10:43): https://class.coursera.org/androidpart1-011/lecture/29.

mgolokhov commented 8 years ago

> bad happens in the middle of the process

Can you give more examples?

> scenario: user enters some settings

Yes, this is the case with rollback. SharedPreferences with SharedPreferences.Editor and commit() will help us.
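The rollback idea relies on the editor pattern: changes are staged and only become visible on commit. Below is a plain-Java stand-in for that pattern so it can run outside Android. `Settings` and its `Editor` are hypothetical classes that only mimic the behaviour of SharedPreferences.Editor; they are not the Android API.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in that mimics the staged-edit pattern.
class Settings {
    private final Map<String, String> committed = new HashMap<>();

    String get(String key, String defaultValue) {
        String value = committed.get(key);
        return value != null ? value : defaultValue;
    }

    Editor edit() {
        return new Editor();
    }

    class Editor {
        // Pending changes live here until commit() is called.
        private final Map<String, String> pending = new HashMap<>();

        Editor put(String key, String value) {
            pending.put(key, value);
            return this;
        }

        // Apply all pending changes at once.
        void commit() {
            committed.putAll(pending);
            pending.clear();
        }
    }
}
```

If the user cancels, the editor is simply dropped without calling `commit()`, and the stored settings stay as they were.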

mgolokhov commented 8 years ago

Let's define read-only and writable data. Implement the global style for the former and copy-on-write for the latter.
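A rough sketch of that split in plain Java, under the assumption that questions are plain strings and statistics are an array of counters. The class names `AppData` and `Counters` and the sample question texts are made up for illustration.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

class AppData {
    // Read-only data: shared globally, wrapped so that nobody can modify it.
    private static final List<String> QUESTIONS = Collections.unmodifiableList(
            Arrays.asList("What is an Activity?", "What is an Intent?"));

    static List<String> questions() {
        return QUESTIONS;
    }
}

// Writable data: every modification produces a new object (copy-on-write).
final class Counters {
    private final int[] values;

    private Counters(int[] values) {
        this.values = values;
    }

    static Counters of(int... values) {
        return new Counters(values.clone());
    }

    int get(int index) {
        return values[index];
    }

    // Returns a modified copy; this instance is left untouched.
    Counters increment(int index) {
        int[] copy = values.clone();
        copy[index]++;
        return new Counters(copy);
    }
}
```

Reads never copy anything; only a write pays for a copy, and whoever still holds the old `Counters` keeps seeing the old values.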

mgolokhov commented 8 years ago

I'm studying SQLite and something is nagging me.

  1. Big table vs multiple small tables. SQLite is not a classic DB, so I'm not sure that normalization gives a real speed boost.
  2. Internally it always uses ROWID; with a compound primary key we should create an index.

shiraeeshi commented 8 years ago

> I'm not sure that normalization gives a real speed boost.

I can't imagine a many-to-many relation without three tables being involved. How to do it and what for?

mgolokhov commented 8 years ago

It depends on what types of queries we do. In the current implementation:

  1. Give me all question IDs.
  2. Give me all questions with tag "some_tag". As the same question can have many tags, it's a good idea to have a table for tags (to avoid a query with LIKE).

mgolokhov commented 8 years ago

And more things for discussion. During DB initialization from the JSON file we have all questions in memory. So we have two strategies: the current one, a fast batch mode, or inserting questions one by one, which will be slower. Anyway, Google's recommendations are: work with the DB asynchronously (in another thread) and/or use a Content Provider.

mgolokhov commented 8 years ago

Read-only data: Questions: id, tags, text, right and wrong answers, documentation link.

Copy-on-write data: Statistics: different types of counters.

mgolokhov commented 8 years ago

There is no need for the answer table. Let's do some calculations.

With right and wrong columns in the big question table, for true-or-false style questions we have:
TEXT type with UTF-8 encoding and 1 byte per ASCII character: true - 4 bytes, false - 5 bytes = 9 bytes.
With an optimized table: is_binary - 1 byte, answer_is_true - 1 byte = 2 bytes.
Result: 7 bytes lighter for every question.

For a general list of answers we get ~16 bytes overhead per answer with the answer table (id, answer text, is_right integer, question_id):
ROWID - 8 bytes; answer text - same as in the big table, minus the separator (1 byte); is_right - 1 byte; question_id - 8 bytes.

One downside with the big table: we have to split answers by "\n".

Maybe I missed something.
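The true/false arithmetic above can be double-checked with a few lines of Java. This only counts payload bytes under the comment's own assumptions (UTF-8 text, 1 byte per flag column) and ignores any per-record overhead SQLite adds; the class name `AnswerSize` is made up.

```java
import java.nio.charset.StandardCharsets;

class AnswerSize {
    // Payload size of a string stored as UTF-8 text.
    static int utf8Bytes(String s) {
        return s.getBytes(StandardCharsets.UTF_8).length;
    }

    // "true" in the right column plus "false" in the wrong column.
    static int textColumnsBytes() {
        return utf8Bytes("true") + utf8Bytes("false");
    }

    // is_binary and answer_is_true, 1 byte each (assumption from the comment).
    static int flagColumnsBytes() {
        return 1 + 1;
    }

    static int savingPerQuestion() {
        return textColumnsBytes() - flagColumnsBytes();
    }
}
```

`AnswerSize.savingPerQuestion()` comes out at 9 - 2 = 7 bytes, matching the estimate.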

shiraeeshi commented 8 years ago

> we have to split answers by "\n"

The program will spend some time parsing answers. How to prove that "querying optimized table + parsing answers" is faster than "querying unoptimized tables"?

mgolokhov commented 8 years ago

As for speed, I can't tell you without profiling. Also, we can shuffle and limit data in the query, so for 10 questions the cost will be negligible.

mgolokhov commented 8 years ago

https://github.com/mgolokhov/dodroid/wiki/Saving%20Data%20in%20Databases