sanskrit-coders / nakshatra-app

GNU General Public License v3.0

Memory efficient dictionary indexing #8

Open vvasuki opened 6 years ago

vvasuki commented 6 years ago

Copying from https://github.com/sanskrit-coders/nakshatra-app/issues/4 -

"oh! loading the word list into the RAM is a bad idea. And there is no need for us to "reinvent" the concept of a memory-efficient fast database. You can just use whatever Android offers - https://developer.android.com/training/data-storage/room/ .."

This is something we must solve before publishing.

damooo commented 6 years ago

Annaa, Room seems to be just a wrapper around SQLite, not an indexing mechanism. If we have to save all of the idx files in SQLite databases, their size will be too large. So we can first index them with some algorithm, and then persist the index in whatever way we choose.

vvasuki commented 6 years ago

Friend, a couple of points:

damooo commented 6 years ago

Yes anna, it is not only possible but very easy with SQLite indexing. I initially created the app as an embedded Tamil dictionary app, which used an SQLite database as its main index instead of a .idx file, so there was no need to index again. But I then changed it to support the generic StarDict format with .idx files.

Apps like GoldenDict store just indices of the .idx files, in some proprietary format, instead of the entire .idx files, so that the size in persistent storage is acceptable. We could create some index for the .idx files, and then access them with random access instead of loading them into RAM. It may take some time to learn about suitable indexing mechanisms, but it may be useful. Otherwise we can then load entire .idx files into an SQLite database.
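To make the "index for .idx files plus random access" idea concrete, here is a minimal sketch in Python (chosen only for brevity; it is not the app's code and not GoldenDict's format). It assumes the standard StarDict .idx layout with 32-bit offsets: a NUL-terminated UTF-8 word followed by a big-endian offset and size. It also assumes entries are sorted by ASCII-case-insensitive comparison with a byte-wise tie-break. The names `read_entry`, `build_sparse_index`, `find_in_idx` and the `step` parameter are hypothetical. Only every `step`-th word is kept in memory; the file itself is memory-mapped and read on demand.

```python
import bisect
import mmap
import struct

ENTRY_TAIL = struct.Struct(">II")  # (offset, size) into the .dict file, big-endian


def stardict_key(word: bytes):
    # Assumed sort order: ASCII-case-insensitive first, raw bytes as tie-break.
    return (word.lower(), word)


def read_entry(mm, pos):
    """Read one .idx entry starting at byte `pos`; also return the next position."""
    end = mm.find(b"\0", pos)
    word = mm[pos:end]
    offset, size = ENTRY_TAIL.unpack(mm[end + 1:end + 1 + ENTRY_TAIL.size])
    return word, offset, size, end + 1 + ENTRY_TAIL.size


def build_sparse_index(mm, step=64):
    """Scan the file once, remembering the key and position of every `step`-th entry."""
    keys, positions = [], []
    pos, n = 0, 0
    while pos < len(mm):
        word, _, _, nxt = read_entry(mm, pos)
        if n % step == 0:
            keys.append(stardict_key(word))
            positions.append(pos)
        pos, n = nxt, n + 1
    return keys, positions


def find_in_idx(mm, keys, positions, word, step=64):
    """Binary-search the sparse index, then scan at most `step` entries on disk."""
    target = stardict_key(word.encode("utf-8"))
    i = bisect.bisect_right(keys, target) - 1
    if i < 0:
        return None
    pos = positions[i]
    for _ in range(step):
        if pos >= len(mm):
            break
        w, offset, size, pos = read_entry(mm, pos)
        key = stardict_key(w)
        if key == target:
            return offset, size
        if key > target:
            break
    return None
```

With `step=64`, the in-memory part is roughly 1/64 of the word list, and the two lists could themselves be persisted so the scan runs only once per dictionary. The file would be opened with something like `mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)`.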

vvasuki commented 6 years ago

"otherwise we can then load entire .idx files into sqlite database."

This is an excellent option, and just what I suggested. Otherwise, looking up words or doing auto-fill will involve opening and checking 100 idx files one after the other. If at some future time you find something superior and more maintainable, we can always switch.
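As a rough illustration of the "load the .idx files into SQLite" option, the sketch below uses Python's built-in `sqlite3` module rather than Room, purely to keep it short. The table name `entries`, its columns, and the functions `load_idx`, `complete` and `find_word` are assumptions for illustration, not the app's schema. It assumes the standard StarDict .idx layout with 32-bit offsets. All dictionaries share one table with an index on `word`, so a lookup or auto-fill is a single indexed query instead of a pass over each .idx file.

```python
import sqlite3
import struct


def iter_idx_entries(data: bytes):
    """Yield (word, offset, size) from raw StarDict .idx bytes (32-bit offsets assumed)."""
    pos = 0
    while pos < len(data):
        end = data.index(b"\0", pos)
        offset, size = struct.unpack_from(">II", data, end + 1)
        yield data[pos:end].decode("utf-8"), offset, size
        pos = end + 9  # NUL byte + two 4-byte integers


def load_idx(conn, dict_name, data):
    """One-time import of a dictionary's .idx entries into the shared table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS entries ("
        "dict TEXT NOT NULL, word TEXT NOT NULL, "
        "offset INTEGER NOT NULL, size INTEGER NOT NULL)"
    )
    conn.execute("CREATE INDEX IF NOT EXISTS entries_word ON entries (word)")
    with conn:  # commit all rows in one transaction
        conn.executemany(
            "INSERT INTO entries VALUES (?, ?, ?, ?)",
            ((dict_name, w, o, s) for w, o, s in iter_idx_entries(data)),
        )


def complete(conn, prefix, limit=10):
    """Auto-fill: a range scan on the word index covers every dictionary at once."""
    rows = conn.execute(
        "SELECT DISTINCT word FROM entries "
        "WHERE word >= ? AND word < ? ORDER BY word LIMIT ?",
        (prefix, prefix + "\U0010ffff", limit),
    )
    return [r[0] for r in rows]


def find_word(conn, word):
    """Exact lookup: which dictionaries contain the word, and where in each .dict file."""
    return conn.execute(
        "SELECT dict, offset, size FROM entries WHERE word = ? ORDER BY dict",
        (word,),
    ).fetchall()
```

The prefix query is written as a range (`word >= ? AND word < ?`) rather than `LIKE`, since a range comparison can use the index under SQLite's default binary collation. The same table and queries could presumably be expressed through Room or plain Android SQLite APIs.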