Merged towers db (lacells.db) is larger than merge.db

aelmahmoudy commented 9 years ago

I ran gen_lacells it produced merge.db with a size of 550 MB (I didn't filter out any MCC/MNC) Then I ran the commented sqlite code in gen_lacells: .headers on .mode csv .output merged.csv SELECT * FROM cells ORDER BY mcc, mnc, lac, cid; .quit

then ran: ./merge_duplicate_towers > towers.csv

the resulting towers.csv (415 MB) was smaller than merge.csv (484 MB)

then I ran the rest of commented sqlite commands: sqlite3 lacells.db <<! CREATE TABLE cells(mcc INTEGER, mnc INTEGER, lac INTEGER, cid INTEGER, longitude REAL, latitude REAL, altitude REAL, accuracy REAL); .mode csv .import towers.csv cells CREATE INDEX _idx1 ON cells (mcc, mnc, lac, cid); CREATE INDEX _idx2 ON cells (lac, cid); VACUUM; .quit !

Yet the resulting lacells.db was 691 MB ! Although I notice that this database doesn't have the 'samples' column !

aelmahmoudy commented 9 years ago

I was wrong, for some reason _idx1 was not created for merge.db. After creating it (and running VACUUM), its size increased to 732 MB. It seems merge.db isn't created properly by gen_lacells script, as I get those errors:

Error: near line 12: unrecognized token: "#" Error: near line 16: unrecognized token: "#" Error: near line 36: unrecognized token: "#"

n76 commented 9 years ago

Lines 111 through 136 of gen_lacells should be deleted. Or the first "#" on each of those lines should be removed.

With those lines removed then there will be duplicate records where OpenCellID and Mozilla Location Services both have data on the same cell tower. WIth the "#" removed then the duplicate records will be removed.

For my gsm-backend, deleting those lines should result in the same on-phone database a the backend will create for itself using the tools in the settings menu. If you are using this database with the microG project's original backend you should remove the duplicates.

I only put a "#" in front of each line to keep the lines around to make it easier for me to cut and paste things when I was testing. Unfortunately you can't just re-run the script in whole if there is a bug as the OpenCellID project limits downloads to one per day. :(

You are correct, the samples field is missing if you do the merge.

I will do two things:

Fix merge_duplicate_towers to keep the samples field.
Split gen_lacells into two versions, one that does the merge towers and the other that does not.

aelmahmoudy commented 9 years ago

On the mention of uG's original backend. cells.db that was used by the old backend (ie. before UnifiedNlp) was about 65 MB although it was for the whole world, why is your db so large even after removing duplicates ?

n76 commented 9 years ago

I have added two fields: One for estimated coverage range of each tower and another for number of samples.
Their old database does not have the elevation field but it is in mine because I was informed by their project lead that it should be there.
Their database is old and that the source data has grown.
If you are using data from both OpenCellID and Mozilla Location Services (MLS) but don't merge duplicate records you will be using more space.
There are towers in MLS that are not in OpenCellID.
They don't have an index in their database which substantially increases the size of the database but very definitely from my testing increases the lookup speeds and decreases the runtime load on the phone.

You can use sqlite3 on your laptop/desktop computer to examine the database files to see their structure and how many records are in each file. For example, the lacells.db that is on my phone at present:

$ adb pull /sdcard/.nogapps/lacells.db 3912 KB/s (101584896 bytes in 25.357s) $ sqlite3 lacells.db SQLite version 3.7.7 2011-06-25 16:35:41 Enter ".help" for instructions Enter SQL statements terminated with a ";" sqlite> .tables cells sqlite> .schema cells CREATE TABLE cells(mcc INTEGER, mnc INTEGER, lac INTEGER, cid INTEGER, longitude REAL, latitude REAL, accuracy REAL, samples INTEGER, altitude REAL); CREATE INDEX _idx1 ON cells (mcc, mnc, lac, cid); CREATE INDEX _idx2 ON cells (lac, cid); sqlite> select count(*) from cells; 1166872 sqlite> .quit $

So there are 1.16 million records in my current database just for the United States. Downloading the North America cell database from the announcement on XDA developer's forum at http://forum.xda-developers.com/showthread.php?t=1715375 gives me:

$ sqlite3 cells-northa.db SQLite version 3.7.7 2011-06-25 16:35:41 Enter ".help" for instructions Enter SQL statements terminated with a ";" sqlite> .tables cells sqlite> .schema cells CREATE TABLE cells(lat REAL, lon REAL, mcc INTEGER, mnc INTEGER, lac INTEGER, cellid INTEGER); sqlite> select count(*) from cells; 736083 sqlite> .quit $

So all of North America including Canada and maybe Mexico has less than 3/4 million towers in their old database with fewer fields and no indices.

n76 commented 9 years ago

Pushed new scripts with fixes for issues (other than raw size of database) identified in this thread.

n76 / lacells-creator

Merged towers db (lacells.db) is larger than merge.db #2