Closed — csaf4370 closed this issue 10 years ago
Yeah that sounds terrible. Could you post the code for your example?
If you have used the viewcontainer class, it should already do bulk inserts. The template idea is a good one that I'll try to implement sooner rather than later.
I had a further look at the unit test results, which give some indication of the performance. Could you try to run these tests on your system as well (WITH_UNIT_TESTS flag in CMake)?
Results are for 100,000 elements (read, write, and modify):
Test | SSD (mac) | Build Server IUT (I assume HDD) |
---|---|---|
TestGDALPython.PythonConnectionTest | 9001 ms | 20015 ms |
TestGDALPython.PythonReadTest | 5985 ms | 11543 ms |
TestGDALPython.PythonModifyTest | 17113 ms | 45338 ms |
OK, I did a good amount of testing; these are the results for my machine (Ubuntu 14.04 64-bit with ext4 as the filesystem) and the web01 server of the IUT:
Test | HDD (Ubuntu 14.04, ext4) | web01 (HDD, ext4) |
---|---|---|
TestGDALPython.PythonConnectionTest | 769049 ms | 839659 ms |
TestGDALPython.PythonReadTest | 702080 ms | 831317 ms |
TestGDALPython.PythonModifyTest | 698215 ms | 933812 ms |
As we can see in the table above, the execution times are really slow. I had a deeper look into this topic and found that SQLite in combination with ext4 is really slow, as ext4 uses write barriers to ensure data integrity, and this slows down the process immensely. With ext3 this problem doesn't occur, but because ext4 is the most widely used filesystem, we have to find a solution.
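The cost of syncing on every insert can be illustrated with the stdlib `sqlite3` module alone (a sketch independent of GDAL; file names and row counts are arbitrary). In autocommit mode every insert is its own transaction, so each one must be flushed through the filesystem's barrier; wrapping all inserts in a single transaction pays that cost only once at commit:

```python
import os
import sqlite3
import tempfile
import time

def timed_inserts(path, n, batched):
    """Insert n rows either one transaction per row or in a single transaction."""
    conn = sqlite3.connect(path)
    conn.execute("CREATE TABLE t (id INTEGER, name TEXT)")
    start = time.time()
    if batched:
        with conn:  # one transaction -> one sync at commit
            conn.executemany("INSERT INTO t VALUES (?, ?)",
                             ((i, str(i)) for i in range(n)))
    else:
        conn.isolation_level = None  # autocommit: every insert syncs to disk
        for i in range(n):
            conn.execute("INSERT INTO t VALUES (?, ?)", (i, str(i)))
    elapsed = time.time() - start
    count = conn.execute("SELECT COUNT(*) FROM t").fetchone()[0]
    conn.close()
    os.remove(path)
    return count, elapsed

workdir = tempfile.mkdtemp()
n = 500
for batched in (False, True):
    path = os.path.join(workdir, "bench_%s.sqlite" % batched)
    count, elapsed = timed_inserts(path, n, batched)
    print("batched=%s: %d rows in %.3f s" % (batched, count, elapsed))
```

On a barrier-enabled filesystem the non-batched run should be drastically slower, which matches the symptom in the table above.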
I just tested changing the database from SQLite to PostgreSQL (on my local machine) and got the following runtimes:
Test | HDD (Ubuntu 14.04, ext4), PostgreSQL as DB |
---|---|
TestGDALModules.GDAL_ADVANCE_API_TEST | 32321 ms |
TestGDALModules.TestInsertSpeed_DM | 15891 ms |
TestGDALModules.TestInsertSpeed | 16461 ms |
These are not the same unit tests, as I got a core dump before reaching them. (This was just a quick test to see the behaviour.) As we can see, the runtimes here are relatively similar to the results you are getting. So one possibility would be to install PostgreSQL in the node dockers and use these as local databases.
As stated above, this was just a quick test, not fully working, and I cannot really tell if the data is sound, but as a basis for discussion it should work.
Looking forward to the opinions.
Regards, Martin
Hi,
if possible I would like to continue working with the SQLite implementation. Some of the models use direct SQL statements to perform some tasks more efficiently. However, if we can't get it running, we can still move to PostGIS.
I tried to run another test under Linux, in the Monash cloud and in a local VirtualBox, and the results seem to be much better. I also tried running the SQLite database in memory (see the in_memory branch) but couldn't find a difference in speed (which probably means the db was always in the file cache). Could you try the in-memory DynaMind (core) branch and see if it makes any difference?
Test | SSD (mac) | Build Server IUT (I assume HDD) | Monash (Ubuntu 14.04 ext4 HDD?) | in memory | virtualbox (in memory and db) |
---|---|---|---|---|---|
TestGDALPython.PythonConnectionTest | 9001 ms | 20015 ms | 16373 ms | 16762 ms | 37055 ms |
TestGDALPython.PythonReadTest | 5985 ms | 11543 ms | 12673 ms | 12822 ms | 30406 ms |
TestGDALPython.PythonModifyTest | 17113 ms | 45338 ms | 22544 ms | 22534 ms | 43478 ms |
Apparently activating WAL should also be faster. It can be done with

```cpp
options = CSLSetNameValue( options, "OGR_SQLITE_JOURNAL", "WAL" );
```

but I couldn't see a difference in speed.
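What the WAL journal mode actually changes can be seen at the SQLite level with the stdlib `sqlite3` module (a sketch independent of GDAL; the file name is arbitrary). Instead of the default rollback journal, writers append to a `-wal` side file and no longer block readers:

```python
import os
import sqlite3
import tempfile

path = os.path.join(tempfile.mkdtemp(), "wal_demo.sqlite")
conn = sqlite3.connect(path)

# A fresh file-backed db starts with the rollback journal
mode = conn.execute("PRAGMA journal_mode").fetchone()[0]
print(mode)  # 'delete'

# Switch to write-ahead logging; the pragma returns the new mode
mode = conn.execute("PRAGMA journal_mode=WAL").fetchone()[0]
print(mode)  # 'wal'
conn.close()
```

Since WAL mainly helps concurrent readers and writers, it is plausible that a single-writer bulk load sees little benefit, consistent with the observation above.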
Thanks for your tests. Unfortunately, the execution times with the in_memory branch didn't improve much:
Test | HDD (Ubuntu14.04- ext4) with in_memory branch |
---|---|
TestGDALPython.PythonConnectionTest | 682892 ms |
TestGDALPython.PythonReadTest | 652020 ms |
TestGDALPython.PythonModifyTest | 650486 ms |
I will test the option above in the background and report back.
Hi, let's take a step back and check whether GDAL with SQLite works at all. SQLite should be able to do around 50,000 inserts per second, but it looks like the number is much lower on your system (~300 per sec). I get the feeling that something with the bulk inserts doesn't work.
I wrote a small benchmark script; could you try to run it? It tests three things: first without bulk inserts, second with them, and last in memory. If the bulk insert shows similar (slow) speed, could you please try playing with the number of transactions per bulk? I use a really high value (100,000), which might be a problem.
On my system I get the following results:

```
10000     5.37733101845 sec   1859.65862352 inserts per sec
1000000   17.6309468746 sec   56718.4512047 inserts per sec
1000000   17.1116430759 sec   58439.741617  inserts per sec
```
```python
from osgeo import ogr
import time
import uuid


def run_test(ds, e):
    # Insert e point features one by one (every insert is its own transaction)
    lyr = ds.CreateLayer("point_out", None, ogr.wkbPoint)
    field_defn = ogr.FieldDefn("Name", ogr.OFTString)
    field_defn.SetWidth(32)
    lyr.CreateField(field_defn)
    start = time.time()
    for i in range(e):
        x = float(i)
        y = float(i)
        name = str(i)
        feat = ogr.Feature(lyr.GetLayerDefn())
        feat.SetField("Name", name)
        pt = ogr.Geometry(ogr.wkbPoint)
        pt.SetPoint_2D(0, x, y)
        feat.SetGeometry(pt)
        lyr.CreateFeature(feat)
        feat.Destroy()
    end = time.time()
    print(str(e) + "\t" + str(end - start) + " sec\t"
          + str(e / (end - start)) + " inserts per sec")
    ds = None  # close the datasource to flush pending writes


def run_test_bulk(ds, e):
    # Same inserts, but wrapped in transactions of 100,000 features each
    lyr = ds.CreateLayer("point_out", None, ogr.wkbPoint)
    field_defn = ogr.FieldDefn("Name", ogr.OFTString)
    field_defn.SetWidth(32)
    lyr.CreateField(field_defn)
    lyr.StartTransaction()
    start = time.time()
    counter = 0
    for i in range(e):
        counter += 1
        x = float(i)
        y = float(i)
        name = str(i)
        feat = ogr.Feature(lyr.GetLayerDefn())
        feat.SetField("Name", name)
        pt = ogr.Geometry(ogr.wkbPoint)
        pt.SetPoint_2D(0, x, y)
        feat.SetGeometry(pt)
        lyr.CreateFeature(feat)
        feat.Destroy()
        if counter == 100000:
            lyr.CommitTransaction()
            lyr.StartTransaction()
            counter = 0
    lyr.CommitTransaction()  # commit the remaining features
    end = time.time()
    print(str(e) + "\t" + str(end - start) + " sec\t"
          + str(e / (end - start)) + " inserts per sec")
    ds = None  # close the datasource to flush pending writes


# Test 1: single inserts into a file-backed SQLite db
elements = 10000
options = ('OGR_SQLITE_CACHE=1024', 'OVERWRITE=YES',)
drv = ogr.GetDriverByName('SQLite')
db_name = str(uuid.uuid4())
db_pg = drv.CreateDataSource("/tmp/" + db_name + ".sqlite", options)
run_test(db_pg, elements)

# Test 2: bulk inserts into a file-backed SQLite db
elements = 1000000
db_name = str(uuid.uuid4())
db_pg = drv.CreateDataSource("/tmp/" + db_name + ".sqlite", options)
run_test_bulk(db_pg, elements)

# Test 3: bulk inserts into an in-memory db
elements = 1000000
db_name = str(uuid.uuid4())
db_pg = drv.CreateDataSource(":memory:", options)
run_test_bulk(db_pg, elements)
```
This is an upstream bug in GDAL <= 1.10.1, which is the default for Ubuntu 14.04! It is fixed in later versions; see http://trac.osgeo.org/gdal/ticket/5270
If you run a GDAL-based system on a regular HDD (not an SSD), the startup is really slow. The build-up of the GDAL SQLite db (e.g. {aewe*fvgd}.db), which is only about 4 MB in size, takes a lot of time.
Example: module 'CreateGDALComponentsMartin' executed successfully (took 682032 ms), and this is a really small example module without any other module connected.
One workaround is to set the db to :memory:, which makes testing really fast, but it will not solve the problem. Another possibility is to set SQLite pragmas (e.g. PRAGMA synchronous=OFF), which should increase the writing speed. In my tests the speedup is negligible, and it is more dangerous to use, as integrity checks are omitted. The best solution would probably be to bulk the inserts, or to use a template db (if possible), copy it, and use it as a starting point.
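The template-db idea could look roughly like this (a sketch with a made-up placeholder schema, not actual DynaMind code): build the schema once into a template file, then start each run from a cheap file copy instead of re-running all the `CREATE TABLE` statements against the slow disk.

```python
import os
import shutil
import sqlite3
import tempfile

workdir = tempfile.mkdtemp()
template = os.path.join(workdir, "template.sqlite")

# Build the (expensive) schema exactly once
conn = sqlite3.connect(template)
conn.execute("CREATE TABLE components (uuid TEXT PRIMARY KEY, attributes TEXT)")
conn.commit()
conn.close()

def new_working_db(run_id):
    """Start a run from a file copy instead of recreating the schema."""
    path = os.path.join(workdir, "run_%s.sqlite" % run_id)
    shutil.copyfile(template, path)
    return path

# The copied db already contains the full schema
path = new_working_db(1)
conn = sqlite3.connect(path)
tables = [r[0] for r in conn.execute(
    "SELECT name FROM sqlite_master WHERE type='table'")]
print(tables)  # ['components']
conn.close()
```

A plain file copy is a single sequential write, so it avoids the many small synced writes that make the initial schema build-up slow on an HDD.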