Open landam opened 5 years ago
Comment by mlennert on 24 Aug 2015 12:24 UTC To reinforce this:
I have a vector map with 881,120 points. I have to loop over this map, extracting points according to one attribute. When I use v.extract to extract the points and create the table, it takes quite a long time, most of it in "writing attributes":
time v.extract produnits_light where="cd_nacebel_2010='27120' and etab_emploi_onss_2010>0" out=prod_temp --o
real 0m20.183s
user 0m12.872s
sys 0m17.516s
i.e. 20 s, even though this extracts only 93 points!
Using a script that extracts the points without attribute table and then creates the table using different tools, it is much quicker. Here's the script (t.sh):
v.extract -t produnits_light where="cd_nacebel_2010='27120' and etab_emploi_onss_2010>0" out=prod_temp --o
v.db.select produnits_light where="cd_nacebel_2010='27120' and etab_emploi_onss_2010>0" sep=, file=test.csv --o
v.info -c produnits_light --quiet | cut -d'|' -f1 | sed 's/TEXT/STRING/' | sed 's/DOUBLE PRECISION/REAL/' | sed 's/\(.\)\(.*\)/"\1\L\2"/g' | paste -s -d ',' > test.csvt
db.in.ogr input=test.csv out=prod_temp_table --o
v.db.connect -o prod_temp table=prod_temp_table key=cat_
time sh t.sh
real 0m1.241s
user 0m1.000s
sys 0m0.168s
The result is identical except that cat is renamed cat_ by db.in.ogr...
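The speed gap reported above can be reproduced outside GRASS. The following sketch (illustrative only, not GRASS code; table and column names are made up) contrasts the row-by-row pattern used by copy_table with a single CREATE TABLE ... AS SELECT in SQLite:

```python
import sqlite3
import time

con = sqlite3.connect(":memory:")
cur = con.cursor()
cur.execute("CREATE TABLE old_table (cat INTEGER, value REAL)")
cur.executemany("INSERT INTO old_table VALUES (?, ?)",
                ((i, i * 0.5) for i in range(100000)))
con.commit()

# Row-by-row copy: select all rows, then insert each one separately,
# roughly the pattern copy_table follows through the dbmi driver.
t0 = time.perf_counter()
cur.execute("CREATE TABLE copy_slow (cat INTEGER, value REAL)")
for row in cur.execute("SELECT cat, value FROM old_table").fetchall():
    con.execute("INSERT INTO copy_slow VALUES (?, ?)", row)
con.commit()
slow = time.perf_counter() - t0

# Single-statement copy executed entirely inside the database engine.
t0 = time.perf_counter()
cur.execute("CREATE TABLE copy_fast AS SELECT * FROM old_table")
con.commit()
fast = time.perf_counter() - t0

print(f"row-by-row: {slow:.3f}s, CREATE TABLE AS SELECT: {fast:.3f}s")
```

On a table of this size, the single-statement copy is typically one to two orders of magnitude faster, because no rows cross the client/driver boundary.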
Comment by neteler on 5 May 2016 14:08 UTC Milestone renamed
Comment by neteler on 28 Dec 2016 15:04 UTC Ticket retargeted after milestone closed
Modified by @landam on 5 May 2017 20:40 UTC
Comment by @landam on 1 Sep 2017 20:28 UTC All enhancement tickets should be assigned to 7.4 milestone.
Comment by neteler on 26 Jan 2018 11:40 UTC Ticket retargeted after milestone closed
Modified by neteler on 12 Jun 2018 20:48 UTC
Comment by @landam on 25 Sep 2018 16:52 UTC All enhancement tickets should be assigned to 7.6 milestone.
Comment by @landam on 25 Jan 2019 21:07 UTC Ticket retargeted after milestone closed
Reported by mlennert on 11 Feb 2015 18:18 UTC The function copy_table in lib/db/dbmi_client/copy_tab.c seems very inefficient to me when both the old and the new table are in the same database and it is a database with real SQL functionality. IIUC, the function actually selects all rows from the input table into a cursor and then inserts them, row by row, into the output table. With a large table this takes a long time.
Using a simple "CREATE TABLE new_table AS SELECT [*, ListOfColumns] FROM old_table WHERE [Conditions]" is much faster !
So, it would be a great addition if the relevant drivers, such as SQLite and PostgreSQL, could use this approach.
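A minimal sketch of the proposed shortcut, in Python against SQLite (the function and parameter names here are illustrative, not the actual dbmi API): when source and target tables live in the same database, build one CREATE TABLE ... AS SELECT statement with an optional column list and WHERE clause instead of pushing rows through a cursor.

```python
import sqlite3

def copy_table_same_db(con, old_table, new_table, columns=None, where=None):
    """Copy (a subset of) old_table into a new table in one SQL statement."""
    cols = ", ".join(columns) if columns else "*"
    sql = f"CREATE TABLE {new_table} AS SELECT {cols} FROM {old_table}"
    if where:
        sql += f" WHERE {where}"
    con.execute(sql)
    con.commit()

# Usage: copy only the matching rows, as v.extract would.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE produnits "
            "(cat INTEGER, cd_nacebel_2010 TEXT, emploi INTEGER)")
con.executemany("INSERT INTO produnits VALUES (?, ?, ?)",
                [(1, "27120", 5), (2, "10000", 3), (3, "27120", 0)])
copy_table_same_db(con, "produnits", "prod_temp",
                   where="cd_nacebel_2010='27120' AND emploi>0")
print(con.execute("SELECT cat FROM prod_temp").fetchall())  # -> [(1,)]
```

Both SQLite and PostgreSQL support CREATE TABLE ... AS SELECT natively, so the same statement could be emitted by either driver when it detects that source and target share a connection.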
P.S. There is no LibDB component in the tracker. Don't know if there should be...
GRASS GIS version and provenance
svn-trunk
Migrated-From: https://trac.osgeo.org/grass/ticket/2587