mpicbg-scicomp / tissue_miner

A toolkit to quantify cell dynamics in living tissues
BSD 3-Clause "New" or "Revised" License
11 stars 8 forks source link

Querying dbonds for the first time is very slow #5

Open holgerbrandl opened 9 years ago

holgerbrandl commented 9 years ago

It's likely to be a caching issue, since it is slow just once after a system reboot:

movieDir="/Users/brandl/Desktop/WT_25deg_111102_ForTissueMiner"
db_name=basename(movieDir)

Sys.setenv(TM_HOME="/Volumes/projects/project-raphael/scripts/tissue_miner")
scriptsDir=Sys.getenv("TM_HOME")

#' profile tm query

source(file.path(scriptsDir, "commons/TMCommons.R"))

movieDb <- openMovieDb(movieDir)

system.time(
    dbond <- dbGetQuery(movieDb, "select frame, cell_id, dbond_id, conj_dbond_id, bond_id, vertex_id from directed_bonds")
)

To resolve the problem 1) figure out where this cache is actually located and how it could be configured 2) check if the first query is slow per db or per system 3) check if we could substitute the first query with a faster one (like with using LIMIT 10;) for boostrapping the cache 4) try to setup dockerized test to allow for reproducing the problem without a reboot 5) check if native sqlite performance (without R in-between) is better or suffers from the same problem

Additional references: SELECT query first run is VERY slow http://sqlite.1065341.n5.nabble.com/SELECT-query-first-run-is-VERY-slow-td33100.html

A Deep Dive Into Unexpectedly Slow SQLite Queries http://blog.isaachodes.io/p/deep_dive_into_slow_sqlite3_queries/

SQLite performance for 10 Million Records http://sqlite.1065341.n5.nabble.com/SQLite-performance-for-10-Million-Records-td11019.html

etournay commented 9 years ago

Slow is relative to the computer performance. It takes less than a minute to get the first query on a good workstation (ex: DELL T7500 or good MacPro). But it takes more than 1hour on my 8-year-old laptop...