Open lgatto opened 3 years ago
Related issue: https://github.com/r-dbi/RSQLite/issues/344
It is stated explicitly because `filterFile` (where these functions stem from) doesn't support this, i.e. calling `filterFile(..., file = c(2, 1))` gives the same result as `filterFile(..., file = c(1, 2))`. I wanted to have it implemented that way because I think it is important to allow the user to filter and re-order.
Hm, I find it strange that this causes performance issues with a SQL backend. In theory, if the SQL backend keeps the primary keys in memory, it should be just a subsetting and re-ordering of these primary keys, and that shouldn't be too slow or memory demanding.
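To illustrate the "subset and re-order the primary keys" idea, here is a minimal base R sketch. It assumes the keys and the origin of each spectrum are available as two in-memory vectors; the names `pkey`, `data_origin` and `filter_and_reorder` are hypothetical and not taken from the actual backend implementation.

```r
# Hypothetical in-memory columns: one primary key and one origin per spectrum.
pkey <- 1:6
data_origin <- c("a.mzML", "a.mzML", "b.mzML", "b.mzML", "c.mzML", "c.mzML")

# Hypothetical helper: keep only the requested origins and return the keys
# in the order in which the origins were requested.
filter_and_reorder <- function(pkey, data_origin, wanted) {
    idx <- unlist(lapply(wanted, function(w) which(data_origin == w)),
                  use.names = FALSE)
    pkey[idx]
}

# Requesting c("b.mzML", "a.mzML") puts the keys of "b.mzML" first.
res_ba <- filter_and_reorder(pkey, data_origin, c("b.mzML", "a.mzML"))
# Requesting c("a.mzML", "b.mzML") keeps the original order.
res_ab <- filter_and_reorder(pkey, data_origin, c("a.mzML", "b.mzML"))
```

Only integer keys are moved around here, which is why this step should be cheap as long as the key and origin columns are already in memory.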
But to sort the `_pkey` values, the ordering given by `file = c(2, 1)` is also required. Normally, we could use an SQL `ORDER BY` clause to sort the tables in SQLite, or fetch a `data.frame` with `_pkey` and the string column and then order this data frame in R.
But the prepared statement methods in `DBI` and `RSQLite` don't support the `ORDER BY` clause (see the related issue). That is why we have this problem now.
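The second option mentioned above (fetch the key and the string column, then order in R) could look roughly like the base R sketch below. The data frame `df` stands in for the result of a database query, and the column name `dataOrigin` as well as the variable names are assumptions made for illustration.

```r
# Hypothetical result of fetching the key and the string column from the database.
df <- data.frame(
    `_pkey` = 1:6,
    dataOrigin = c("a.mzML", "a.mzML", "b.mzML", "b.mzML", "c.mzML", "c.mzML"),
    check.names = FALSE,
    stringsAsFactors = FALSE
)

# Requested origins, in the order the user wants them back.
wanted <- c("b.mzML", "a.mzML")

# Filter first, then order the remaining rows by the position of their
# origin in 'wanted'; ties keep their original row order.
keep <- df[df$dataOrigin %in% wanted, ]
ord <- order(match(keep$dataOrigin, wanted))
sorted_pkey <- keep[["_pkey"]][ord]
```

The cost of this approach is that the full key and string columns have to be pulled into R before ordering, which is presumably where the time and memory overhead comes from.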
OK, maybe I need some more insight into your current implementation. Is the data stored in a single database or in one database per file? And I guess the only information that is kept in the backend is the `_pkey` and all data, also the `dataOrigin` (I guess the `dataStorage` is just the one database where you store the data).
The `filterDataOrigin` and `filterDataStorage` methods indicate that they
@jorainer - what is the rationale for this, and why is this so important that it is stated explicitly? The reason I am asking is that these requirements add some serious time/RAM requirements when performed on the SQLite backend.
cc @plantton