Open krlmlr opened 7 years ago
@krlmlr Can you state the purpose of the function precisely? Do we intend to do split-apply-combine here? What does callback mean? (Sorry, I am a non-native speaker.)
Split-apply-combine is one scenario that this style would support. Another is early termination (looking for a needle in a haystack--once you find it, you can stop fetching). Another is extract-transform-load where the data is larger than memory (or any other operation on the data that is performed for its side effect).
All of these can be performed today with dbSendQuery and dbFetch, but a callback-style API makes it harder to make mistakes (primarily I'm thinking of leaked connections). It can also be a common pattern that we offer not only in DBI but for any source of row-oriented data in R.
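To make the comparison concrete, here is a minimal sketch of what such a callback wrapper around dbSendQuery and dbFetch could look like. The helper name `fetch_chunked`, its arguments, and the convention that the callback returns FALSE to stop early are assumptions for illustration only; none of them is part of DBI. The example assumes the RSQLite package is installed and uses an in-memory database.

```r
library(DBI)

# Hypothetical helper, not part of DBI: fetch a query result in chunks of
# `n` rows and pass each chunk to `callback`.
# Assumed convention: the callback returns FALSE to stop fetching early.
fetch_chunked <- function(con, sql, callback, n = 1000L) {
  res <- dbSendQuery(con, sql)
  # Release the result set even if the callback throws an error.
  on.exit(dbClearResult(res), add = TRUE)

  while (!dbHasCompleted(res)) {
    chunk <- dbFetch(res, n = n)
    if (nrow(chunk) == 0L) break
    if (isFALSE(callback(chunk))) break
  }
  invisible(NULL)
}

# Usage: look for a needle in a haystack and stop once it is found.
con <- dbConnect(RSQLite::SQLite(), ":memory:")
dbWriteTable(con, "haystack", data.frame(id = 1:10000))

seen <- 0L
fetch_chunked(
  con,
  "SELECT id FROM haystack ORDER BY id",
  function(chunk) {
    seen <<- seen + nrow(chunk)
    !(4242L %in% chunk$id)  # FALSE (stop) once the needle is in this chunk
  },
  n = 1000L
)

dbDisconnect(con)
```

The point of the sketch is that cleanup lives in one place (`on.exit`), so the caller cannot forget dbClearResult, and the loop stops fetching as soon as the callback asks it to.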
> All of these can be performed today by dbSendQuery and dbFetch

... then it may as well live in another package?

> can also be a common pattern

... then it should perhaps live in another package?
If we have generic code that defines a data source as an R6 class (such as https://github.com/krlmlr/pumpr), we can add a wrapper for DBI connections there. Would that work?
Do we have a good way to process data piecemeal by now? Is there anything that needs to be done in DBI?
Hoping to be pardoned for barging into the discussion: memory swapping is a true killer when it comes to processing large remote datasets. So a function that would be quite handy is dbSuggestN(), which retrieves the currently available physical memory (which may, of course, fluctuate) and the record size on the server (perhaps taking some network overhead into account) and suggests a reasonable value for n (one that fits in, say, 80% of the currently free memory), in the hope of significantly reducing the chances of swapping.
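The arithmetic behind that suggestion can be sketched as follows. This is only an illustration: `suggest_n` and its parameters are hypothetical and do not exist in DBI. How to obtain the free memory and the per-row size is deliberately left to the caller, since as far as I know base R has no portable way to query free physical memory, and the row size depends on the backend.

```r
# Hypothetical helper; neither the name nor the signature exists in DBI.
# free_bytes: currently free physical memory, supplied by the caller
# row_bytes:  estimated size of one fetched row, including any overhead
# fraction:   share of free memory the chunk is allowed to occupy
suggest_n <- function(free_bytes, row_bytes, fraction = 0.8) {
  stopifnot(free_bytes >= 0, row_bytes > 0, fraction > 0, fraction <= 1)
  # Always suggest at least one row so that fetching can make progress.
  max(1, floor(free_bytes * fraction / row_bytes))
}

# Example call with made-up numbers: 8 GB free, about 200 bytes per row.
n <- suggest_n(free_bytes = 8e9, row_bytes = 200)
```

The result could then be passed as the `n` argument of dbFetch when fetching in chunks.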
@zyxdef: Interesting idea in the context of chunked processing.
I now think this is out of scope for DBI.
The goal is to support fetching and processing data piecemeal with a callback. Development started in r-dbi/DBI#111, but perhaps the interface and the requirements should be specified here first.
CC @bborgesr @jcheng5 @jimhester @hadley.