Open sebbacon opened 4 years ago
Are we not using pyodbc for this?
Sorry, I wrote up in a hurry without thinking it through properly - yes ideally we would, I suppose I'm just aware that pushing huge amounts of data through pyodbc
might be a bit... wrinkly...
But you're right, we should assume for now that using pyodbc
will work.
However if @chris-tpp has any quick insights about bulk data dumps in SQL Server it might be useful to have in mind.
Also good to confirm how much disk space we'll have - we'll need a certain amount permanently, and a certain amount as scratch
We need a way to get data from SQL queries to disk in a format that can be consumed by our analytic code.
One option would be to use xp-cmdshell -- something like
But we don't know SQL Server at DataLab so please let us know the best way of doing this!
We'll want to run these manually and also on a schedule (e.g. weekly following database updates).
We should verify how long it takes to run and dump a typical query resultset (per #42), and how large the resultset is on disk, and how much disk will be available.