opensafely / tpp-sql-notebook


Think about / experiment with SQL exports #43

Open sebbacon opened 4 years ago

sebbacon commented 4 years ago

We need a way to get data from SQL queries to disk in a format that can be consumed by our analytic code.

One option would be to use `xp_cmdshell` -- something like:

```sql
EXEC xp_cmdshell 'SQLCMD -S . -d OPENCorona -Q "SELECT * FROM Patient" -s "," -o "O:\data.csv"';
```

But we don't know SQL Server well at the DataLab, so please let us know the best way of doing this!

We'll want to run these manually and also on a schedule (e.g. weekly following database updates).

We should verify how long it takes to run and dump a typical query resultset (per #42), how large that resultset is on disk, and how much disk will be available.
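A minimal sketch of the kind of measurement we'd want, assuming the export is wrapped in a callable that writes a file (the function name and interface here are hypothetical, just for illustration):

```python
import os
import time


def measure_dump(dump_fn, out_path):
    """Time a dump function and report the size of the file it writes.

    `dump_fn` is a hypothetical callable standing in for whatever export
    mechanism we settle on; it takes an output path and writes the
    resultset there.
    """
    start = time.perf_counter()
    dump_fn(out_path)
    elapsed_seconds = time.perf_counter() - start
    size_bytes = os.path.getsize(out_path)
    return elapsed_seconds, size_bytes
```

Logging these two numbers per query after each weekly refresh would answer both the timing and disk-sizing questions.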

evansd commented 4 years ago

Are we not using pyodbc for this?

sebbacon commented 4 years ago

Sorry, I wrote this up in a hurry without thinking it through properly -- yes, ideally we would. I suppose I'm just aware that pushing huge amounts of data through pyodbc might be a bit... wrinkly...

But you're right, we should assume for now that using pyodbc will work.
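For what it's worth, a sketch of what the pyodbc route might look like, streaming rows in chunks via `fetchmany` so the full resultset is never held in memory (it only relies on the standard DB-API cursor interface, so any DB-API cursor works; the chunk size is a placeholder to tune):

```python
import csv


def dump_query_to_csv(cursor, sql, out_path, chunk_size=10_000):
    """Run `sql` on a DB-API cursor (e.g. from pyodbc) and stream the
    resultset to CSV in chunks, to keep memory use bounded."""
    cursor.execute(sql)
    with open(out_path, "w", newline="") as f:
        writer = csv.writer(f)
        # Header row from the cursor's column metadata
        writer.writerow(col[0] for col in cursor.description)
        while True:
            rows = cursor.fetchmany(chunk_size)
            if not rows:
                break
            writer.writerows(rows)
```

In practice this would be called with a cursor from something like `pyodbc.connect(...)`, e.g. `dump_query_to_csv(conn.cursor(), "SELECT * FROM Patient", "data.csv")`. Since `fetchmany` is plain DB-API, the wrinkly part is more likely to be driver/network throughput than the Python side.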

However if @chris-tpp has any quick insights about bulk data dumps in SQL Server it might be useful to have in mind.

Also good to confirm how much disk space we'll have -- we'll need a certain amount permanently, and a certain amount as scratch.