orlamac / lithium-paper

MIT License

Some pointers on first go #2

Open brianmackenna opened 4 years ago

brianmackenna commented 4 years ago

hi @orlamac - just having a look at this first go as requested.

First off, it is great and it runs! A few things:

When I run this as @orlamac has

    WITH bnf_tab AS (
      SELECT DISTINCT chemical, chemical_code
      FROM ebmdatalab.hscic.bnf
    )
    SELECT
      rx.month,
      rx.practice,
      rx.pct,
      SUBSTR(rx.bnf_code,1,9) AS chemical_code,
      chemical,
      sum(IF(rx.bnf_code LIKE "0402030K0%", items,0)) AS carbonate,
      sum(IF(rx.bnf_code LIKE "0402030P0%", items,0)) AS citrate,
      sum(items) AS total_lithium,
      sum(actual_cost) AS total_cost
    FROM hscic.normalised_prescribing_standard AS rx
    LEFT JOIN bnf_tab
      ON chemical_code =SUBSTR(rx.bnf_code,1,9)
    JOIN hscic.practices AS prac
      ON rx.practice = prac.code
    JOIN hscic.ccgs AS ccgs
      ON rx.pct=ccgs.code
    WHERE
      prac.setting != 4
      AND (bnf_code LIKE "0402030K0%" OR ##carbonate
      bnf_code LIKE "0402030P0%") ##citrate
      AND ccgs.org_type='CCG'
    GROUP BY rx.month, rx.practice, rx.pct, chemical_code, chemical
    ORDER BY month

to produce df_nonGPlithium, I get a CSV with 806,012 rows, which is definitely not what I expect. However, when I run it in the BigQuery web interface I get a CSV with 5,615 rows, which is what I would expect. Any ideas?
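One way to start narrowing down a row-count mismatch like this is to count the rows in the CSV and check how many distinct grouping keys it contains. The sketch below is only an illustration: the helper name `summarise_rows` and the sample rows (including the practice codes) are made up, and the key columns are assumed to match the GROUP BY in the query.

```python
import csv
import io
from collections import Counter


def summarise_rows(csv_text, key_columns):
    """Return (row count, distinct key count, up to 3 repeated keys with counts)."""
    reader = csv.DictReader(io.StringIO(csv_text))
    # Count how often each combination of key columns appears.
    keys = Counter(tuple(row[c] for c in key_columns) for row in reader)
    total = sum(keys.values())
    duplicates = [(k, n) for k, n in keys.most_common(3) if n > 1]
    return total, len(keys), duplicates


# Made-up sample data, standing in for the contents of the saved CSV.
sample = """month,practice,chemical_code,total_lithium
2019-01-01,A81001,0402030K0,5
2019-01-01,A81001,0402030K0,5
2019-01-01,A81002,0402030P0,2
"""

total, distinct, dupes = summarise_rows(
    sample, ["month", "practice", "chemical_code"]
)
print(total, distinct, dupes)
```

If the total is far above the number of distinct keys, the CSV probably holds repeated groups; if the two match but are still far too large, the CSV may have come from a different query than the one intended.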

orlamac commented 4 years ago

Thank you Brian and Helen for helping me sort out the df_nonGPlithium issue. I removed the CCG lines, which clearly was important, but I had also misnamed the SQL variable in my final line. It should have read:

    df_nonGPlithium = bq.cached_read(sql3, csv_path='nonGPlithium.csv')

That part is now working.
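A possible way to avoid passing the wrong SQL variable in future is to keep each query next to the name of its output in one dictionary, so the query and the CSV path cannot drift apart. This is a minimal sketch rather than the notebook's actual code: `read_all` and `fake_reader` are hypothetical names, and the stand-in reader only assumes the `(sql, csv_path=...)` call shape seen in the line above, so that the example runs without BigQuery access.

```python
def read_all(queries, reader):
    """Run each named query through `reader`, saving to '<name>.csv'."""
    return {
        name: reader(sql, csv_path=f"{name}.csv")
        for name, sql in queries.items()
    }


# Stand-in for the real reader; it records the calls instead of querying BigQuery.
calls = []


def fake_reader(sql, csv_path):
    calls.append((sql, csv_path))
    return f"rows for {csv_path}"


# Placeholder SQL; in the notebook this would be the text held in sql3.
queries = {"nonGPlithium": "SELECT 1 -- placeholder for sql3"}

frames = read_all(queries, fake_reader)
df_nonGPlithium = frames["nonGPlithium"]
print(calls)
```

In the notebook, `bq.cached_read` could be passed in place of `fake_reader`, assuming it accepts the same two arguments.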