Open brianmackenna opened 4 years ago
Thank you Brian and Helen for helping me sort the df_nonGPlithium issue. I removed the CCG lines, which clearly was important, but also I had misnamed the sql command in my final line - should have read df_nonGPlithium = bq.cached_read(sql3, csv_path='nonGPlithium.csv')
That part is now working.
hi @orlamac - just having a look at this first go as requested.
First off it is great and it runs! A few things
Markdown: I found this helpful for a markdown cheat sheet. You can add hyperlinks for example to the cited papers.
Jupyter runs sequentially through the cells (I believe this is a criticism of Jupyter). I can tell you haven't done this by looking at cell outputs. .
When you clear output and rerun it will break as it did on mine. I have tried to troubleshoot this and it appears you may have deleted some of the required cells before pushing to Git.
I always clear output and rerun to ensure notebook runs from start to finish on my analyses. Making sure it can run from start to finish is definitely good practice. My method is probably a bit naive and @CarolineMorton may have a pointer on how this is best tested
As regards differences between
df_lithium
anddf_nonGPlithium
I haven't fully gotten to bottom of it but here are a few things I have noticed aboutdf_nonGPlithium
. Maybe @CarolineMorton can point out what I'm obvious missing --ccgs.org_type='CCG'
this is included so dataframe will only include "nonGPs" who are attached to a CCG e.g. out of hours. You should remove this to get mental health community teams etc.When I run this as @orlamac has
WITH bnf_tab AS ( SELECT DISTINCT chemical, chemical_code FROM ebmdatalab.hscic.bnf ) SELECT rx.month, rx.practice, rx.pct, SUBSTR(rx.bnf_code,1,9) AS chemical_code, chemical, sum(IF(rx.bnf_code LIKE "0402030K0%", items,0)) AS carbonate, sum(IF(rx.bnf_code LIKE "0402030P0%", items,0)) AS citrate, sum(items) AS total_lithium, sum(actual_cost) AS total_cost FROM hscic.normalised_prescribing_standard AS rx LEFT JOIN bnf_tab ON chemical_code =SUBSTR(rx.bnf_code,1,9) JOIN hscic.practices AS prac ON rx.practice = prac.code JOIN hscic.ccgs AS ccgs ON rx.pct=ccgs.code WHERE prac.setting != 4 AND (bnf_code LIKE "0402030K0%" OR ##carbonate bnf_code LIKE "0402030P0%") ##citrate AND ccgs.org_type='CCG' GROUP BY rx.month, rx.practice, rx.pct, chemical_code, chemical ORDER BY month
to produce
df_nonGPlithium
I get a csv with 806,012 rows which is definitely not what I expect. However when I ran it on BQ webinterface I get a csv with 5615 rows which is what I would expect. Any ideas?