Some pointers on first go

hi @orlamac - just having a look at this first go as requested.

First off it is great and it runs! A few things

Markdown: I found this helpful for a markdown cheat sheet. You can add hyperlinks for example to the cited papers.
Jupyter runs sequentially through the cells (I believe this is a criticism of Jupyter). I can tell you haven't done this by looking at cell outputs. .
When you clear output and rerun it will break as it did on mine. I have tried to troubleshoot this and it appears you may have deleted some of the required cells before pushing to Git.
I always clear output and rerun to ensure notebook runs from start to finish on my analyses. Making sure it can run from start to finish is definitely good practice. My method is probably a bit naive and @CarolineMorton may have a pointer on how this is best tested
As regards differences between df_lithium and df_nonGPlithium I haven't fully gotten to bottom of it but here are a few things I have noticed about df_nonGPlithium. Maybe @CarolineMorton can point out what I'm obvious missing -- ccgs.org_type='CCG' this is included so dataframe will only include "nonGPs" who are attached to a CCG e.g. out of hours. You should remove this to get mental health community teams etc.

When I run this as @orlamac has WITH bnf_tab AS ( SELECT DISTINCT chemical, chemical_code FROM ebmdatalab.hscic.bnf ) SELECT rx.month, rx.practice, rx.pct, SUBSTR(rx.bnf_code,1,9) AS chemical_code, chemical, sum(IF(rx.bnf_code LIKE "0402030K0%", items,0)) AS carbonate, sum(IF(rx.bnf_code LIKE "0402030P0%", items,0)) AS citrate, sum(items) AS total_lithium, sum(actual_cost) AS total_cost FROM hscic.normalised_prescribing_standard AS rx LEFT JOIN bnf_tab ON chemical_code =SUBSTR(rx.bnf_code,1,9) JOIN hscic.practices AS prac ON rx.practice = prac.code JOIN hscic.ccgs AS ccgs ON rx.pct=ccgs.code WHERE prac.setting != 4 AND (bnf_code LIKE "0402030K0%" OR ##carbonate bnf_code LIKE "0402030P0%") ##citrate AND ccgs.org_type='CCG' GROUP BY rx.month, rx.practice, rx.pct, chemical_code, chemical ORDER BY month

to produce df_nonGPlithium I get a csv with 806,012 rows which is definitely not what I expect. However when I ran it on BQ webinterface I get a csv with 5615 rows which is what I would expect. Any ideas?

orlamac / lithium-paper

Some pointers on first go #2