We’ll take a look @joycepyang thanks!
@joycepyang @lzim
I'm not sure what you mean with regards to a. Can you elaborate or point to a specific example?
With regards to b, you can call par(mfrow = c(nr, nc)) before your plot commands, where nr is the number of rows and nc is the number of columns. For example, par(mfrow = c(2, 3)) sets up a 2x3 grid, so you can draw 6 graphs in 2 rows by 3 columns.
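A minimal sketch of the grid layout described above, using base R graphics only. The six random samples are hypothetical stand-ins for the real variables.

```r
# Hypothetical data: six random samples to fill the grid
set.seed(1)
samples <- replicate(6, rnorm(100), simplify = FALSE)

# 2 rows x 3 columns, filled row by row; keep the old settings to restore later
old_par <- par(mfrow = c(2, 3))
for (i in seq_along(samples)) {
  hist(samples[[i]], main = paste("Sample", i), xlab = "Value")
}

# Restore the previous layout so later plots are not forced into the grid
par(old_par)
```

Restoring the saved settings at the end avoids surprises when later plots in the same session are expected to fill the whole device.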
Hello @joycepyang and @saveth - we need to merge this issue #205 with outstanding issue #76
Specifically, we don't want to lose track of the time-based displays in Next Steps #4 and #5 of issue #76, which are still outstanding.
Thanks,
Lindsey
Thanks @lzim for the continued guidance on how best to keep track of everything on GitHub; I didn't think about issue merges.
@saveth Sorry that wasn't very specific. If you look at the printed plots, you can see that on the y-axis, there's a section that's completely dark b/c each sta6a is being labeled on the y-axis so it's all overlapping. Is there a way to not print the labels?
@joycepyang Here are two ways to remove it, depending on the type of plot command used. If you're using base plot, use yaxt = 'n' in the plot command. For instance, plot(1:10, yaxt = 'n'). If you're using ggplot, like the plots in your script, then use element_blank() in the theme command. For instance, ggplot(data, aes(x, y)) + geom_point() + theme(axis.text.y = element_blank(), axis.ticks.y = element_blank())
Thanks @saveth for the suggestions about using element_blank()! That definitely worked, and I updated all of the plots accordingly. I also used the par(mfrow = c(1, 2)) code to put them on the same line; it worked for the histograms but not for the ggplot plots, and I'm not sure why.
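A likely reason par(mfrow = ...) had no effect on the ggplot figures: par() settings apply to base graphics only, while ggplot2 draws with the grid system. Below is a minimal sketch of one alternative, assuming the gridExtra package is installed. The data frame and plot names are made up for illustration and are not taken from the script under discussion.

```r
library(ggplot2)
library(gridExtra)

# Hypothetical data standing in for two of the plotted variables
df <- data.frame(x = 1:20, y1 = (1:20)^2, y2 = sqrt(1:20))

p1 <- ggplot(df, aes(x, y1)) + geom_point() + ggtitle("Plot 1")
p2 <- ggplot(df, aes(x, y2)) + geom_point() + ggtitle("Plot 2")

# grid.arrange() lays the two ggplot objects out in one row and draws them
side_by_side <- grid.arrange(p1, p2, ncol = 2)
```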
One other thing I also ran into is that for two of the variables, I was unable to get the histogram to run due to the number of breaks:
#in CPT
tmh_mean_cpt$mean_xc <- tmh_mean_cpt$mean - mean(tmh_mean_cpt$mean)
#in CDW
tmh_mean_cdw$mean_xc <- tmh_mean_cdw$mean - mean(tmh_mean_cdw$mean)
par(mfrow=c(1, 2))
# hist() has no `bins` argument; the number of bins is suggested with `breaks`
hist(tmh_mean_cpt$mean_xc, main = "TMH CPT", xlab = "Centered Mean", breaks = 20)
hist(tmh_mean_cdw$mean_xc, main = "TMH CDW", xlab = "Centered Mean", breaks = 20)
```
Error in hist.default(tmh_mean_cpt$mean_xc, main = "TMH CPT", xlab = "Centered Mean") : invalid number of 'breaks'
```
This also occurred with mmencounter, groupencounter, and CPT initial appointments.
I'll attach the knitted file here so you can see that as well.
Last point: after eliminating the values that appeared repeatedly without a clear reason (e.g., 2089 in tmh), some of the remaining variables had very few data points, especially CPT and PE initial appointments. It would be great to discuss this on our next call, @lzim, as I'm not really sure what is happening.
@joycepyang I think the Skype session after this post helped address all the technical issues you had with the code. I guess what remains is your questions for @lzim.
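For readers who hit the same `invalid number of 'breaks'` error reported above: the thread does not record the actual cause, but one common trigger is a vector with no finite values, which can happen when mean() returns NA during centering. A minimal sketch with made-up numbers:

```r
# Hypothetical data: one missing value is enough to break the centering step
x <- c(2, 5, NA, 9, 4)

# mean() returns NA when any value is NA, so every centered value becomes NA
centered_bad <- x - mean(x)

# hist() drops non-finite values; with none left it cannot choose breaks
err <- tryCatch(hist(centered_bad), error = function(e) conditionMessage(e))
print(err)

# Ignoring NAs in mean() keeps the finite values usable
centered_ok <- x - mean(x, na.rm = TRUE)
h <- hist(centered_ok, breaks = 20, main = "Centered values", xlab = "Centered Mean")
```

Checking `summary(x)` or `sum(is.finite(x))` before calling hist() is a quick way to see whether this is what is happening.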
From original issue #76
Next steps #4: Due to the longitudinal/observational focus of our primary analyses, we do need to understand whether these measures of central tendency in each data set are obscuring secular trends. Specifically, it is very likely that overall demand for services (as measured by encounters) and adoption of EBPsy (as measured by CPT and PE templates) are increasing.
Next steps #5: To explore and report on this, we would need graphs over time that show the measure of central tendency for each quarter observed in the dataset, with box and whiskers spread of the distribution for that quarter
@saveth
Thanks so much for your work on the Shiny app! 💯
I think that the thing I wonder about the most is what I requested at the end of the quant workgroup meeting on Thursday - the possibility of box and whiskers plots with time on the x-axis, and variable on y-axis, so that for each time observation (year or month), we could see a distribution of the selected variable.
I explored a bit today and I'm going to look a little more, but this is top of mind! Let me know if you have questions.
Thanks!
Lindsey
@saveth
To be clearer: due to the high level of variability and the skew we have observed, it would be great to use a box and whisker plot showing the median and the 25th/75th percentiles, with outliers (e.g., > 95th percentile or something) depicted as dots. Thanks!
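A minimal sketch of the requested box and whisker display over time, assuming ggplot2. The quarters, the `encounters` column, and the simulated upward trend are all hypothetical.

```r
library(ggplot2)

# Hypothetical data: skewed encounter counts for 8 quarters, with an upward trend
set.seed(42)
quarters <- paste0(rep(2016:2017, each = 4), " Q", 1:4)
dat <- data.frame(
  quarter = factor(rep(quarters, each = 50), levels = quarters),
  encounters = rexp(400, rate = 1 / rep(seq(20, 55, by = 5), each = 50))
)

# One box per quarter: median line, 25th/75th percentile box, outliers as dots
p_box <- ggplot(dat, aes(x = quarter, y = encounters)) +
  geom_boxplot(outlier.shape = 16) +
  labs(x = "Quarter", y = "Encounters")
print(p_box)
```

Note that geom_boxplot() by default extends whiskers to the most extreme point within 1.5 times the interquartile range and draws anything beyond as a dot; a 95th percentile cut-off would need a custom summary instead.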
@saveth Thinking about it even more, violin plots, ridgeline plots, or possibly even letter-value plots are likely more useful for visualizing and understanding these data and key distributions.
I realized that the box plot I mentioned will likely look quite static, as the statistical summaries stay the same while the distributions are changing (prefer violin to box plot for this). Density plots on their own are very difficult to see/interpret with the multiple nested observations we have (prefer ridgeline to box plot for this). Finally, since this is a larger dataset, we can pursue visualization that affords more precise information about our tails.
Violin plot with time of observation - year or month - on x-axis, and variable on the y-axis
This will be really nice for our non-normally distributed data!
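A minimal sketch of such a violin plot, assuming ggplot2. The `year` and `value` columns and the log-normal data are hypothetical; a narrow box plot is overlaid so the median and quartiles stay visible inside each violin.

```r
library(ggplot2)

# Hypothetical data: a right-skewed variable observed in three years
set.seed(123)
dat <- data.frame(
  year = factor(rep(2015:2017, each = 200)),
  value = rlnorm(600, meanlog = rep(c(2.0, 2.3, 2.6), each = 200), sdlog = 0.6)
)

# Violin shows the full density per year; the thin box adds median and quartiles
p_violin <- ggplot(dat, aes(x = year, y = value)) +
  geom_violin(fill = "grey85") +
  geom_boxplot(width = 0.1, outlier.shape = NA) +
  labs(x = "Year", y = "Variable")
print(p_violin)
```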
A few other thoughts about stratifying the data to display these, however: another way that might be helpful to see these would be with the variable on the x-axis and the clinics on the y-axis, grouped and ordered in a meaningful way. (Note: it may make just as much sense for the grouping variable to be on the x-axis and the variable on the y-axis. But since it is most common when interpreting graphs of distributions for the variable to be on the x-axis, I proposed the first idea.)
Groupings might be clinics whose distributions fall into deciles for the variable, ranging from the lowest decile to the highest decile on that variable up the y-axis. That way, we'd be able to see and tease apart how the distribution looks for clinics that fall into each %ile. I think we want to see 10 groups (i.e., deciles).
Key for the ridgeline plots: we want to get the distributions for particular clinics (sta6a), i.e., use the average distribution of that variable over time for a given clinic, with clinics stratified and displayed in their deciles.
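One way to read the decile grouping described above is: average the variable over time within each clinic, rank the clinics on that average, and cut the ranks into ten equal groups. A minimal base R sketch under that assumption; the clinic IDs and values are simulated, and the column names `variable` and `decile` are chosen for illustration.

```r
# Hypothetical data: 50 clinics (sta6a), each observed for 12 months
set.seed(7)
dat <- data.frame(
  sta6a = rep(sprintf("clinic_%02d", 1:50), each = 12),
  variable = rexp(600, rate = 1 / rep(runif(50, 5, 50), each = 12))
)

# Clinic-level average of the variable over time
clinic_avg <- aggregate(variable ~ sta6a, data = dat, FUN = mean)

# Assign each clinic to a decile (1 = lowest, 10 = highest) by rank of its average
clinic_rank <- rank(clinic_avg$variable, ties.method = "first")
clinic_avg$decile <- ceiling(10 * clinic_rank / nrow(clinic_avg))

# Attach the clinic's decile to every observation for plotting
dat <- merge(dat, clinic_avg[, c("sta6a", "decile")], by = "sta6a")
dat$decile <- factor(dat$decile, levels = 1:10)
```

The resulting `decile` factor can then serve as the grouping variable on the y-axis of a ridgeline plot or as a faceting variable.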
Faceting to display 10 stratified groups combined with other packages
install.packages("ggridges")
Or latest development version from GitHub https://github.com/clauswilke/ggridges:
library(devtools)
install_github("clauswilke/ggridges")
library(ggridges)
library(dplyr)
library(ggplot2)
# Order the deciles by the median of the variable, then draw one ridge per group
dat %>%
  mutate(group = reorder(decile, variable, median)) %>%
  ggplot(aes(x = variable, y = group)) +
  geom_density_ridges(scale = 10)
Letter-value plots: lvplot by @hadley https://github.com/hadley/lvplot
NOTE: I considered this approach too, but I'm not sure that generating and visualizing the letter value summaries for these plots are as readily interpretable for most viewers as compared to the violin plots and ridgeline plots above so, I ruled this out for now.
CITATION: Heike Hofmann, Hadley Wickham & Karen Kafadar (2017) Letter-Value Plots: Boxplots for Large Data, Journal of Computational and Graphical Statistics, 26:3, 469-477, DOI: 10.1080/10618600.2017.1305277
Available at https://www.tandfonline.com/doi/abs/10.1080/10618600.2017.1305277?journalCode=ucgs20
# install.packages("devtools")
devtools::install_github("hadley/lvplot")
Hi Lindsey and Savet, I just uploaded new code named "plots". I am making this an issue rather than a pull request per my previous conversation with Lindsey about listing questions as issues rather than pull requests. (If it's better served in pull request form, please let me know and I can adjust for next time.)
I followed these steps from last week:
It would be great if you could review the code. I've also knitted it as HTML and uploaded it to the lucid meeting for tomorrow.
Two questions I have that I'm not sure are important to answer or not (depending on which plots we decide are useful to keep): a) How do we remove the tick / hash marks in the plots that are overpopulated? I tried several different ways to mask them, including changing the breaks, but that did not work.
b) I also tried to print the plots side by side, which worked in RStudio but not in the knitted HTML.
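For question b, one option to try is letting knitr place the figures side by side in the HTML output through chunk options. A minimal sketch, assuming the knitr package and that this call sits in the setup chunk of the R Markdown file; the 50% width is an illustrative choice for two plots per row.

```r
# Assumption: this runs in the setup chunk of the R Markdown file being knitted
knitr::opts_chunk$set(
  fig.show = "hold",   # collect all plots from a chunk and emit them together
  out.width = "50%"    # each plot takes half the page width, so two fit per row
)
```

The same two options can instead be set on a single chunk if only some plots should be paired.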