ChristinaSchmidt1 closed this issue 10 months ago.
One additional note: When saving the generated plots, we need to figure out how to:
The functions are merged. The new function is named VizPlots and it has:
parameters:
Graprh_Style = "Bar", # options: Bar, Box, Violin
Superplot = NULL, # or a column name from the Experimental Design to be used for the superplot
I used this instead of Superplot = TRUE/FALSE because this way we do not need another parameter holding the vector we want to use. The user now supplies the name of the column they want to use from the experimental design, e.g. Superplot = "Biological_Replicates".
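A minimal sketch of how the merged function could validate these two parameters, assuming base R only. The helper name check_viz_args is hypothetical, and the parameter is spelled Graph_Style as in the later VizSuperplot call. match.arg() restricts the style to Bar, Box or Violin, and Superplot must be either NULL or the name of an existing column of the experimental design.

```r
# Hypothetical argument check for the merged plotting function (sketch only).
check_viz_args <- function(settings, Graph_Style = c("Bar", "Box", "Violin"),
                           Superplot = NULL) {
  # match.arg() returns "Bar" by default and errors on any other value
  Graph_Style <- match.arg(Graph_Style)
  if (!is.null(Superplot)) {
    # Superplot must be a single column name present in the design table
    if (!is.character(Superplot) || length(Superplot) != 1 ||
        !Superplot %in% colnames(settings)) {
      stop("Superplot must be NULL or the name of a column in the settings file")
    }
  }
  list(Graph_Style = Graph_Style, Superplot = Superplot)
}

# Toy experimental design (invented values)
exp_design <- data.frame(Conditions = c("HK2", "HK2", "786-M1A", "786-M1A"),
                         Biological_Replicates = c(1, 2, 1, 2))
check_viz_args(exp_design, "Box", Superplot = "Biological_Replicates")
```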
I have kept Output_plots = "Together" or "Individual" for now, but I am thinking of replacing it with a parameter Individual_plots = TRUE/FALSE, with default = FALSE. What do you think?
There is also the parameter Selected_Conditions, where the user can input a vector of the conditions they want to keep in the plot. All other conditions are removed, e.g. Selected_Conditions = c("HK2", "786-M2A","786-M1A" )
and Selected_Comparisons, which is a list of vectors containing the names of the conditions we want to compare with a t-test, e.g. Selected_Comparisons = list(c("HK2","786-M2A"), c("HK2","786-M1A"))
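A sketch of what these two parameters do, using invented toy values: rows of unselected conditions are dropped, and one Welch t-test (the default of R's t.test()) is run per requested pair. The column names Conditions and Value are assumptions about the input layout.

```r
# Toy long-format data (invented values, for illustration only)
plot_data <- data.frame(
  Conditions = rep(c("HK2", "786-M2A", "786-M1A", "786-O"), each = 3),
  Value = c(5.1, 4.8, 5.3, 7.2, 6.9, 7.5, 6.1, 6.4, 5.9, 3.0, 3.2, 2.9)
)
Selected_Conditions  <- c("HK2", "786-M2A", "786-M1A")
Selected_Comparisons <- list(c("HK2", "786-M2A"), c("HK2", "786-M1A"))

# Keep only the selected conditions; all other conditions are removed
plot_data_sel <- plot_data[plot_data$Conditions %in% Selected_Conditions, ]

# One Welch t-test per requested pair of conditions
pvals <- vapply(Selected_Comparisons, function(pair) {
  x <- plot_data_sel$Value[plot_data_sel$Conditions == pair[1]]
  y <- plot_data_sel$Value[plot_data_sel$Conditions == pair[2]]
  t.test(x, y)$p.value
}, numeric(1))
names(pvals) <- vapply(Selected_Comparisons, paste, character(1), collapse = " vs ")
pvals
```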
The additional note is yet to be implemented.
Amazing, nice job!
Some points: Superplot = NULL, # or a column name from the Experimental Design to be used for superplot
I think we should use the same syntax as for the lollipop and volcano plots. So we could rename Experimental Design to Plot_SettingsFile and use Plot_SettingsInfo = c(SuperPlot="ColumnName_Plot_SettingsFile")
Individual_plots = TRUE/FALSE with default = FALSE --> Yes, that's a great idea! This refers to saving them either in a scrollable PDF document or in individual files, correct? Here we can also look into using faceting (https://ggplot2.tidyverse.org/reference/facet_grid.html), so the plots are nicely ordered on one sheet. This order could even follow pathways or clusters, so that the metabolites of a pathway are printed on one sheet.
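A base R sketch of the "one pathway per sheet" idea, assuming a hypothetical long-format table with a Pathway annotation column (all names and values are invented). split() gives one element per pathway; each element could then be drawn as one page, for example with ggplot2's facet_wrap(~ Metabolite).

```r
# Hypothetical long-format table: one row per measurement, plus a pathway
# annotation (names and values invented for illustration).
long_df <- data.frame(
  Metabolite = rep(c("Glucose", "Lactate", "Citrate", "Succinate", "Glutamine"), each = 2),
  Pathway    = rep(c("Glycolysis", "Glycolysis", "TCA", "TCA", "Amino acids"), each = 2),
  Conditions = rep(c("HK2", "786-O"), times = 5),
  Value      = c(1.2, 1.5, 2.0, 2.6, 0.8, 0.7, 1.1, 1.9, 3.2, 3.0)
)

# One list element per sheet: all rows of the metabolites in one pathway
pages <- split(long_df, long_df$Pathway)

# Each element would become one page, e.g. drawn with
# ggplot(page, aes(Conditions, Value)) + geom_col() + facet_wrap(~ Metabolite)
metabolites_per_page <- vapply(pages, function(page) length(unique(page$Metabolite)),
                               integer(1))
metabolites_per_page
```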
For Selected_Conditions and Selected_Comparisons we need a column Condition at the moment. I think we could add this as well to the Plot_SettingsInfo=c(Conditions ="ColumnName_Plot_SettingsFile").
For Selected_Comparisons, we should still do an ANOVA with a multiple comparison test (at least if there are more than two conditions on the plot), but then only label the selected comparisons on the plot, with the information taken from the multiple comparison test. Also, if we go down this route, we probably need to offer both parametric and non-parametric tests?
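A sketch of the "test everything, label only the selected pairs" idea, using base R's aov() and TukeyHSD() on invented toy values. TukeyHSD() names each row "level2-level1" following the factor-level order, so the sketch checks both orientations of each requested pair. For the non-parametric route, the stats package also provides kruskal.test() and pairwise.wilcox.test(); these are not shown here.

```r
# Toy data (invented values); factor levels set explicitly
anova_data <- data.frame(
  Conditions = factor(rep(c("HK2", "786-M2A", "786-M1A"), each = 4),
                      levels = c("HK2", "786-M2A", "786-M1A")),
  Value = c(5.1, 4.8, 5.3, 5.0, 7.2, 6.9, 7.5, 7.1, 6.1, 6.4, 5.9, 6.2)
)

# One-way ANOVA followed by Tukey's multiple comparison test (all pairs)
fit   <- aov(Value ~ Conditions, data = anova_data)
tukey <- TukeyHSD(fit)$Conditions  # matrix with columns diff, lwr, upr, "p adj"

# Only the comparisons that should be labelled on the plot
Selected_Comparisons <- list(c("HK2", "786-M2A"), c("HK2", "786-M1A"))
wanted <- vapply(Selected_Comparisons, function(pair) {
  hits <- intersect(c(paste(pair[1], pair[2], sep = "-"),
                      paste(pair[2], pair[1], sep = "-")), rownames(tukey))
  hits[1]  # NA if the pair is not among the tested levels
}, character(1))

selected_p <- tukey[wanted, "p adj"]
selected_p
```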
About the additional note on saving the figures: this is something we will need to do for all the plots. I started doing this for the heatmaps, but that is a different graph object (a pheatmap object), whilst most other graphs will be ggplot objects, so the syntax will be a bit different. But I guess once we have figured it out for one, it's applicable to most of the plots. Moreover, we should return a plot object to the environment that includes all the plots; I have added this to the pre-processing function, for example. This way, someone can still add things or make changes using ggplot syntax in most cases. Lastly, we should add save_as=NULL, in which case the figures are not saved and only the list of plot objects is returned.
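A minimal sketch of the save_as = NULL behaviour, assuming base grDevices only. The helper save_plot_list and its arguments are hypothetical. Plots are drawn with print(), which is how ggplot objects are rendered; a pheatmap object may need its own drawing call. "Together" writes one multi-page PDF, matching the current "always PDF" behaviour for combined output.

```r
# Hypothetical helper (not part of the package): optionally save a named list
# of plot objects and always return the list so users can keep editing them.
save_plot_list <- function(plots, save_as = NULL, individual_plots = FALSE,
                           out_dir = ".", prefix = "Plot") {
  if (is.null(save_as)) {
    return(invisible(plots))  # save_as = NULL: nothing is written to disk
  }
  if (!individual_plots) {
    # "Together": one multi-page PDF with one plot per page
    grDevices::pdf(file.path(out_dir, paste0(prefix, ".pdf")), onefile = TRUE)
    for (p in plots) print(p)
    grDevices::dev.off()
  } else {
    # "Individual": one file per plot in the requested format
    device <- switch(save_as,
                     pdf = grDevices::pdf,
                     svg = grDevices::svg,
                     stop("save_as must be NULL, \"pdf\" or \"svg\""))
    for (nm in names(plots)) {
      device(file.path(out_dir, paste0(prefix, "_", nm, ".", save_as)))
      print(plots[[nm]])
      grDevices::dev.off()
    }
  }
  invisible(plots)
}
```

Returning the list in every branch means the same object is available for further ggplot edits whether or not anything was written to disk.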
Yes, I will add the PlotSettingsFile and PlotSettingsInfo. Initially I tried to do it like this, but since Experimental_design was already a parameter, it didn't make sense to me to also add PlotSettingsInfo. Of course, Experimental_design will now be renamed to PlotSettingsFile, and SuperPlot will be added to PlotSettingsInfo.
Selected_Conditions is a subset of all the conditions, so it cannot be a column name in Plot_SettingsInfo. It could, however, be a vector of condition names inside Plot_SettingsInfo, like this: Plot_SettingsInfo=c(Selected_Conditions = c("HK2", "786-M2A","786-M1A" )). The same goes for Selected_Comparisons, which should be a subset of the selected conditions. Now that I think about it, we could remove Selected_Comparisons completely and always do t-tests or an ANOVA between the Selected_Conditions.
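One R detail worth checking before settling on this syntax: c() flattens a vector-valued named argument, so the name Selected_Conditions becomes Selected_Conditions1, Selected_Conditions2, ... and can no longer be looked up directly. A list() keeps the vector intact. A quick base R check:

```r
# With c(), the vector is flattened and the names get numeric suffixes
info_c <- c(Conditions = "Conditions",
            Selected_Conditions = c("HK2", "786-M2A", "786-M1A"))
names(info_c)
# "Conditions" "Selected_Conditions1" "Selected_Conditions2" "Selected_Conditions3"
info_c["Selected_Conditions"]  # NA: the name no longer exists

# With list(), the selected conditions stay together under one name
info_list <- list(Conditions = "Conditions",
                  Selected_Conditions = c("HK2", "786-M2A", "786-M1A"))
info_list$Selected_Conditions
```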
Regarding ANOVA and the parametric/non-parametric tests, you have a point, but doesn't that go too far? I say this because in my head the idea was to make the plots and maybe add a t-test for some statistics on the side. On the other hand, checking distributions and doing parametric or non-parametric tests is not that difficult, since we already did this in the DMA.
About the additional note: yes, I agree that if we manage to do this for one plot, then we should be OK, since most plots are ggplots. I tried some things yesterday but didn't get very far.
Sorry if it wasn't clear: I did not mean to change Selected_Conditions, but to pass the name of the column that holds the condition information, in case someone did not label it conditions but samples, patients or tissue. Does this make sense? For Selected_Comparisons, yes, we could always do the test, but this would be relevant to decide which statistics are shown on the plot. Or are the results of the stats not shown on the plot?
Yes, with all the tests that's a lot of work. We can log this as an enhancement issue for the future, but not implement it in the first package version. Btw, there is a nice Shiny app with those bar graph arrangements and stats, which I think does a great job: https://cancerandmetabolism.biomedcentral.com/articles/10.1186/s40170-020-00220-x
Additional note: Yeah I think this part will inevitably take up some time, but in the end makes a huge difference for usability.
For the function to work, we must have a column named "Conditions" in the Experimental_design (now renamed to PlotSettingsFile). We could add a Conditions entry to PlotSettingsInfo with "Conditions" as the default, so the user can change it.
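A sketch of the lookup with a default, assuming a hypothetical helper get_setting() and an invented settings file whose condition column is called Patients. If Plot_SettingsInfo has no Conditions entry, the column name falls back to "Conditions".

```r
# Hypothetical helper: read a column name from Plot_SettingsInfo,
# falling back to a default when the key is missing.
get_setting <- function(Plot_SettingsInfo, key, default = NULL) {
  if (!is.null(Plot_SettingsInfo) && key %in% names(Plot_SettingsInfo)) {
    Plot_SettingsInfo[[key]]
  } else {
    default
  }
}

# Invented settings file where the conditions are labelled "Patients"
Plot_SettingsFile <- data.frame(
  Patients = c("P1", "P1", "P2", "P2"),
  Biological_Replicates = c(1, 2, 1, 2)
)
Plot_SettingsInfo <- c(Conditions = "Patients", SuperPlot = "Biological_Replicates")

cond_col <- get_setting(Plot_SettingsInfo, "Conditions", default = "Conditions")
stopifnot(cond_col %in% colnames(Plot_SettingsFile))  # fail early on a bad name
groups <- Plot_SettingsFile[[cond_col]]
```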
Selected_Conditions keeps only the specified conditions on the plot. Selected_Comparison computes and plots statistics only for the conditions specified in Selected_Comparison.
My remark was to remove Selected_Comparison and, when we have Selected_Conditions = "Something", also compute and plot the statistics between the "Something" conditions. Currently this does not happen: the user has to explicitly select the comparisons they want through Selected_Comparison.
Yes, I agree that adding the different tests for the plots would be super cool; we can put this as an enhancement for the future. Also, the app looks very nice. I see you can select specific dimensions for the plots. Maybe I will try to find their code and see how they do this.
Lets discuss the point about conditions later in person :)
Yeah I know the app is great - would be nice if you can find the source code. This one could also be helpful for this: https://www.cedricscherer.com/2019/08/05/a-ggplot2-tutorial-for-beautiful-plotting-in-r/#axes
Thanks for the great discussion, I think we have a good plan now :) Could you add the points here that you noted down on your to do list?
After today's meeting we decided to do the following (I feel something is wrong here, please correct me where needed):
Now that I look at it again, I think Selected_Comparison is not needed. I don't remember why we decided to keep it.
I think this is how it should be: by default we plot all the conditions and no stats. If someone wants to plot only some conditions, they select those conditions in Selected_Conditions. If they also want the stats, they set a new parameter add_stats = TRUE and the stats are added to the plot. If there are 2 Selected_Conditions we do a t-test; if there are more, an ANOVA. This way we don't need Selected_Comparison, and a TRUE/FALSE parameter is easier than Selected_Comparison. Does this make sense? I know it's different from what we said, but I don't remember the reasoning for keeping Selected_Comparison since we will add the ANOVA.
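A minimal sketch of the add_stats = TRUE rule described here (two conditions: t-test; more than two: one-way ANOVA), assuming base R's t.test() and aov(). The name run_stats() is hypothetical, and for more than two conditions it returns only the overall ANOVA p-value, not pairwise results.

```r
# Hypothetical helper for add_stats = TRUE (sketch only)
run_stats <- function(values, conditions) {
  conditions <- factor(conditions)
  if (nlevels(conditions) == 2) {
    # Two selected conditions: Welch two-sample t-test
    list(test = "t-test", p.value = t.test(values ~ conditions)$p.value)
  } else if (nlevels(conditions) > 2) {
    # Three or more: one-way ANOVA, overall p-value of the Conditions term
    fit <- aov(values ~ conditions)
    list(test = "ANOVA", p.value = summary(fit)[[1]][["Pr(>F)"]][1])
  } else {
    stop("add_stats needs at least two conditions")
  }
}

run_stats(c(1, 2, 3, 6, 7, 8), rep(c("A", "B"), each = 3))$test
```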
What we discussed was the following: if there are only 2 Selected_Conditions, we can ignore Selected_Comparison and just add the result. Yet, if the user selected 3 or more conditions, we need to do a multiple comparison test (like ANOVA). In this case we may not want to plot all the results, as the plot could get very crowded, but only the ones of interest based on Selected_Comparison (I think this point was the one that led to confusion). Still, what you describe makes sense: if we can fit all the stats nicely on the plot (?), we don't need the parameter Selected_Comparison, since we always plot the stats for all comparisons.
One note: let's always plot the exact value, e.g. p=0.06 or 0.9. If the values become too small, we can plot p=7E-15.
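A possible formatter for these labels, as a sketch: two significant digits for ordinary p-values and scientific notation below 0.001. The function name format_p and the 0.001 cut-off are assumptions, not agreed choices.

```r
# Hypothetical p-value label formatter (sketch)
format_p <- function(p) {
  ifelse(p < 0.001,
         sprintf("p=%.0E", p),  # very small values in scientific notation
         sprintf("p=%.2g", p))  # otherwise two significant digits
}

format_p(c(0.06, 0.9, 7e-15))
# "p=0.06" "p=0.9" "p=7E-15"
```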
Here we need to:
Ok, so it was indeed not working. There was a double comma somewhere :'( Anyway, now it's working like this:
VizSuperplot(Input_data = Intra_Preprocessed[,-c(1:3)],
Input_SettingsFile = Intra_Preprocessed[,c(1:2)],
Graprh_Style = "Box", # Bar, Box, Violin
Superplot = NULL,
OutputPlotName = "Box",
Output_plots = "Individual",
Selected_Conditions = NULL, # not added yet
Selected_Comparisons = NULL, # not added yet
Theme = theme_classic(),
Save_as_Plot = "svg") # for "Together" the output is always PDF
Note 1: There seems to be an issue with the error bars. I think ggplot uses interquartile ranges around the median instead of the mean. We encountered this before, but I don't remember exactly.
Note 2: Stats are not added yet.
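Regarding Note 1: ggplot2's geom_boxplot() by design draws the median, the 25th and 75th percentiles, and whiskers reaching at most 1.5 * IQR from the box, so it will not show mean-based error bars. One option, sketched here in base R with invented toy values, is to compute mean ± SD per condition explicitly and pass ymin/ymax to geom_errorbar(). Whether SD or SEM is the right measure is still an open choice.

```r
# Toy data (invented values)
bar_data <- data.frame(
  Conditions = rep(c("HK2", "786-O"), each = 4),
  Value = c(5.1, 4.8, 5.3, 5.0, 3.0, 3.4, 2.9, 3.1)
)

# Per-condition mean and standard deviation of the replicates
means <- tapply(bar_data$Value, bar_data$Conditions, mean)
sds   <- tapply(bar_data$Value, bar_data$Conditions, sd)

summ <- data.frame(Conditions = names(means),
                   mean = as.vector(means),
                   sd = as.vector(sds))
# Error bar limits, e.g. for geom_errorbar(aes(ymin = ymin, ymax = ymax))
summ$ymin <- summ$mean - summ$sd
summ$ymax <- summ$mean + summ$sd
summ
```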
I just had a look at the function and some points I noticed:
You call VizSuperplot like this:
VizSuperplot(Input_data = Intra_Preprocessed[,-c(1:3, 30:182)],
Input_SettingsFile = Intra_Preprocessed[,c(1:2)],
Input_SettingsInfo = c(conditions="Conditions", superplot = "Biological_Replicates"),
Graph_Style = "Box", # Bar, Box, Violin
OutputPlotName = "",
Individual_plots = TRUE,
Selected_Conditions = c("786-M1A", "786-O", "HK2"),
Selected_Comparisons = list(c(1,2), c(1,3), c(2,3)),
Theme = theme_classic(),
Save_as_Plot = "svg") # for "Together" the output is always PDF
Now about Selected_Conditions: if NULL, all groups are plotted. If some are selected, only those are plotted, in the same order as in the Selected_Conditions vector. As for Selected_Comparisons: if NULL, no stats are added; if one pair is given, a t-test is done; if more than one pair is given, an ANOVA.
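A sketch of the behaviour described here, assuming (as the call above suggests) that Selected_Comparisons holds index pairs referring to positions in Selected_Conditions. Factor levels taken from Selected_Conditions fix the plotting order, and the number of pairs then picks the test. All values are invented.

```r
Selected_Conditions  <- c("786-M1A", "786-O", "HK2")
Selected_Comparisons <- list(c(1, 2), c(1, 3), c(2, 3))

# Toy data (invented values)
sel_data <- data.frame(Conditions = c("HK2", "786-O", "786-M1A", "786-M2A"),
                       Value = c(1.0, 2.0, 3.0, 4.0))

if (!is.null(Selected_Conditions)) {
  # Keep only the selected conditions ...
  sel_data <- sel_data[sel_data$Conditions %in% Selected_Conditions, ]
  # ... and plot them in the user's order (factor level order sets the x-axis)
  sel_data$Conditions <- factor(sel_data$Conditions, levels = Selected_Conditions)
}

# Translate index pairs into condition names, e.g. c(1, 3) -> "786-M1A", "HK2"
pairs <- lapply(Selected_Comparisons, function(idx) Selected_Conditions[idx])

# Rule from the thread: NULL -> no stats, one pair -> t-test, more -> ANOVA
stat_test <- if (is.null(Selected_Comparisons)) "none" else
  if (length(pairs) == 1) "t-test" else "ANOVA"
```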
Amazing, thanks for the update. Shall I start with the helper function for the plotting or are you still working on some of the points?
I have not double-checked that everything works as it should without problems, but yes, you can start on the helper function.
Ok I had a look at the function and tested the different functionalities. A couple of points:
The above are the things I noticed when going through the function so far.
Done. The facet grid has been moved to the general function.
As discussed yesterday please combine these four functions into one:
Give the user the parameter GraphStyle = "Bar", "Box" or "Violin".
Give the user the parameter Superplots = TRUE or FALSE (if TRUE, the user needs to provide the name of the column they want to use to colour-code the superplots).