Closed caterinap closed 7 years ago
Hey Cat, I think a good way to solve these output problems is to focus on the original approaches from the non-tree/intra functions. For example, in clade we are interested in checking whether clade removal causes a change different from a random removal of species. Thus in tree_clade we just want to do the same, but considering the variation in phylogeny or in data (for intra_clade).
For influ, we want to check the most influential species considering the phylogenetic or data uncertainty, and so on for the others.
What do you think?
Yes, ok, makes sense. In any case, people can still run the individual functions if they want the phylo/intra details. But "considering the variation in phylogeny or data" is the tricky part. The simplest solution I can think of is to give a mean + deviation over all trees (taking the tree function as an example). So, for example, for the clade functions you would have mean.slope, se.slope, DFslope, se.DFslope... etc. Another possibility: in the summary we just show the means over all trees, and the true parameter accounting for uncertainty in tree/data would be the % change. And we can still show the variation in the plots.
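A minimal sketch of the "mean + deviation over all trees" idea, in base R. The data frame `per_tree`, its column names and the helper `summarise_trees()` are made up for illustration; they are not sensiPhy output or part of its API, and the standard deviation is used here as one possible choice of deviation.

```r
# Toy per-tree estimates: one row per tree (illustrative numbers only,
# not real sensiPhy output).
per_tree <- data.frame(
  tree    = 1:5,
  slope   = c(0.50, 0.52, 0.48, 0.51, 0.49),
  DFslope = c(-0.10, -0.12, -0.08, -0.11, -0.09)
)

# Hypothetical helper: collapse every estimate column to its mean and
# standard deviation across trees.
summarise_trees <- function(x) {
  est <- x[setdiff(names(x), "tree")]
  data.frame(
    parameter = names(est),
    mean      = vapply(est, mean, numeric(1)),
    sd        = vapply(est, sd, numeric(1)),
    row.names = NULL
  )
}

tree_summary <- summarise_trees(per_tree)
print(tree_summary)
```

This gives one row per parameter instead of one column pair per parameter, which might keep the summary readable as more parameters are added.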
Still need to take a decision for clade_tree and clade_intra interaction: should we just plot a mean over all iterations or have intervals around the lines?
Not sure. We can leave it and discuss tomorrow, I guess. The intervals might work, but I see a problem in calculating the randomization test across all simulations. We could just use the wider variation of the tree/intra simulations and continue with the standard approach of clade. But let's discuss this topic and decide the safest way to produce results.
Ok, the summary for clade interactions now gives the % of times where it is not random:
(this would be the example for the paper, it shows that the results are robust to variation in tree topology, but some of the trees change the intercept (for Cebidae))
I did not modify the clade interaction plots yet. I am actually not sure about what we decided. Plotting the 95% intervals around the slopes might be a bit messy.
Now I am focussing on fixing small stuff and checking/changing summaries for all interaction functions.
Super nice @caterinap!!!! This result is very interesting indeed! =) It strongly suggests some biological mechanism related to this clade! Super cool! I think it will make the perfect example! Regarding the plot, let's keep the mean only and later we check if it makes sense to add the intervals, agree?
Cannot agree more :)
Great, really nice Cat! I will look into help functions later today :-)
Dr. Gijsbert Werner Newton Fellow, Department of Zoology Junior Research Fellow, Balliol College University of Oxford
Balliol College Broad Street Oxford OX1 3BJ United Kingdom
On 26 Jul 2017, at 16:17, Caterina Penone notifications@github.com<mailto:notifications@github.com> wrote:
Cannot agree more :)
— You are receiving this because you were assigned. Reply to this email directly, view it on GitHubhttps://github.com/paternogbc/sensiPhy/issues/145#issuecomment-318085880, or mute the threadhttps://github.com/notifications/unsubscribe-auth/AGauyE_LpHhSXUoDfILLmoc-q6v8vXyaks5sR1iNgaJpZM4Ncfjp.
Cool, give me about an hour to fix a few things and I'll merge the branch into master! (In case it's useful, I put an interaction_functions table in the Drive to check what is done for which function.)
not so sure about the name "Non random (%)"... but I have no other idea.
Yeah, me neither, but I could not find better.. feel free to change anything there!
'Significant (%)'? Or 'Significant Change (%)'?
I have a slight preference for 'Significant (%)' but both are better than 'Non random (%)'!!
In the samp method we use "% Significant"; we could use the same or change both to "Significant (%)" ;)
Nice, so working on the paper example. What do you think about this type of visualization for the tree_clade interaction?
It shows the estimated slopes after removing the clade Cebidae (red line) across different tree iterations. The black points represent the slope estimates for the random removal of species (null distribution) for each tree.
It clearly shows that this clade always lowers the slope far beyond the difference expected from clade size alone. Thus, it demonstrates that this effect (first shown with clade_phylm, the first paper example) still holds regardless of the phylogenetic hypothesis being considered. So there is something there!
That is why the Significant (%) = 100%! Because in all cases this clade removal deviates significantly from the null distribution.
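As a rough sketch of how a "Significant (%)" value like this could be computed: for each tree, compare the slope obtained without the clade against that tree's null distribution, then report the share of trees where it falls in the tail. The function name `significant_pct()`, the one-tailed test (clade removal lowering the slope) and the 0.05 threshold are assumptions for illustration, not necessarily what the package does.

```r
# obs:  observed slope without the clade, one value per tree
# null: matrix of null slopes, rows = trees, columns = random removals
# Assumption: one-tailed test for a decrease in slope, alpha = 0.05.
significant_pct <- function(obs, null, alpha = 0.05) {
  p <- vapply(
    seq_along(obs),
    function(i) mean(null[i, ] <= obs[i]),  # share of null draws at or below obs
    numeric(1)
  )
  100 * mean(p < alpha)
}

# Toy data: 3 trees, 100 random removals each (made-up numbers).
null_slopes <- matrix(seq(0.4, 0.6, length.out = 300), nrow = 3)

# Clade removal far below every null draw in all trees.
pct_extreme <- significant_pct(c(0.1, 0.1, 0.1), null_slopes)

# Clade removal in the middle of the null distribution in all trees.
pct_typical <- significant_pct(c(0.5, 0.5, 0.5), null_slopes)
```

With the toy numbers, `pct_extreme` is 100 (the removal is outside the null range for every tree) and `pct_typical` is 0.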
I think it is very nice! To be a bit more self-explanatory I would change the x-axis label into "Phylogenetic tree" or "Tree" and add in the legend that the black dots are the random draws (something like "dot=random").
New version with 100 trees and 100 n.sim!
Super super cool!
Wow! This is so cool! Love it :-)
Clade Cebidae: differs from the full data (decreased slope) and differs from the null expectation
Clade Cercopithecidae: differs from the full data (increased slope) but does not differ from the null expectation
Clade Lemuridae: differs neither from the full data nor from the null expectation
Hey guys,
I developed this other graph to include together with the previous one.
The idea here is to show the general change in slope across all clades due to clade removal and compare it with the full data fit (black line). This would be the left graph, while the graphs above would go in the right panel.
PS: the different points represent runs with different trees, and the big point represents the mean across trees for each clade. The idea is to get a graphical sense of the change caused by clade removal. So we can clearly see that Cebidae and Callitrichidae reduce the estimate while Cercopithecidae increases it.
The right panel would be this updated version. Please let me know what you think of both ;)
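Something like this, for Cebidae:
[image] https://user-images.githubusercontent.com/9639481/29391333-f819fa70-8339-11e7-97b2-c9603e15f3b6.png
[image] https://user-images.githubusercontent.com/9639481/29391512-2b19fc58-833b-11e7-971f-4ffdadabc366.png
Maybe just leave the red point on the first graph, to match the red (without clade):
[image] https://user-images.githubusercontent.com/9639481/29391549-6fed6f22-833b-11e7-80f0-7bc724223b15.png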
We can also change the degree of dispersion of the jittered points.
I would go for that:
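To solve the problem when too many trees are analyzed, I decided to remove the x-axis text when the number of trees is higher than 30:
[image] https://user-images.githubusercontent.com/9639481/29393204-cb4f37ec-8345-11e7-828d-64b12b7e6e27.png
Then users can explore by hand, with the raw data, which trees are changing the slope. I am developing this type of exploration for the online tutorial.
Solution:
[image] https://user-images.githubusercontent.com/9639481/29393210-dd96b3da-8345-11e7-8f56-8052ccea400b.png
When there are up to 30 trees we can keep the names: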
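The label rule described above could look something like this in base R. The helper name `tree_axis_labels()` and the `max_labels` argument are hypothetical; only the threshold of 30 comes from the comment above.

```r
# Hypothetical helper: return tick labels for the tree axis, blanking
# them out when there are more trees than max_labels.
tree_axis_labels <- function(tree_ids, max_labels = 30) {
  if (length(tree_ids) > max_labels) {
    rep("", length(tree_ids))  # too many trees: keep ticks, drop the text
  } else {
    as.character(tree_ids)     # few enough trees: show their names
  }
}

labels_30  <- tree_axis_labels(1:30)   # names kept
labels_100 <- tree_axis_labels(1:100)  # names dropped
```

The returned vector can be passed to whatever draws the axis, for example the `labels` argument of base `axis()`.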
Finally, what do you think about highlighting the null distribution points that fall above or below the red line (to make it very clear how unique the clade removal is compared to the null expectation)?
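Possible graph for the paper (all three together):
Clade Cebidae: differs from the full data (decreased slope) and differs from the null expectation
[image] https://user-images.githubusercontent.com/9639481/29394574-974b4504-834e-11e7-92ef-9334a4448ba7.png
Clade Cercopithecidae: differs from the full data (increased slope) but does not differ from the null expectation
[image] https://user-images.githubusercontent.com/9639481/29394552-66438c96-834e-11e7-90d7-ce514fb55786.png
Clade Lemuridae: differs neither from the full data nor from the null expectation
[image] https://user-images.githubusercontent.com/9639481/29394544-5983c098-834e-11e7-8e62-a0ba766f4973.png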
Cool, very nice! I very much like the new figure you created :-)
Just so I understand correctly, these are two separate figures, right? The top one for the Cebidae and the bottom one for the Cerco's?
If it’s a four-panel figure, it might be a bit confusing that the (new) left figure is repeated twice?
An even clearer visual signal would be to plot all the dot plots in one colour (let's say greyish), and only the clade that is shown in more detail across all the trees on the right in red.
I am not sure if this is feasible, but if it is, it would give a really powerful visual signal that the clade we are depicting in detail on the right is the same one shown in that colour on the left.
If it is not possible, I guess I would stick with the former (i.e. different colours for each species), so people will not think there is a correspondence in terms of the colours.
Cool, I agree! I guess that in terms of the dispersion/jitter ideally we would still want to be able to detect different trees individually (but this will be challenging with many trees). Could we set the jitter distance dynamically, depending on the number of trees plotted?
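One possible way to set the jitter dynamically, sketched in base R: let the jitter half-width grow with the number of trees and cap it so that points from neighbouring clades do not run into each other. The function name `jitter_width()`, the step of 0.01 per tree and the cap of 0.4 are arbitrary choices for illustration.

```r
# Hypothetical rule: wider jitter when more trees are plotted, capped so
# points stay within their own clade's slot on the x axis.
jitter_width <- function(n_trees, step = 0.01, max_width = 0.4) {
  min(max_width, step * n_trees)
}

# Spread n_trees points around a clade's x position using that width.
jitter_x <- function(x_pos, n_trees) {
  w <- jitter_width(n_trees)
  x_pos + runif(n_trees, min = -w, max = w)
}

set.seed(42)
x_few  <- jitter_x(1, 10)   # 10 trees: half-width 0.1
x_many <- jitter_x(1, 100)  # 100 trees: half-width capped at 0.4
```

With very many trees the cap means individual trees will still overlap, so this only helps up to a point.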
Yes, I like highlighting the replicates/dots that are above/below the null distribution, but I worry a bit that they make the figure chaotic and difficult to interpret. Perhaps it would help if they were also mentioned in the legend (as something like 'Replicates above/below null distribution'), because now it's the only colour that is not mentioned.
Hey @gijsbertwerner, thanks for the feedback.
So let's keep it simple. The sensi_plot() for tree_clade will produce two graphs. The first compares all clade estimates with the full data estimate. This will give a general overview of each clade's influence on the estimate (left graph).
The second graph will focus on one clade, comparing the clade removal with the null distribution across all trees analyzed (right graph).
The command:
sensi_plot(clade_tree,clade = "Cercopithecidae")
will produce this figure:
while the command sensi_plot(clade_tree, clade = "Cebidae") will produce this one:
I know this repeats the first graph for all clades, but the user can choose to print only the second graph
(e.g. sensi_plot(clade_tree, graphs = 2, clade = "Cebidae")).
For the paper I think we have two options. 1: use only the null distribution graphs for Cercopithecidae, Cebidae and Lemuridae. Something like this:
2: use the null distribution of these clades plus the clade comparisons (first graph). Something like this:
Of course we would still polish that in Inkscape.
So let me know if you have any other suggestion regarding the function plots, then I can prepare a multiplot for the paper and we can discuss that graph in google docs ;)
Looks good to me!
All super nice!!! I like the second option where you see all clades + 3 clades with the trees. I am not sure how to show that the red line "significantly" differs from the black one or the null distribution (we can guess it but not be sure of it). What about the option we discussed to add a short text (in the graph) giving the % of significant or something else? Maybe too messy..
Ok. I had split the graphs into two figures (all clades and the blue graphs). But maybe it is better to use the option 2 approach. Regarding the red line, I can add the % of significant iterations at the top, together with the other info. I guess that is ok.
For example:
Ah, sorry, I forgot to reply to the question of whether to go for option 1 or option 2 in the main text. I would certainly be in favour of option 2. I like how it combines everything in a single overview, and think that the second example should ideally have only a single figure (with panels), like the first. Definitely go for two!
Yes, let's include the percentage of significant iterations above the plot. I like it, and I think it will really help clarify things for the user (and in the ms)!
Perfect!
Now interaction_clade_tree_phylm is functional, but I still need to find a good way to summarize it. If we follow what happens in tree, we should give mean, CI_low and CI_high. But this leads to a long summary. Maybe we just give the standard error (or deviation) for each parameter? Still, that is almost double the number of columns... For the plots, I was thinking of keeping the same ones as for clade but adding confidence intervals due to phylo variation around the lines in both the scatterplots and the histograms. The histograms would include data on all trees as well (and have intervals around the lines). We could also add std. errors (vertical and horizontal) on each dot in the scatterplot (but maybe too messy).