hmsc-r / HMSC

Simplistic patterns in species association plots #74

Closed rburner closed 3 years ago

rburner commented 3 years ago

Hi HMSC team,

I had a question about the patterns I typically observe in species association scores. In the plots below, I’ve ordered species by their occupancy and plotted association score with >95% support. (More common species have more associations, which is what I typically see.)

One observation I commonly make, which is the focus of my question, is that most species that have any highly supported associations have associations with [almost] all other species that have any associations at all. This essentially means that association scores between a given species and all other species (i.e. the scores in a single row, or single column, in the association matrix) do not appear to be independent of each other. This is most apparent in the second plot below, but is also somewhat true in the first.

This doesn’t seem likely to reflect a biological reality, but seems more likely to be an artefact of something about the model structure. Maybe the latent variables that are used to estimate each species’ associations are few enough that the estimates are in this case gross oversimplifications that result in these simple patterns?

My main question, then, is whether I should 'believe' these association scores. If so, it could be meaningful for me to test, for example, whether the difference in trait values between the two members of a given species pair is related to their probability of having an association. Or is it (as it appears to me) clear from the plots below that these particular models, at least, are producing association score estimates that I shouldn't trust?

Thanks!!

Ryan

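Figures (species association plots): cor1 (https://user-images.githubusercontent.com/36172156/101908695-c2061100-3bbc-11eb-8c24-220257a54a20.jpg), cor2 (https://user-images.githubusercontent.com/36172156/101908706-c5010180-3bbc-11eb-98df-4ef9e7c872a9.jpg)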

ovaskain commented 3 years ago

The values in the association matrix are not (and cannot be) independent of each other, as the matrix needs to be positive definite. This reflects (partially) biology as well, not just model structure: if A and B have a high positive association, and B and C as well, then necessarily A and C must be positively associated. On top of this, the latent variable structure "looks for" correlated associations. If you have 100 species, you have ca. 5000 associations. If you wish to get very accurate estimates of each of those, you need a huge amount of data. The latent variable structure tries to get you those associations as accurately as possible with a limited amount of data. Can you trust that the associations are real rather than noise or something forced by the statistical model? Test this e.g. by conditional cross-validation; see e.g. the book for more details.

O2
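
To make the positive-definiteness argument concrete, here is a minimal sketch in plain Python. It is not Hmsc code, and the function names `third_correlation_bounds` and `n_pairs` are hypothetical. For three species, a valid correlation matrix restricts the A-C correlation to the interval r_AB * r_BC ± sqrt((1 - r_AB^2)(1 - r_BC^2)), so the A-C association is forced to be positive only when the other two are strong enough. The pair count shows where the "ca. 5000" figure comes from.

```python
import math


def third_correlation_bounds(r_ab, r_bc):
    """Range of r_ac for which the 3x3 correlation matrix stays positive semidefinite."""
    centre = r_ab * r_bc
    slack = math.sqrt((1.0 - r_ab ** 2) * (1.0 - r_bc ** 2))
    return centre - slack, centre + slack


def n_pairs(n_species):
    """Number of distinct species pairs, i.e. off-diagonal associations."""
    return n_species * (n_species - 1) // 2


# Two strong associations pin down the sign of the third one.
strong = third_correlation_bounds(0.9, 0.9)
# Two moderate associations leave the sign of the third one open.
weak = third_correlation_bounds(0.5, 0.5)

print("r_AB = r_BC = 0.9 ->", strong)
print("r_AB = r_BC = 0.5 ->", weak)
print("pairs among 100 species:", n_pairs(100))
```

With both correlations at 0.9 the lower bound is about 0.62, so A and C must be positively associated. At 0.5 the lower bound is -0.5, so the constraint alone does not fix the sign. Among 100 species there are 4950 distinct pairs.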

rburner commented 3 years ago

Otso,

Thank you - ok, good point about the positive definite issue. I will take a look at using conditional cross-validation too.

My plan was to use a logistic regression to test whether the difference in e.g. body size between each pair of species was a good predictor of whether they had a positive association (with the identity of 'species A' in each pair as a random effect, to account for the fact that some species have many more associations than others, which is partly due to differences in abundance). But maybe that analysis is overly precise for the estimates.

In Abrego et al. J. of Ecology 2017, you all test the effect of traits on associations by fitting a Poisson model to the number of associations each species has, based on that species' trait values. But I was hoping to find a way to also consider the similarity in traits between each pair of species. Does my logistic regression approach seem reasonable for that?

Thanks,

Ryan
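
As an illustration of the two data layouts discussed in this comment, the sketch below builds (a) one row per species pair, with the body-size difference and a 0/1 indicator for a supported positive association, and (b) a per-species count of associations, the kind of response a Poisson model would use. All species names, body sizes, and associations are hypothetical and invented for the example. Fitting the regression itself is not shown.

```python
from itertools import combinations

# Hypothetical inputs: species, one trait, and the supported positive associations.
species = ["sp1", "sp2", "sp3", "sp4"]
body_size = {"sp1": 10.0, "sp2": 12.0, "sp3": 30.0, "sp4": 11.0}
supported_positive = {frozenset(("sp1", "sp2")), frozenset(("sp1", "sp4"))}

# (a) Pairwise layout: one row per unordered species pair.
rows = []
for a, b in combinations(species, 2):
    rows.append({
        "species_a": a,  # would serve as the grouping factor for a random effect
        "species_b": b,
        "size_diff": abs(body_size[a] - body_size[b]),
        "positive": int(frozenset((a, b)) in supported_positive),
    })

# (b) Per-species layout: number of supported associations for each species.
counts = {s: 0 for s in species}
for pair in supported_positive:
    for s in pair:
        counts[s] += 1

for row in rows:
    print(row)
print(counts)
```

The rows in (a) are not independent observations, because every species appears in several pairs.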

ovaskain commented 3 years ago

From my point of view, it makes sense to fit a regression model between two matrices, e.g. the association matrix and a matrix of trait similarity. But when doing so, you should test for significance/statistical support by permutation, as otherwise you neglect the dependencies between the matrix elements and will most likely get significant results for the wrong reasons.

Otso
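
One common way to run such a permutation test is a Mantel-style test, sketched below in plain Python. The thread does not specify the permutation scheme, so this is an assumption: species labels of one matrix are shuffled (rows and columns together), which keeps the within-matrix dependencies intact while breaking the link between the two matrices. The function names and the example matrices are hypothetical.

```python
import math
import random


def upper_triangle(m, order=None):
    """Off-diagonal upper-triangle values of symmetric matrix m, optionally relabelled."""
    n = len(m)
    idx = order if order is not None else list(range(n))
    return [m[idx[i]][idx[j]] for i in range(n) for j in range(i + 1, n)]


def pearson(x, y):
    """Pearson correlation between two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def mantel_test(assoc, trait_sim, n_perm=999, seed=1):
    """One-sided permutation p-value for a positive matrix correlation."""
    rng = random.Random(seed)
    x = upper_triangle(assoc)
    observed = pearson(x, upper_triangle(trait_sim))
    hits = 0
    for _ in range(n_perm):
        order = list(range(len(assoc)))
        rng.shuffle(order)  # relabel species: rows and columns move together
        if pearson(x, upper_triangle(trait_sim, order)) >= observed:
            hits += 1
    # Count the observed arrangement as one of the permutations.
    return observed, (hits + 1) / (n_perm + 1)


# Hypothetical data: five species, similarity = negative trait distance.
traits = [1.0, 2.0, 4.0, 7.0, 11.0]
trait_sim = [[-abs(a - b) for b in traits] for a in traits]
assoc = [[math.exp(-abs(a - b) / 5.0) for b in traits] for a in traits]

observed, p_value = mantel_test(assoc, trait_sim)
print("matrix correlation:", observed, "permutation p-value:", p_value)
```

Shuffling individual matrix cells instead of species labels would ignore exactly the dependencies between matrix elements that the permutation is meant to respect.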

rburner commented 3 years ago

Otso,

Great, I'll do some permutations of the matrices and see how my results compare to those!

Thanks again,

Ryan