Hill # NaN in genetic and trait results

isaacovercast commented 1 year ago

Run this:

## init_type may be bridge_island or oceanic_island
p <- roleParams(individuals_local = 100, individuals_meta = 10000,
                species_meta = 10, speciation_local = 0, 
                speciation_meta = 0.05, extinction_meta = 0.05, env_sigma = 0.5,
                trait_sigma=1, comp_sigma = 0.5, dispersal_prob = 0.01, mutation_rate = 1e-6,
                equilib_escape = 1, num_basepairs = 500, alpha = 10000,
                init_type = 'bridge_island', niter = 100000, niterTimestep = 10000)
model <- runRole(roleModel(p))

The this getSumStats(model) and some genetic diversity and trait Hill's for some timesteps will be 'NaN'. It is non-deterministic, and we assume it's because perhaps there is no variation in these values at a given timestep?

isaacovercast commented 1 year ago

Indeed, for the genetic diversity hill #s if there are any '0' in the vector of raw pi values within a given timestep then this fails because of log(0):

hill[q == 1] <- exp(sum(-n * log(n)))

'q' values other than 1 are perfectly happy with zeros in the pi vector.

In MESS I just ignore pi==0 values and I think this is justifiable. This is also how scipy.stats.entropy treats zeros.

Update: I pushed a fix for the genetic diversity Hill's, simply removing 0s from the vector.

isaacovercast commented 1 year ago

For trait hills, you get 'NaN' in the case where there is only 1 species present in the local community. If there's only 1 species present then in .hillDivTrait() the distance matrix between traits (dij <- as.matrix(dist(traits))) is a single value == 0, which eventually leads to a divide-by-zero 2 lines later.

diazrenata commented 1 year ago

After wrestling (philosophically) with whether it makes sense to try to hatch a trait diversity for when there is a single species, or whether to simply build in a catch to send single-species communities to evaluate to 0, I'm leaning towards just building in the catch sending single-species to evaluate to zero.

Considerations:

Functional diversity hill numbers are based on species-distance matrices, which just can't be evaluated for a single species.
In roleR, we do end up with trait variation in the absence of species variation (intraspecific variation) - but then that's summarized to a species mean anyway, and it seems a little out-of-scope to come up with a new version of trait diversity that looks both intra and inter specifically.

diazrenata commented 1 year ago

Amending: I think it should evaluate to 1, yes? Weighting by trait diversity is moot in this instance, so it should converge back to abundance (which evaluates to 1 anyway).

diazrenata commented 1 year ago

See b0ddd3f3339a12726d11bc59c04cdf9bb041e7b8. I feel like there's room for spirited discussion re: the interpretation of trait hill numbers in a single-species community, though! :P

isaacovercast commented 1 year ago

agreed. In the case of no trait variation it should be 1.

diazrenata commented 1 year ago

Getting sneakier, b1ba083f7e1e9326420eaccd581322718ce6d6a3. This one catches both edge cases: if there is only one species, or if there are multiple species all with the same trait. (Seems extreme but possible in special circumstances, like a super narrow optimum). In this instance, it effectively reverts to abundance hill numbers.

diazrenata commented 1 year ago

I could be convinced in this instance (no trait, but species, variation) it should be something else, though. 1 for that, too?

diazrenata commented 1 year ago

(hillR::hill_func just fails in this instance, btw)

isaacovercast commented 1 year ago

IDK, I think conceptuall if there is literally zero trait variation then I would say it should be 1. If you think about it from the perspective of # of 'effective species traits' then it would be 1.

ajrominger commented 1 year ago

This is a super hard one to figure out!

After perhaps too much effort, I agree that for "effective number of species" (i.e. weighted Hill) the answer should be 1, both from the perspective of 1 species, or multiple species but with no trait variation.

I went deep down a rabbit hole on this (nothing against rabbits!)

The short answer is that unweighted hill goes to 0 both for 1 species and multiple species with no trait variance. (Script below showing that).

But weighted hill is $(D{hill} / Q)^{1/2}$. And Q in the equation goes to 0 just as fast as $D{hill}$ (where $D_{hill}$ is unweighted hill). So again weighted Hill in the limit of 0 trait variation (either from 1 species or multiple spp with same trait) goes to 1. I'm not showing the math for Q going to 0, but it's fairly clear from Chao's equations

Here's the script

# a function to explore limits of trait hill number
# sets q = 2 w/o loss of generality
# n = vector of spp abundances
# trtDiag =  intra-specific trait difference (in reality always 0, but we allow
#            it to be != 0 to explore limit)
# trtOff  =  inter-specific trait differences (i.e. "off diagnal" entries in a
#            distance matrix)

hillLim <- function(n, trtDiag, trtOff) {
    p <- n / sum(n)
    dij <- matrix(trtOff, nrow = length(p), ncol = length(p))
    diag(dij) <- trtDiag

    Q <- as.vector(p %*% dij %*% p)
    a <- outer(p, p, '*') / Q

    return(sum(dij * a^2)^(1 / (1 - 2)))
}

# explore limit as inter-specific differences go to 0
hillLim(rep(1, 5), trtDiag = 0, trtOff = 1)
## 20

hillLim(rep(1, 5), trtDiag = 0, trtOff = 0.1)
## 2

hillLim(rep(1, 5), trtDiag = 0, trtOff = 0.01)
## 0.2

hillLim(rep(1, 5), trtDiag = 0, trtOff = 0.001)
## 0.02

So this says that as trait variation between species goes to 0, so too does Hill diversity go to 0. But again, this is for unweighted hill.

Now for a single species:

# explore limit as diagonal element approaches 0 when there is only 1 species
# (i.e. figure out what hill div should be for only 1 species)
hillLim(1, trtDiag = 0.1, trtOff = 1)
## 0.1

hillLim(1, trtDiag = 0.01, trtOff = 1)
## 0.01

hillLim(1, trtDiag = 0.001, trtOff = 1)
## 0.001

Here again, for a single species Hill diversity goes to 0.

But again these calculations are for unweighted Hill. For weighted Hill, I convinced myself the answer is indeed equal to 1 because D_hill and Q go to 0 at the same rate.

ajrominger commented 1 year ago

OH! Something important though that's coming out of this is: right now we're returning unweighted Hill numbers right? So is that what we want to do, or do we want to switch to returning weighted Hills?

ajrominger commented 1 year ago

Damn! I been working on this more and I was wrong about weighted trait Hill for the case no variation: weighted trait Hill for the case with no variation might equal weighted abundance-based Hill with the same SAD...still working on it, stay tuned, or if you all have it figured out, even better!

ajrominger commented 1 year ago

One thing for sure though we have to pick a column (attribute diversity or generalized hill):

isaacovercast commented 1 year ago

Generalized Hills, that's my vote. The interpretation is very natural.

On Thu, Jun 1, 2023 at 2:31 PM Andy Rominger @.***> wrote:

One thing for sure though we have to pick a column (attribute diversity or generalized hill): [image: image] https://user-images.githubusercontent.com/2481785/242692425-0791c9a8-9045-43a5-a48e-ee408441f2cf.png

— Reply to this email directly, view it on GitHub https://github.com/role-model/roleR/issues/121#issuecomment-1572581459, or unsubscribe https://github.com/notifications/unsubscribe-auth/ABNSXP3UGFHNXNT2V5RXHBLXJDNYDANCNFSM6AAAAAAYK2NKVA . You are receiving this because you authored the thread.Message ID: @.***>

diazrenata commented 1 year ago

I agree with Isaac about generalized hills.

ajrominger commented 1 year ago

also agree with both of you about using generalized hills!

so this means we need to revise the trait and phylo hill numbers to return the generalized value (currently they return the "attribute diversity" from Chao's table). the abundance hill function (and thus the gen div hill) returns the generalized value because the generalized and attribute div for abundance are equivalent

ajrominger commented 1 year ago

and my final thought on hill numbers when trait variation is absolutely 0 is that the generalized number should indeed be 1. if you all are good to go with that, let's go for it

diazrenata commented 1 year ago

Unless I'm mistaken, the trait hill number function is already returning generalized hills! e4b7a4701fa52e80697fa8a8a750de8fd9cbfa99 implements a check if the trait diversity is 0 (which would occur if species richness is 0, or if trait diversity is 0) and returns 1 for all q.

I'll work on phylo next!

ajrominger commented 1 year ago

right you are about generalized hills! don't know how i convinced myself otherwise!

role-model / roleR

Hill # NaN in genetic and trait results #121