pysal / pointpats

Planar Point Pattern Analysis in PySAL
https://pysal.org/pointpats/
BSD 3-Clause "New" or "Revised" License
83 stars 26 forks source link

Simulation Envelopes, Low & High #57

Closed vpapaioannou closed 3 years ago

vpapaioannou commented 4 years ago

The way the low and high functions are calculated, https://github.com/pysal/pointpats/blob/master/pointpats/distance_statistics.py#L627, it appears to be confusing since it depends on the pct parameter whereas it shouldn't. If pct=1 then the low and high envelopes are the same as can be seen below.

realizations = PoissonPointProcess( spp.window, spp.n, 10, asPP = True) kenv = Kenv( spp, intervals = 20, realizations = realizations, pct = 1) kenv.plot()

image

If pct = 0.05 having the same realizations, then,

image

If there is an intention to incorporate some sort of p - value in the graphs, then from what I know this p - value is determined exclusively by the amount realizations and nothing else. Finally, I would suggest to have an assertion about pct valid values.

Thanks,

Vasilis

sjsrey commented 4 years ago

@vpapaioannou if you could add a PR that illustrates they type of functionality you are interested in, that would be very helpful.

vpapaioannou commented 4 years ago

What does PR mean?

To my understanding, the envelopes should be independent from the level of significance. That means, given the existing code, that the low envelope should be always row 0 and the high envelope should be always the last row of numpy array res. They should be the same if only one realization is provided.

sjsrey commented 4 years ago

What does PR mean?

Pull Request

To my understanding, the envelopes should be independent from the level of significance. That means, given the existing code, that the low envelope should be always row 0 and the high envelope should be always the last row of numpy array res. They should be the same if only one realization is provided.

The simulation envelopes are dependent on the specification of the significance level.

vpapaioannou commented 4 years ago

Do you have a reference about "The simulation envelopes are dependent on the specification of the significance level."?

sjsrey commented 4 years ago

Do you have a reference about "The simulation envelopes are dependent on the specification of the significance level."?

https://esajournals.onlinelibrary.wiley.com/doi/abs/10.1890/13-2042.1

vpapaioannou commented 4 years ago

Reading the paper, I understand your rationale. To my mind though, there are two different but very close concepts. The first one is that of the Lower / Upper Bound (LB / UB) and the second one is that of a K function at level α. In the first case, the LB should be the row 0 in array res in the code, and the UB should be the last row. For the second case, the K functions should be reported together with the level α so that to be distinguished from the actual LB and UB K functions. Also, instead of using the LB / UB labels I would just use K_α.

Finally, in line https://github.com/pysal/pointpats/blob/master/pointpats/distance_statistics.py#L626, the length nres should be nres = len( res) + 1 where the 1 comes from the current point process under testing that is considered as another instance of a CSR process. As I understand it, realizations variable doesn't include the observed one.

ljwolf commented 4 years ago

Would clarifying the documentation help? It seems your interpretation of envelope is the extrema, while we're using it to mean the (1- α)% extrema. It is unlikely that we will change the nomenclature in this code, but a re-vamp of the code is about to be merged that is oriented to let users work with simulations directly.

ljwolf commented 4 years ago

where the 1 comes from the current point process under testing

Edit: Of course, I should also say thank you for reviewing this in depth and giving feedback!!

I believe this is done correctly in the new implementation.

vpapaioannou commented 4 years ago

Yes, updating the documentation it would be enough. If pct is small enough then you can get the extrema, otherwise you can get the (1 - α)% extrema. If you do so, I would suggest instead of pct to use alpha and this parameter would be self explanatory (keep the doc string though). Finally, I would like to bring attention to np.int() function that is used, that floors a number e.g. 2.7 -> 2.

As of nres that I said above, since Python starts from 0, I think the code is correct. However, if pct = 0 then an IndexError is raised. Maybe there you need to take care of this in the new implementation.

You are welcome, thanks for implementing all this functionality.

ljwolf commented 3 years ago

This should be addressed in the new implementation