pysal / pointpats

Planar Point Pattern Analysis in PySAL
https://pysal.org/pointpats/
BSD 3-Clause "New" or "Revised" License
83 stars 27 forks source link

Question on Ripleys stats #66

Open giovp opened 4 years ago

giovp commented 4 years ago

Hi,

Just discovered this really cool package! I have a question regarding the implementation of Ripleys statistics. What I am interested in is computing e.g. Ripley's K for n subsets of points that are sampled from the same area. To do this, I am interested in computing the moving distance threshold across the full area, and not only restricting it on points subsets. For instance, with pointpats I would do this:

import numpy as np
from sklearn.metrics import pairwise_distances
import seaborn as sns

from pointpats import ripley
from astropy.stats import RipleysKEstimator

sample = np.random.uniform(low=5, high=10, size=(300, 2))
sub1 = sample[0:100,:]
sub2 = sample[100:300,:]

dist1 = pairwise_distances(sub1, metric="euclidean")
dist2 = pairwise_distances(sub2, metric="euclidean")

rip1 = ripley.k_function(sub1, distances=dist1, support=10,)
rip2 = ripley.k_function(sub2, distances=dist2, support=10,)

sns.lineplot(
    rip1[0],
    rip1[1],
)
sns.lineplot(
    rip2[0],
    rip2[1],
)

image

But as you can see the statistics is truncated for one subset. The astropy implementation seems to account for this, and allow to pass the full area. E.g.

Kest = RipleysKEstimator(area=25, x_max=10, y_max=10, x_min=5, y_min=5)
r = np.linspace(0, 2.5, 100)

sns.lineplot(
    r, 
    Kest(data=sub1, radii=r, mode='none')
)
sns.lineplot(
    r, 
    Kest(data=sub2, radii=r, mode='none')
)

image

I saw that in the pointpats implementation there is a "hull" argument that could be used for that? If not, is it maybe planned to allow such operation, or is it not good practice? (am really naive wrt spatial stats, sorry)

Also, I was wondering whether it's planned to implement the edge corrections features for Ripleys stats.

Thank you!

Giovanni

ljwolf commented 2 years ago

Hey, sorry! Just saw this & the attendant stuff in squidpy.

Awesome package! We're a bit overstretched across our libraries at the moment, so pointpats definitely sees a bit less attention than, say, esda or libpysal. That said, very interested in supporting/extending things to make them more useful for other packages. We recently added some dask functionality and are interested (in the long term) to better support raster/image data [1,2] with xarray/dask in our autocorrelation stats.

That said, let me know if there's anything that seems good to include going forward. Duplication definitely reduces our overall effectiveness, and it seems like Ripley stats are a very clear case where we (maybe) need an ecosystem-wide performant implementation for general point patterns.

On the specifics:

  1. Yes, the hull option can be used to increase the study area to an arbitrary extent. It's fine to do that if that's your focus.
  2. Yes, we plan to implement edge corrections for this. But, we have no specific plans/timelines for this at the moment.
giovp commented 2 years ago

@ljwolf thanks a lot for the reply. Indeed with Squidpy we are trying to provide useful spatial statistics for the spatial molecular data analysis community. We took great inspiration from libpysal and used it as dependency for a while. Eventually we decided to re-implement selected ones internally but we'd be very interested to interface via squidpy more of the libpysal library. We make use of AnnData https://anndata.readthedocs.io/en/latest/ as data format. Do you think it's something that you might be interesting in supporting in the future? I'm also currently busy with else but I'll try to keep in mind to reach out in the future, there is a lot of very interesting development in this community!