pysal / pointpats

Planar Point Pattern Analysis in PySAL
https://pysal.org/pointpats/
BSD 3-Clause "New" or "Revised" License

operands could not be broadcast together with shapes (16,) (18,) #100

Open SFashandi opened 1 year ago

SFashandi commented 1 year ago

Hi, running ripley.j_test(points, support=20) raises a ValueError like the one below, while the other tests (G, F, etc.) work fine:

ValueError: operands could not be broadcast together with shapes (16,) (18,)

Thanks for your consideration.
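For context, this is NumPy's generic broadcasting error: element-wise arithmetic between two 1-D arrays of different lengths is not defined. A minimal reproduction with plain NumPy follows. The arrays are placeholders standing in for two statistic vectors, not actual pointpats output.

```python
import numpy as np

# Two 1-D arrays whose lengths differ, like the (16,) and (18,) vectors in the traceback.
a = np.ones(16)
b = np.ones(18)

message = ""
try:
    a / b  # element-wise division needs equal (or broadcastable) shapes
except ValueError as err:
    message = str(err)

print(message)
```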

weikang9009 commented 1 year ago

@SFashandi Thank you for reporting the issue. Could you provide the data for points used in your example?

SFashandi commented 1 year ago

MEPs_2D_dots_v_01_01.zip

Hi, I have attached the 2D point dataset I used to calculate the G, F, and J tests. Please take a look at this case. Thanks.

weikang9009 commented 1 year ago

@ljwolf there does seem to be an issue with setting support for the J function. In the notebook example, setting support to 15 gives me the warning:

UserWarning: requested 15 bins to evaluate the J function, but it reaches infinity at d=24.7366, meaning only 10 bins will be used to characterize the J function.
  observed_support, observed_statistic = stat_function(

Setting support to 20 sometimes gives me the error ValueError: operands could not be broadcast together with shapes (13,) (15,). Can you look into this when you have a chance?

BTW, both ripley.py and distance_statistics.py are NumPy-oriented implementations of distance-based statistics. Should we remove ripley.py (I ran into errors when using this module), update the notebook, and keep only distance_statistics.py?
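The warning follows from how the J function is built. Under the standard definition J(d) = (1 - G(d)) / (1 - F(d)), J becomes infinite once the empty-space function F reaches 1, so fewer bins than requested remain usable. The sketch below assumes that definition and uses made-up G and F values, not the notebook data. It shows J blowing up and counts the leading finite bins.

```python
import numpy as np

# Illustrative (made-up) G and F values on a common 6-bin support;
# they are NOT taken from the notebook or the attached data.
support = np.array([0.0, 5.0, 10.0, 15.0, 20.0, 25.0])
g = np.array([0.0, 0.30, 0.60, 0.85, 0.95, 1.00])
f = np.array([0.0, 0.40, 0.70, 0.90, 1.00, 1.00])

# J(d) = (1 - G(d)) / (1 - F(d)); once F(d) == 1 the denominator is zero,
# giving inf (positive / 0) or nan (0 / 0). Silence NumPy's runtime warnings.
with np.errstate(divide="ignore", invalid="ignore"):
    j = (1 - g) / (1 - f)

# Count the leading bins on which J is still finite.
finite = np.isfinite(j)
n_usable = len(j) if finite.all() else int(np.argmin(finite))

print(j)
print(n_usable)
```

Here only the first 4 of the 6 bins carry a finite J value. This is the same kind of shortfall the warning reports (10 usable bins out of 15 requested).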

ljwolf commented 1 year ago

Yes, only one needs to be kept, and you're right that distance_statistics is the correct one to keep.

On the shaping issue, I think it's related to undefined behaviour for some of the functions at the edges of their support. I will try to get this fixed ASAP!

weikang9009 commented 1 year ago

Thanks @ljwolf! @SFashandi, you may want to replace from pointpats import ripley with from pointpats import distance_statistics as ds and access all the distance functions from ds instead of ripley. So instead of ripley.j_test(points, support=20), you would use ds.j_test(points, support=20). Also, pass a value smaller than 20 to the support parameter; that should help avoid the error you are currently seeing. Our team will investigate the issue and provide a more satisfactory fix later on. Thank you for reporting this!

ljwolf commented 1 year ago

ok @weikang9009, I think it's sufficient for now to use truncate=False when running the statistical tests.

ljwolf commented 1 year ago

To be clear, the issue is that the g() and f() functions can hit their limiting/undefined values at different distances. Truncating g() separately from f() can therefore leave your g() statistic vector too short (or too long) to compare with your f() statistic vector. Since j() is computed from the ratio of the two, J(d) = (1 - G(d)) / (1 - F(d)), that's a problem.
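To make the failure mode concrete, here is a minimal sketch with made-up G and F curves. It assumes each curve is cut at the first bin where it reaches its limiting value of 1. The helper truncate_at_limit is hypothetical and is not pointpats code. Because G saturates earlier than F, the two truncated vectors end up with different lengths, and the J ratio then fails with the broadcasting error from this issue.

```python
import numpy as np


def truncate_at_limit(values, limit=1.0):
    """Hypothetical helper: keep only the bins before `values` first reaches `limit`."""
    reached = np.flatnonzero(values >= limit)
    stop = reached[0] if reached.size else len(values)
    return values[:stop]


# Illustrative G and F curves on the same 8-bin support (made-up numbers).
g = np.array([0.0, 0.2, 0.4, 0.6, 0.8, 1.0, 1.0, 1.0])
f = np.array([0.0, 0.3, 0.5, 0.7, 0.9, 0.95, 0.99, 1.0])

g_short = truncate_at_limit(g)  # G saturates first -> 5 bins kept
f_short = truncate_at_limit(f)  # F saturates later -> 7 bins kept
print(g_short.shape, f_short.shape)

mismatch = ""
try:
    j = (1 - g_short) / (1 - f_short)  # shapes (5,) and (7,) cannot broadcast
except ValueError as err:
    mismatch = str(err)
print(mismatch)
```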

So we must compute the full-length g() and f(), and then truncate (if requested) after the fact.

We truncate because it's annoying for the user to get back arrays that can't quickly be plotted. If you ask for a support of 20, but the last 2 are np.inf (which is a valid return value for the j() function), then your plots are all blank. Truncating keeps the visible range of the function clear.
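A minimal sketch of the "compute full length, truncate afterwards" approach described above follows. The function j_from_full is hypothetical. Its truncate flag borrows the parameter name used in this thread, but the code is not pointpats's implementation. J is computed on equal-length G and F vectors, so the ratio always broadcasts. Only then are the trailing bins where J is no longer finite optionally dropped, which keeps the plotted range clear.

```python
import numpy as np


def j_from_full(support, g, f, truncate=True):
    """Hypothetical sketch: compute J on the full, equal-length G and F vectors,
    then (optionally) trim the trailing bins where J is no longer finite."""
    support, g, f = map(np.asarray, (support, g, f))
    if not (support.shape == g.shape == f.shape):
        raise ValueError("support, g, and f must share one length")
    with np.errstate(divide="ignore", invalid="ignore"):
        j = (1 - g) / (1 - f)  # J(d) = (1 - G(d)) / (1 - F(d))
    if truncate:
        finite = np.isfinite(j)
        stop = len(j) if finite.all() else int(np.argmin(finite))
        support, j = support[:stop], j[:stop]
    return support, j


# Made-up curves: F reaches 1 in the last two bins, so J is inf there.
support = np.arange(8) * 5.0
g = np.array([0.0, 0.2, 0.4, 0.6, 0.8, 0.9, 0.95, 0.97])
f = np.array([0.0, 0.3, 0.5, 0.7, 0.9, 0.95, 1.0, 1.0])

full_support, full_j = j_from_full(support, g, f, truncate=False)
trim_support, trim_j = j_from_full(support, g, f)
print(full_j)  # 8 values, the last two are inf
print(trim_j)  # 6 finite values, ready to plot
```

With truncate=False the caller keeps all requested bins, including the infinite tail. With truncation, the plot shows only the range where J is defined.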