pysal / esda

statistics and classes for exploratory spatial data analysis
https://pysal.org/esda
BSD 3-Clause "New" or "Revised" License

parallel_crand_ ValueError #146

Closed: hh2110 closed this issue 3 years ago

hh2110 commented 4 years ago

Hi everyone. First of all, major kudos to the developers of this library. It is really cool!

I’m trying to take advantage of the parallelisation option when computing p-values for the local Moran's I statistic.

Here is my simple code:

import time

import esda
import numpy as np
import pysal as ps

time0 = time.time()

# 100 x 100 lattice of random values with matching contiguity weights
size = 100
patch_eg = np.random.rand(size, size)
w = ps.lib.weights.lat2W(size, size)

# Local Moran's I with 10 conditional permutations spread over 8 workers
lm = esda.Moran_Local(patch_eg, w, transformation='r', permutations=10, n_jobs=8)
moran_local = np.reshape(lm.Is, patch_eg.shape)
print(moran_local)
print('time taken', time.time() - time0)

And the error I get when running this is:


Traceback (most recent call last):
  File "try.py", line 11, in <module>
    lm = esda.Moran_Local(patch_eg, w, transformation='r', permutations=10, n_jobs=8)
  File "/home/hh774582/.conda/envs/fam13a-dev/lib/python3.7/site-packages/esda/moran.py", line 1017, in __init__
    seed=seed,
  File "/home/hh774582/.conda/envs/fam13a-dev/lib/python3.7/site-packages/esda/crand.py", line 184, in crand
    stat_func,
  File "/home/hh774582/.conda/envs/fam13a-dev/lib/python3.7/site-packages/esda/crand.py", line 467, in parallel_crand
    rlocals = np.hstack(rlocals).flatten()
  File "<__array_function__ internals>", line 6, in hstack
  File "/home/hh774582/.conda/envs/fam13a-dev/lib/python3.7/site-packages/numpy/core/shape_base.py", line 345, in hstack
    return _nx.concatenate(arrs, 1)
  File "<__array_function__ internals>", line 6, in concatenate
ValueError: all the input array dimensions for the concatenation axis must match exactly, but along dimension 0, the array at index 0 has size 1251 and the array at index 7 has size 1243

I thought this was related to numba, so I updated to 0.50.1, but I still get the error above. Any help is greatly appreciated. Thanks.
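The two sizes in the error message (1251 and 1243) are consistent with 100*100 = 10,000 observations being split into 8 uneven chunks. The sketch below illustrates that arithmetic only. The helper `chunk_sizes` and its splitting rule are assumptions chosen to match the traceback, not code taken from esda.

```python
def chunk_sizes(n_obs, n_jobs):
    """Sizes of consecutive chunks when n_obs rows are split for n_jobs workers.

    Assumed rule (not taken from the esda source): each chunk holds
    n_obs // n_jobs + 1 rows, and the last chunk takes whatever is left.
    """
    chunk = n_obs // n_jobs + 1
    return [min(chunk, n_obs - start) for start in range(0, n_obs, chunk)]


# 100 x 100 lattice split across 8 workers, as in the report above
sizes = chunk_sizes(100 * 100, 8)
print(sizes)
```

Under this assumed rule the first seven chunks have 1251 rows and the last has 1243, which matches the array sizes at index 0 and index 7 in the traceback.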

ljwolf commented 4 years ago

Thanks for the report! Can you please double check that import numba; numba.__version__ is correct? When developing this, we occasionally had issues with ensuring that the versions were correct.

Second, I note that you have only 10 permutations. How many observations are in patch_eg?

hh2110 commented 4 years ago

Hi, I have printed the numba version and it is 0.50.1 - patch_eg is 100*100 observations.

ljwolf commented 4 years ago

Great, thanks.

Have you reshaped patch_eg to be flat, e.g. with patch_eg.flatten()?

hh2110 commented 4 years ago

Yes, no luck there either. I have tried 2D arrays before with n_jobs=1 and it does work for me.

ljwolf commented 4 years ago

hmm, OK. Can you post the data?

hh2110 commented 4 years ago

Do you mean patch_eg? It is just np.random.rand(size,size) - so just a 2D matrix of random numbers. I am using size=20 for now

ljwolf commented 4 years ago

Ah, yes, sorry! I guess I missed that 😄

I can definitely replicate this. It appears to be an issue I thought we addressed in development. I'll take a look & see if we can't get a bug fix.

hh2110 commented 4 years ago

Thank you!

ljwolf commented 4 years ago

I think an immediate workaround is keep_simulations=False? The issue appears to be that the rlocals array is not concatenating correctly.

To explain a little, keep_simulations is a flag that governs whether the full set of conditional random realizations that we build during the permutations should be kept. If you're only interested in the p-values and local statistics, you don't actually need to keep all of the local statistics that arise during those realizations. So, keep_simulations=False avoids building the large (and potentially expensive) array of shape (n_observations, p_permutations) that stores those values.
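To give a rough sense of the cost described above, here is a back-of-the-envelope sketch of the memory an (n_observations, p_permutations) array would need. It assumes 8-byte float64 values. The helper name `simulations_nbytes` and the 999-permutation run are hypothetical and only for illustration.

```python
def simulations_nbytes(n_observations, n_permutations, itemsize=8):
    """Bytes needed for an (n_observations, n_permutations) array.

    itemsize=8 assumes float64 values; this is an assumption, not a
    statement about how esda stores its simulations.
    """
    return n_observations * n_permutations * itemsize


small = simulations_nbytes(100 * 100, 10)    # the example in this thread
large = simulations_nbytes(100 * 100, 999)   # hypothetical larger run
print(small / 1e6, "MB")
print(large / 1e6, "MB")
```

The array grows linearly with both the number of observations and the number of permutations, so skipping it matters most for large lattices with many permutations.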

I think the concatenation logic in the parallel code is wrong when simulations are kept. Each chunk of simulations has shape (chunk_size, p_permutations), and numpy is not happy when the chunk sizes differ. I will need to dig further, but I don't get your issue when I add keep_simulations=False to the call to lm:

lm = esda.Moran_Local(patch_eg, w, transformation='r', permutations=10, n_jobs=8, keep_simulations=False)
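
The shape mismatch described above can be reproduced with numpy alone. This is a minimal sketch using zero-filled stand-in arrays with the chunk sizes from the traceback. It is not the esda code and not necessarily the fix that was eventually applied.

```python
import numpy as np

n_permutations = 10
# Two stand-in chunks of simulated local statistics, shaped
# (chunk_size, n_permutations), with the chunk sizes from the traceback.
first = np.zeros((1251, n_permutations))
last = np.zeros((1243, n_permutations))

try:
    np.hstack([first, last])  # joins along axis 1, so row counts must match
    hstack_failed = False
except ValueError:
    hstack_failed = True

# Stacking along axis 0 only needs the permutation dimension to match.
stacked = np.concatenate([first, last], axis=0)
print(hstack_failed, stacked.shape)
```

For 2D inputs, np.hstack concatenates along the second axis, so it requires every chunk to have the same number of rows. Joining the chunks along the first axis avoids that requirement.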

hh2110 commented 4 years ago

Got it, thanks. It does work, and I don’t really need the simulations anyway. Thank you for your help.

ljwolf commented 4 years ago

Could you give the output of your conda list?

hh2110 commented 4 years ago

Yes, sure. It is attached: conda_list.txt

ljwolf commented 4 years ago

Thanks! That's all. I have a tentative fix pending. Thank you for the report!

sjsrey commented 3 years ago

Resolved with #147