pysal / mgwr

Multiscale Geographically Weighted Regression (MGWR)
https://mgwr.readthedocs.io/
BSD 3-Clause "New" or "Revised" License

'operands could not be broadcast together with shapes' when using 'pred_results.predictions' #152

Open X-Fan-Jack opened 3 months ago

X-Fan-Jack commented 3 months ago

I followed the tutorial at https://pysal.org/notebooks/model/mgwr/GWR_prediction_example.html, using geopandas and sample() to split the data into train and test sets.

import numpy as np
from mgwr.gwr import GWR
from mgwr.sel_bw import Sel_BW

gdf = data_geo.to_crs('EPSG:27700')
gdf_train = gdf.sample(frac=0.8, axis=0, random_state=RANDOM_SEED)
gdf_test = gdf[~gdf.index.isin(gdf_train.index)]

X_train = gdf_train.drop(['Y', 'geometry'], axis=1).values
y_train = gdf_train['Y'].values.reshape((-1,1))
u = gdf_train.geometry.x
v = gdf_train.geometry.y
coords_train = list(zip(u,v))
selector = Sel_BW(coords_train, y_train, X_train)
gwr_bw = selector.search()
print('GWR bandwidth =', gwr_bw)
model = GWR(coords_train, y_train, X_train, gwr_bw)
gwr_results = model.fit()

X_test = gdf_test.drop(['Y', 'geometry'], axis=1).values
y_test = gdf_test['Y'].values.reshape((-1,1))
u = gdf_test.geometry.x
v = gdf_test.geometry.y
coords_test = np.array(list(zip(u,v)))  # https://github.com/pysal/mgwr/issues/85
scale = gwr_results.scale
residuals = gwr_results.resid_response

pred_results = model.predict(coords_test, X_test, scale, residuals)

Up to this point, everything works. But when I try to print the prediction results:

pred_results.predictions

it raises:

[!CAUTION] ValueError: operands could not be broadcast together with shapes (201,106) (201,103)
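NumPy raises this error whenever two arrays are combined elementwise and a dimension differs without either side being 1. The following minimal sketch reproduces the message with plain NumPy. The arrays `P_demo` and `params_demo` are hypothetical stand-ins that only borrow the shapes from the traceback; they are not mgwr internals, and how mgwr combines its arrays is not shown here.

```python
import numpy as np

# Stand-in arrays with the two shapes reported in the traceback.
P_demo = np.ones((201, 106))
params_demo = np.ones((201, 103))

try:
    # Elementwise product: the second dimensions (106 vs 103) do not match,
    # and neither is 1, so broadcasting is impossible.
    P_demo * params_demo
except ValueError as exc:
    message = str(exc)
    print(message)
```

Seen this way, the error is a column-count mismatch: one array has three more columns than the other.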

How can I fix this? I want to check the R2 of the predicted results.
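Once predictions are available as an array, an out-of-sample R2 can be computed directly with NumPy. This is a minimal sketch, assuming the predictions and `y_test` hold one value per test row. The helper name `r_squared` is hypothetical and not part of mgwr.

```python
import numpy as np

def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    # Flatten so that (n, 1) column vectors and (n,) arrays are treated alike.
    y_true = np.asarray(y_true, dtype=float).reshape(-1)
    y_pred = np.asarray(y_pred, dtype=float).reshape(-1)
    ss_res = np.sum((y_true - y_pred) ** 2)          # residual sum of squares
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)   # total sum of squares
    return 1.0 - ss_res / ss_tot
```

With the variables from the snippet above, the call would look like `r_squared(y_test, pred_results.predictions)`. Flattening both inputs avoids accidental broadcasting between an (n, 1) and an (n,) array.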

X-Fan-Jack commented 3 months ago

Maybe this can help figure out what is wrong. The total dataset has 1004 rows.

I pass 105 independent variables, so X_train has shape (803, 105). After creating the model with model = GWR(coords_train, y_train, X_train, gwr_bw), I checked model.X.shape, and it had changed to (803, 103).

I don't know why 2 columns are missing, and I think that is why the shapes cannot match: model.P.shape is (201, 106).
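One way to investigate a column-count change like this is to look for columns in the design matrix that never vary, since such columns are indistinguishable from an intercept. This is a minimal sketch using only NumPy. The helper `constant_columns` and the demo matrix `X_demo` are hypothetical, and the zeroed column positions are arbitrary illustration values.

```python
import numpy as np

def constant_columns(X):
    """Return the indices of columns whose values never vary."""
    X = np.asarray(X, dtype=float)
    # np.ptp gives max - min per column; a range of 0 means a constant column.
    return np.flatnonzero(np.ptp(X, axis=0) == 0)

# Demo matrix with the same shape as X_train in this thread.
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(803, 105))
X_demo[:, [4, 17, 60]] = 0.0  # arbitrary columns set to all zeros

const_idx = constant_columns(X_demo)
print(const_idx.tolist())
```

Running `constant_columns(X_train)` on the real data would list any columns that hold a single repeated value.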

X-Fan-Jack commented 3 months ago

The data I use contains 3 columns that hold only 0 values, and they share some characteristics with other columns.

After I deleted these columns, pred_results.predictions works and returns an array.

I assume that GWR automatically deletes columns that contain only zero values. Is that correct? And will this lose some data features and lead to inaccurate results?
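The shapes reported in this thread are consistent with that assumption. Dropping three constant columns and adding one intercept column on the training side gives 105 - 3 + 1 = 103, while keeping all 105 columns plus an intercept on the prediction side gives 106. This reading has not been confirmed against the mgwr source. An all-zero column contributes nothing to a linear fit, because its coefficient cannot be identified, so removing it should not by itself discard usable signal. A minimal sketch of removing such columns from both matrices before fitting follows. The helper `drop_constant_columns` and the demo arrays are hypothetical.

```python
import numpy as np

def drop_constant_columns(X_train, X_test):
    """Drop columns that are constant in X_train from both matrices."""
    X_train = np.asarray(X_train, dtype=float)
    X_test = np.asarray(X_test, dtype=float)
    # Decide on the training data only, then apply the same mask to the
    # test data so both matrices keep identical columns in identical order.
    keep = np.ptp(X_train, axis=0) > 0
    return X_train[:, keep], X_test[:, keep], keep

# Demo arrays with the shapes from this thread; the first 3 columns are zero.
rng = np.random.default_rng(0)
X_train_demo = rng.normal(size=(803, 105))
X_test_demo = rng.normal(size=(201, 105))
X_train_demo[:, :3] = 0.0
X_test_demo[:, :3] = 0.0

Xtr, Xte, keep = drop_constant_columns(X_train_demo, X_test_demo)
print(Xtr.shape, Xte.shape)  # (803, 102) (201, 102)
```

Filtering both matrices with one mask keeps the column count identical at fit time and at prediction time, whatever the library does internally.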

Thanks to the development team for providing this package!