pysal / segregation

Segregation Measurement, Inferential Statistics, and Decomposition Analysis
https://pysal.org/segregation/
BSD 3-Clause "New" or "Revised" License

Error message on Inference when running SingleValueTest #133

Closed MyrnaSastre closed 3 years ago

MyrnaSastre commented 5 years ago

I'm running into this error message when trying to use the Inference wrappers and the function SingleValueTest. Any suggestions? Thanks!

[Screenshot, 2019-08-19]

renanxcortes commented 5 years ago

Ok, this is because the input variables are float64 and this approach expects integers, since it relies on simulations of a binomial distribution. This needs to be adjusted, and we can work on a more informative message for users.
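As a generic illustration of why integer inputs matter here (this is not the package's actual simulation code), a binomial draw takes a number of trials, which is a count. The array `total`, the share `p`, and the seed below are made-up values for the sketch:

```python
import numpy as np

# Hypothetical data: total population per unit, stored as integers
total = np.array([30, 10, 5])

# Hypothetical overall share of the group of interest
p = 0.2

# Seeded generator so the sketch is reproducible
rng = np.random.default_rng(0)

# One simulated group count per unit; each draw uses that unit's total as the number of trials
simulated = rng.binomial(n=total, p=p)
```

Each simulated count is an integer between 0 and the unit's total, which is why population columns stored as float64 need to be converted before this kind of simulation.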

MyrnaSastre commented 5 years ago

It works now. Yes. It needed two lines: 1) take care of the NAs first (in the case of my particular data), and then 2) convert from float to int.

[Screenshot, 2019-08-20]

Thanks Renan for the suggestions...

renanxcortes commented 5 years ago

No problem! So, whenever you have NAs in your dataset, you will get NAs as output (this was implemented in https://github.com/pysal/segregation/pull/131). In your case, you need to drop the NAs and then convert the data type to integer (because you cannot convert to integer while NAs are present).

Nevertheless, I'll leave this issue open, since I think we can come up with a more informative error message for this inference wrapper.
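For readers hitting the same problem, the two steps described above (drop the NAs, then cast to integer) can be sketched roughly as follows. The column names `gru` and `tot` and the values are made up for illustration; this is not the code from the screenshot:

```python
import numpy as np
import pandas as pd

# Hypothetical data: group and total population counts stored as floats, with one missing value
data = pd.DataFrame({"gru": [3.0, 2.0, 1.0, np.nan],
                     "tot": [30.0, 10.0, 5.0, 10.0]})
cols = ["gru", "tot"]

# Step 1: drop rows with a missing value in either population column
data = data.dropna(subset=cols)

# Step 2: with no NaN left, the float columns can be cast to integer
data = data.astype({c: int for c in cols})
```

The order matters: casting before dropping fails, because NaN has no integer representation.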

knaaptime commented 5 years ago

we could do a simple check and recast with a warning here too. something like

from warnings import warn

# group_pop_var and total_pop_var are column names, so check the dtypes of the columns themselves
cols = [group_pop_var, total_pop_var]
if not all(data[col].dtype.kind in "iu" for col in cols):
    warn("Input data contains columns formatted as floating point. Recasting to integer")
    data[cols] = data[cols].astype(int)

knaaptime commented 5 years ago

as long as we warn, I don't see any reason to avoid float==>int conversion if the numpy operations require ints

renanxcortes commented 5 years ago

I tried this first with @myrnasastre. However, you cannot convert a dataset that has a NaN to integer, because NaN is a float value. If you try to run

import numpy as np
import pandas as pd
data = pd.DataFrame.from_dict({'gru': [3.0, 2.0, 1.0, np.nan], 'tot': [30.0, 10.0, 5.0, 10.0]})
data['gru'] = data['gru'].astype(int)

You'll get an error message (pandas raises a ValueError because NaN cannot be converted to integer).
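To reproduce the failure without stopping a script, the cast can be wrapped in a try/except. This is a minimal sketch using the same made-up data; it assumes the exception is a `ValueError` or a subclass of it, which matches the pandas versions I am aware of:

```python
import numpy as np
import pandas as pd

# Same hypothetical data as above: one NaN in the group column
data = pd.DataFrame({"gru": [3.0, 2.0, 1.0, np.nan],
                     "tot": [30.0, 10.0, 5.0, 10.0]})

cast_failed = False
try:
    data["gru"].astype(int)
except ValueError as err:
    # NaN has no integer representation, so the cast is rejected
    cast_failed = True
    print(f"Cast failed: {err}")

# The column without NaN casts without trouble
tot_as_int = data["tot"].astype(int)
```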

renanxcortes commented 5 years ago

Yep, but the NAs are a separate thread and only relate to this specific case. I think the solution you suggested in https://github.com/pysal/segregation/issues/133#issuecomment-523098638 is good.

knaaptime commented 5 years ago

well, but #131 would tell you if you have NaNs (which presumably you'd handle separately). then, once there are no NaNs, we could do this conversion with a warning

alternatively, we could try and be clever with something more like data[data.notnull()].astype(int)
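One way the "check for NaNs, then recast with a warning" idea could look is sketched below. The helper name `recast_counts_to_int` is hypothetical and not part of the segregation package. Note that masking a DataFrame with a boolean DataFrame, as in `data[data.notnull()]`, keeps the NaN cells in place, so this sketch uses an explicit NaN check instead:

```python
import warnings

import pandas as pd
from pandas.api.types import is_integer_dtype


def recast_counts_to_int(data, columns):
    """Hypothetical helper: return a copy of `data` with `columns` cast to integer."""
    # NaN cannot be cast to integer, so ask the caller to handle missing values first
    if data[columns].isnull().any().any():
        raise ValueError("Input columns contain NaN; drop or fill them before recasting.")

    # Only recast (and warn about) the columns that are not already integer typed
    to_cast = [c for c in columns if not is_integer_dtype(data[c])]
    if not to_cast:
        return data.copy()

    warnings.warn(f"Columns {to_cast} are not integer typed. Recasting to integer.")
    return data.astype({c: int for c in to_cast})


# Made-up example data with float columns and no missing values
data = pd.DataFrame({"gru": [3.0, 2.0, 1.0], "tot": [30.0, 10.0, 5.0]})

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    clean = recast_counts_to_int(data, ["gru", "tot"])
```

Raising on NaN rather than silently dropping rows keeps the missing-data decision with the user, which fits the point above that NaNs would be handled separately.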

knaaptime commented 5 years ago

yeah sounds like we're on the same page

knaaptime commented 3 years ago

i think this is resolved now?