Open mattijsdp opened 1 year ago
You are passing a list. I think it will work fine if you pass a numpy array with the right dtype?
@ricardoV94 so yes and no. Using .values to convert the DataFrame to a numpy array does indeed work. But if I pass a list of lists of ints straight into pm.Data it also works (pm.Data("cats", [[0,1],[1,2]])). It seems only passing a pandas DataFrame doesn't work? That's a bit weird, no?
Thanks for the quick response!
@mattijsdp you're right. It seems to be a problem in the call to convert_observed_data by pm.Data:
Your example falls into the last else branch. The whole logic is a bit over-involved, because pt.as_tensor is just as happy to take a pandas DataFrame. I am not sure pm.Data should be calling this helper at all, since not all uses of it are related to observed data.
@ricardoV94 I'm not sure whether the helper should be called at all, but if it is called, shouldn't line 137 there be if hasattr(ret, "dtype"), i.e. use ret and not data? This would solve it, as the lines above would already have converted the DataFrame to a numpy array.
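To illustrate the point about checking ret rather than data, here is a minimal standalone sketch. It is not the PyMC source: cast_by_input and cast_by_converted are hypothetical stand-ins that only mimic the final dtype check discussed above, and the sketch assumes nothing beyond numpy and pandas. The key fact is that a DataFrame exposes .dtypes (plural) but has no .dtype attribute, while a Series and a numpy array both do.

```python
import numpy as np
import pandas as pd


def _to_array(data):
    # pandas objects expose to_numpy(); everything else goes through np.asarray
    return data.to_numpy() if hasattr(data, "to_numpy") else np.asarray(data)


def cast_by_input(data):
    """Hypothetical stand-in: choose the dtype by inspecting the *input* object."""
    ret = _to_array(data)
    # A DataFrame has no .dtype, so this check fails and we fall through to float
    if hasattr(data, "dtype") and "int" in str(data.dtype):
        return ret.astype("int64")
    return ret.astype("float64")


def cast_by_converted(data):
    """Hypothetical stand-in: choose the dtype by inspecting the *converted* array."""
    ret = _to_array(data)
    # ret is always a numpy array here, so .dtype is available
    if hasattr(ret, "dtype") and "int" in str(ret.dtype):
        return ret.astype("int64")
    return ret.astype("float64")


df = pd.DataFrame({"a": [0, 1], "b": [1, 2]})

print(cast_by_input(df).dtype)       # float64: the DataFrame has no .dtype
print(cast_by_input(df["a"]).dtype)  # int64: a Series does have .dtype
print(cast_by_converted(df).dtype)   # int64: the converted array is inspected
```

Under these assumptions the Series keeps its integer type either way, and only the DataFrame is affected by which object gets inspected, which matches the behaviour reported in this issue.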
I am not sure we should be doing any casting to begin with, so maybe I would remove that whole bottom part of the function and see if any tests break.
Describe the issue:
When calling pm.Data on a matrix (pd.DataFrame) where all columns are of type int, it still returns a tensor of type float64. This is not the case when passing a pd.Series, which correctly returns a vector of type int.
Reproducible code example:
Error message:
PyMC version information:
Python version: 3.9.7; pymc.version: '5.7.1'; pytensor.version: '2.14.2'; installed using conda; running on Linux
Context for the issue:
As a temporary workaround, one can cast the resulting matrix, but it seems to me that the current behaviour isn't desirable?