multimeric / PandasSchema

A validation library for Pandas data frames using user-friendly schemas
https://multimeric.github.io/PandasSchema/
GNU General Public License v3.0

IsDtypeValidation not accepting multiple dtypes for a DataFrame Column #61

Open pranshuag9 opened 3 years ago

pranshuag9 commented 3 years ago

I am trying to pass int or float as the dtype parameter of IsDtypeValidation, but it doesn't accept multiple types. As a general type, I tried numbers.Number, the abstract base class for all numbers, but it throws an error: "The column column_name has a dtype of int64 which is not a subclass of the required type <class 'numbers.Number'>".

A similar error occurs when I set it to np.dtype(float) and the column contains only integers. The same goes for np.dtype('d'), np.dtype(np.inexact), and so on.

I tried np.inexact and numbers.Number because I thought np.issubdtype() would return True for them. Is that right?

How can I allow multiple dtypes for a column and have it validate? That is, the column should pass whether its values are floats or integers.
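One way to accept several dtypes for a column is to test the column's dtype against each allowed type with np.issubdtype() and pass if any of them matches. The sketch below is standalone and does not use PandasSchema; the helper name `column_has_any_dtype` is hypothetical and only illustrates the idea.

```python
import numpy as np
import pandas as pd


def column_has_any_dtype(series, allowed):
    """Return True if the series dtype is a sub-dtype of any allowed type.

    Hypothetical helper for illustration; not part of PandasSchema.
    """
    return any(np.issubdtype(series.dtype, t) for t in allowed)


# Accept either integer or floating-point columns.
allowed = (np.integer, np.floating)

int_col = pd.Series([1, 2, 3])        # integer dtype
float_col = pd.Series([1.5, 2.0])     # floating-point dtype
bool_col = pd.Series([True, False])   # bool dtype, neither integer nor floating

int_ok = column_has_any_dtype(int_col, allowed)
float_ok = column_has_any_dtype(float_col, allowed)
bool_ok = column_has_any_dtype(bool_col, allowed)

print(int_ok, float_ok, bool_ok)
```

Passing abstract NumPy types such as np.integer and np.floating, rather than concrete ones such as np.dtype(float), is what lets a single check cover every bit width of each kind.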

multimeric commented 3 years ago

Hmm this is a bug if it's exhibiting this behaviour. Can you post a reproducible example where a sub dtype isn't validating correctly?

pranshuag9 commented 3 years ago

Thanks for the quick reply. I tried again with dtype=np.number and it worked, since np.issubdtype() accepts np.number internally. There is another issue now: PyCharm shows a yellow warning on dtype=np.number in IsDtypeValidation(), because the type annotation for dtype is restricted to np.dtype and should be generalized to some supertype.
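The difference between the earlier attempts and np.number comes from NumPy's scalar type hierarchy: np.number is the abstract parent of both integer and floating types, while np.inexact covers only floating and complex types, and np.dtype(float) is a single concrete type. A minimal sketch of those checks, assuming an int64 and a float64 column dtype:

```python
import numpy as np

int_dtype = np.dtype("int64")
float_dtype = np.dtype("float64")

results = {
    # np.number sits above both integers and floats.
    "int64 <= np.number": np.issubdtype(int_dtype, np.number),
    "float64 <= np.number": np.issubdtype(float_dtype, np.number),
    # np.inexact covers floating and complex types only, so integers fail.
    "int64 <= np.inexact": np.issubdtype(int_dtype, np.inexact),
    "float64 <= np.inexact": np.issubdtype(float_dtype, np.inexact),
    # A concrete dtype such as float64 is not a parent of int64.
    "int64 <= float64": np.issubdtype(int_dtype, np.dtype(float)),
}

for check, passed in results.items():
    print(f"{check}: {passed}")
```

This is consistent with the behaviour reported above: an integer column is rejected by np.dtype(float) and np.inexact but accepted by np.number.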

multimeric commented 3 years ago

Can you show me the warning message?

pranshuag9 commented 3 years ago

yes

[Two screenshots of the PyCharm warning]

The warning appears because dtype is annotated only as np.dtype here: [screenshot of the annotation in the source]

As a temporary fix, it could be Union[np.dtype, np.number], but I think there could be a more general solution.

multimeric commented 3 years ago

Okay so it's a bug with the type annotation. I just need to find out what type annotation is used internally by numpy/pandas and use the same one, or else have a Union as you suggest.
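A possible shape for the broader annotation is sketched below. It is an assumption about how this could look, not PandasSchema's actual code: the alias `DtypeSpec` and the class `DtypeCheck` are hypothetical names. Note that Union[np.dtype, np.number] would describe an *instance* of np.number, so the sketch uses Type[np.generic] to accept scalar type classes such as np.number itself. NumPy 1.20 and later also provide numpy.typing.DTypeLike, which may be a suitable ready-made alternative.

```python
from typing import Type, Union

import numpy as np

# Hypothetical alias: a concrete dtype instance, or a NumPy scalar type class
# (np.number, np.integer, np.float64, ...), all of which subclass np.generic.
DtypeSpec = Union[np.dtype, Type[np.generic]]


class DtypeCheck:
    """Hypothetical stand-in for a dtype validation; not PandasSchema's code."""

    def __init__(self, dtype: DtypeSpec) -> None:
        self.dtype = dtype

    def matches(self, actual: np.dtype) -> bool:
        # np.issubdtype accepts both dtype instances and scalar type classes.
        return bool(np.issubdtype(actual, self.dtype))


number_check = DtypeCheck(np.number)        # abstract type: no annotation mismatch
float_check = DtypeCheck(np.dtype(float))   # concrete dtype: still accepted

print(number_check.matches(np.dtype("int64")))
print(float_check.matches(np.dtype("int64")))
```

With an annotation along these lines, a static checker should accept both np.dtype(float) and np.number as arguments, while the runtime check is unchanged.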