multimeric / PandasSchema

A validation library for Pandas data frames using user-friendly schemas
https://multimeric.github.io/PandasSchema/
GNU General Public License v3.0

Validation Failure when Schema Contains Column with Empty List of Validations Objects #63

Open lguntde opened 3 years ago

lguntde commented 3 years ago

I built a schema based on the list of columns I knew my DataFrame would contain. A number of these don't require validation beyond a check that they exist in the DataFrame (for example, a column holding a comment field of unspecified format). I built a schema in which I specified these columns as follows:

schema = Schema([ ... , Column('name',[]), ... ])

I then ran schema.validate(df) and received the following error:

Exception has occurred: AttributeError 'str' object has no attribute 'get_errors'

This traces back to:

File "/opt/homebrew/Caskroom/miniforge/base/lib/python3.9/site-packages/pandas_schema/column.py", line 27, in
    return [error for validation in self.validations for error in validation.get_errors(series, self)]

Since the column instance in question has no validations to iterate over, it makes sense that this would fail.

My suggestion would be to include a check in the code that simply returns [] if no validations are present.

multimeric commented 3 years ago

Seems like a reasonable request, I would accept a PR for this behaviour.

multimeric commented 3 years ago

Hmm on second thoughts the issue goes deeper than this, I think. If you had no validations this should return an empty list anyway. For example:

>>> [b for a in [] for b in a]
[]

However, for this error to occur you must actually have a string, or several strings, in your validations list. Please check your code for that and report back here.
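To make the point above concrete, here is a minimal sketch that mirrors the comprehension from the traceback without importing pandas_schema. The `FakeValidation` class and the `run_validations` helper are hypothetical stand-ins invented for illustration; only the `get_errors(series, column)` call shape is taken from the traceback.

```python
class FakeValidation:
    """Hypothetical stand-in for a validation object (not part of pandas_schema)."""

    def get_errors(self, series, column):
        # Report every empty string in the series as an error message.
        return [f"empty value at row {i}" for i, v in enumerate(series) if v == ""]


def run_validations(validations, series, column=None):
    # Same shape as the comprehension on line 27 of column.py in the traceback.
    return [error
            for validation in validations
            for error in validation.get_errors(series, column)]


series = ["a", "", "c"]

# An empty validations list gives the comprehension nothing to iterate over.
empty_result = run_validations([], series)

# A real validation object produces errors as expected.
object_result = run_validations([FakeValidation()], series)

# A plain string in the list has no get_errors method, so the call fails.
try:
    run_validations(["not a validation"], series)
    string_error = None
except AttributeError as exc:
    string_error = str(exc)
```

Under these assumptions, the empty list yields no errors at all, and the `AttributeError` only appears once a string ends up in the list.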

ajithprabhakar commented 2 years ago

I am also getting a similar error. Is there any plan to fix this issue? My use case is dynamic validation.

We have multiple saved schemas for different files. As the files are uploaded, we create a schema dynamically by adding columns based on the schema saved in the DB and then run the validation. A CSV is generated with all the validation errors and presented to the user.

It would be really great if you could fix this issue ASAP.

Here is the stack trace:

AttributeError Traceback (most recent call last)
<command-3981061697201222> in
----> 1 errors = schema.validate(sourceDf)

/databricks/python/lib/python3.7/site-packages/pandas_schema/schema.py in validate(self, df, columns)
     84         # Iterate over each pair of schema columns and data frame series and run validations
     85         for series, column in column_pairs:
---> 86             errors += column.validate(series)
     87
     88         return sorted(errors, key=lambda e: e.row)

/databricks/python/lib/python3.7/site-packages/pandas_schema/column.py in validate(self, series)
     25         :return: An iterable of ValidationError instances generated by the validation
     26         """
---> 27         return [error for validation in self.validations for error in validation.get_errors(series, self)]

/databricks/python/lib/python3.7/site-packages/pandas_schema/column.py in (.0)
     25         :return: An iterable of ValidationError instances generated by the validation
     26         """
---> 27         return [error for validation in self.validations for error in validation.get_errors(series, self)]
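
When a schema is assembled dynamically, as described above, one way to narrow this down is to check each validations list before building the columns. The sketch below is only an illustration: the `find_invalid_validations` helper, the `StubValidation` class, and the `saved_spec` layout are all hypothetical, and the only thing taken from the traceback is that each validation is expected to expose a `get_errors` method.

```python
def find_invalid_validations(validations):
    """Return (index, item) pairs for entries that lack a callable get_errors.

    Hypothetical helper, not part of pandas_schema.
    """
    return [(i, item)
            for i, item in enumerate(validations)
            if not callable(getattr(item, "get_errors", None))]


class StubValidation:
    """Hypothetical stand-in for a real validation object."""

    def get_errors(self, series, column):
        return []


# Assumed shape of a schema loaded from storage: column name -> validations.
saved_spec = {
    "comment": [],                               # no validations: fine
    "amount": [StubValidation()],                # proper validation object
    "code": [StubValidation(), "InListValidation"],  # a string slipped in
}

# Collect the offending entries per column before constructing the schema.
problems = {name: find_invalid_validations(validations)
            for name, validations in saved_spec.items()
            if find_invalid_validations(validations)}
```

Here `problems` would point at the `code` column, whose second entry is a string rather than a validation object, which is the situation that produces the `AttributeError` in the trace.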