multimeric / PandasSchema

A validation library for Pandas data frames using user-friendly schemas
https://multimeric.github.io/PandasSchema/
GNU General Public License v3.0

column mismatch wrong exception raised #53

Open Calosha opened 3 years ago

Calosha commented 3 years ago

Lines 59 to 64 in schema.py: the code checks whether the passed columns are a subset of self.get_column_names(), but the exception message prints the difference between columns and self.columns. self.columns holds column objects rather than names, so the message always reports every passed column as missing from the schema.

    if set(columns).issubset(self.get_column_names()):
        columns_to_pair = [column for column in self.columns if column.name in columns]
    else:
        raise PanSchArgumentError(
            'Columns {} passed in are not part of the schema'.format(set(columns).difference(self.columns))
        )
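
The mismatch described above can be reproduced without the library. The following is a minimal sketch, not the library's code: the `Column` class is a hypothetical stand-in that only carries a `name` attribute, and it assumes the real column objects do not compare equal to plain strings. `schema_columns` plays the role of `self.columns` and `schema_names` the role of `self.get_column_names()`.

```python
class Column:
    """Hypothetical stand-in for a schema column; only `name` matters here."""

    def __init__(self, name):
        self.name = name


# Assumed setup: a schema with columns "a" and "b"
schema_columns = [Column("a"), Column("b")]        # role of self.columns
schema_names = [c.name for c in schema_columns]    # role of self.get_column_names()

# Caller passes one name ("c") that is not in the schema
requested = ["a", "b", "c"]

# As in the quoted snippet: strings are compared against Column objects,
# so nothing is ever removed from the set
reported_as_written = set(requested).difference(schema_columns)

# Comparing names against names leaves only the unknown column
reported_by_name = set(requested).difference(schema_names)

print(sorted(reported_as_written))  # ['a', 'b', 'c']
print(sorted(reported_by_name))     # ['c']
```

Under these assumptions, building the message from the column names instead of the column objects would report only the columns that are actually missing from the schema.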
enricorotundo commented 3 years ago

Probably related to the issue above. In my case schema.validate(test_data) always returns "Invalid number of columns. The schema specifies 21, but the data frame has 22", even though test_data actually has 21 columns.

multimeric commented 3 years ago

Hi, thanks for the report. In the interests of time could either of you provide a reproducible example please? Preferably pure Python code.

Calosha commented 3 years ago

> Probably related to the issue above. In my case schema.validate(test_data) always returns "Invalid number of columns. The schema specifies 21, but the data frame has 22", even though test_data actually has 21 columns.

Yes, that is exactly the issue. I will write some sample code and post it later today.