multimeric / PandasSchema

A validation library for Pandas data frames using user-friendly schemas
https://multimeric.github.io/PandasSchema/
GNU General Public License v3.0

Dynamic Schemas Support #45

Closed: christopherhastings closed this issue 3 years ago

christopherhastings commented 3 years ago

I want to dynamically load schemas to evaluate, but haven't found a method to do so.

I have a pandas DataFrame whose index holds the column names. Each row holds the values to validate against for that column.

I was able to use tolist() to convert each row into a list that can be passed to InListValidation. However, when I build the column definitions as a string, it won't run within Schema.

`val_string = ""
for x in schema_df.index:
    val_list = schema_df.loc[x].tolist()
    col_name = x
    val_string += F"Column('{col_name}', [InListValidation({val_list})]),"
validation_values = val_string` 

Then using it within Schema:

schema = Schema([
    validation_values

#    Column('Given Name', [LeadingWhitespaceValidation(), TrailingWhitespaceValidation()]),
#    Column('Family Name', [LeadingWhitespaceValidation(), TrailingWhitespaceValidation()]),
#    Column('Sex', [InListValidation(['Male', 'Female', 'Other'])]),
#    Column('Customer ID', [MatchesPatternValidation(r'\d{4}[A-Z]{4}')])

])

errors = schema.validate(df)

for error in errors:
    print(error)

My first thought was that it was still being read as a string and needed to be turned into code, so I tried both eval and exec, but the error remained the same:

Invalid number of columns. The schema specifies 1, but the data frame has 5

Thank you for any advice, as this may be user error. I also tried the string approach with just a single column on a single line, in case Schema needed each column on its own line and my string didn't provide that, but then it told me that Schema can't be used on a Series.
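For readers who hit the same message: one plausible reading of "The schema specifies 1" is that a string of source code counts as a single list element, no matter how many columns it describes. Below is a minimal sketch of building the Column objects directly instead. The `allowed_values` dict and its column names are hypothetical stand-ins for the rows of `schema_df`, and the fallback classes exist only so the sketch runs without pandas_schema installed; they are not the library's implementation.

```python
try:
    from pandas_schema import Column, Schema
    from pandas_schema.validation import InListValidation
except ImportError:
    # Minimal stand-ins (assumption: only used when pandas_schema is absent).
    from collections import namedtuple

    Column = namedtuple("Column", ["name", "validations"])
    InListValidation = namedtuple("InListValidation", ["options"])

    class Schema:
        def __init__(self, columns):
            self.columns = list(columns)

# Hypothetical allowed values per column. With the schema_df from the post,
# each list would come from schema_df.loc[name].tolist().
allowed_values = {
    "Sex": ["Male", "Female", "Other"],
    "Status": ["Active", "Inactive"],
    "Region": ["North", "South"],
}

# A string of source code is one list element, however many columns it names.
val_string = "".join(
    f"Column('{name}', [InListValidation({values})]),"
    for name, values in allowed_values.items()
)
string_based = [val_string]

# Building the objects directly yields one list element per column.
columns = [
    Column(name, [InListValidation(values)])
    for name, values in allowed_values.items()
]
schema = Schema(columns)

print(len(string_based))    # 1
print(len(schema.columns))  # 3
```

Wrapping the string in eval would not change the count either: a comma-separated expression evaluates to a single tuple, which is still one element once placed inside the outer list.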

christopherhastings commented 3 years ago

Found an error in my data, nothing to see here!

Thanks for the great work!