multimeric / PandasSchema

A validation library for Pandas data frames using user-friendly schemas
https://multimeric.github.io/PandasSchema/
GNU General Public License v3.0

Custom error messages for each row, based on which part of a validation failed. #50

Open Abhisek1994Roy opened 3 years ago


I tried to write a custom validator for dates that first checks whether a value matches one of n date formats. If it doesn't, the validator returns the error "date passed not among formats....". If the value does match a format but is earlier than a given date, the error message should instead be "date passed is less than.....". Is this possible in any way, or do I have to break it into two validations?
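The two-stage check described above can be sketched as a plain function that returns `None` for a valid value and otherwise the message for whichever stage failed. This is a minimal illustration using only the standard library. The function name `date_error_message`, its parameters and the exact message wording are hypothetical, and it is not PandasSchema's API.

```python
import datetime
from typing import List, Optional


def date_error_message(value: str, formats: List[str],
                       min_date: datetime.datetime) -> Optional[str]:
    """Hypothetical helper: None if valid, else the message for the failed stage."""
    for fmt in formats:
        try:
            parsed = datetime.datetime.strptime(value, fmt)
        except ValueError:
            continue  # this format does not match; try the next one
        # A format matched, so only the range check can fail from here on
        if parsed < min_date:
            return "is earlier than {}".format(min_date.date())
        return None
    # No format matched (wording taken from the commented-out default message)
    return 'does not match any of the date formats- "{}"'.format(", ".join(formats))


formats = ["%Y-%m-%d", "%d/%m/%Y"]
cutoff = datetime.datetime(2020, 1, 1)
for v in ["2021-05-01", "2019-12-31", "31/12/2019", "May 1st"]:
    print(v, "->", date_error_message(v, formats, cutoff))
```

Because the message is returned per value instead of being stored on an object, each value carries its own reason for failing.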

Is it possible to set a different error message for a specific validation on each row of the data frame it runs on? Here is my attempt:

```python
import datetime

import pandas as pd
from pandas_schema.validation import _SeriesValidation


class CustomDateFormatValidation(_SeriesValidation):

    def __init__(self, date_format: list, nullable: bool = False, comparisons: list = None, **kwargs):
        self.date_format = date_format
        self.nullable = nullable
        self.comparisons = comparisons
        # Intended default: 'does not match any of the date formats- "{}"'.format(', '.join(self.date_format))
        self._default_message = None
        super().__init__(**kwargs)

    def valid_date(self, val):
        # astype(str) turns missing values into the string 'nan'
        if self.nullable and val == 'nan':
            return True
        for d_format in self.date_format:
            try:
                date = datetime.datetime.strptime(val, d_format)
            except ValueError:
                # This format does not match; try the next one
                continue
            if self.comparisons is None:
                return True
            error_message = []
            for comparison in self.comparisons:
                # Separate name so `date` stays a datetime for the next comparison
                formatted = date.strftime(comparison["format"])
                if not comparison["comparator"](formatted, comparison["value"]):
                    error_message.append(comparison["message"])
            if error_message:
                # Stored on the validator object, so it is shared by every row
                self.default_message = error_message
                return False
            return True
        # No format matched
        return False

    def validate(self, series: pd.Series) -> pd.Series:
        return series.astype(str).apply(self.valid_date)

    @property
    def default_message(self):
        return self._default_message

    @default_message.setter
    def default_message(self, value):
        # Single underscore: a double underscore would be name-mangled and the getter would never see it
        if isinstance(value, list):
            self._default_message = "& ".join(value)
        else:
            self._default_message = value
```

Currently, I get the same error message for every row on which this validation fails.
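A likely cause, assuming PandasSchema runs `validate()` over the whole column first and only afterwards reads the validation's message once per failing row: the message that `valid_date` stores on `self` is overwritten by every failing row, so all failures report whatever the last failing row wrote. The pure-Python sketch below reproduces that pattern and contrasts it with keeping one message per row. `SharedMessageValidator`, `per_row_errors` and `check` are hypothetical names for illustration, not library code.

```python
from typing import Callable, List, Optional, Tuple

# Hypothetical per-value check: None if valid, else a row-specific message
Check = Callable[[str], Optional[str]]


class SharedMessageValidator:
    """Mimics storing the failure message as a single attribute on the validator."""

    def __init__(self, check: Check):
        self.check = check
        self.message = None

    def validate(self, values: List[str]) -> List[bool]:
        results = []
        for v in values:
            msg = self.check(v)
            if msg is not None:
                self.message = msg  # overwritten by every failing row
            results.append(msg is None)
        return results

    def get_errors(self, values: List[str]) -> List[Tuple[int, str]]:
        # Validate everything first, then read the one shared message per failure
        valid = self.validate(values)
        return [(i, self.message) for i, ok in enumerate(valid) if not ok]


def per_row_errors(values: List[str], check: Check) -> List[Tuple[int, str]]:
    """Keeps each failing row's own message next to its index."""
    errors = []
    for i, v in enumerate(values):
        msg = check(v)
        if msg is not None:
            errors.append((i, msg))
    return errors


def check(v: str) -> Optional[str]:
    if not v.isdigit():
        return "not a number"
    if int(v) < 10:
        return "less than 10"
    return None


values = ["abc", "5", "42"]
print(SharedMessageValidator(check).get_errors(values))  # [(0, 'less than 10'), (1, 'less than 10')]
print(per_row_errors(values, check))  # [(0, 'not a number'), (1, 'less than 10')]
```

Splitting the check into two validations, as the question suggests, sidesteps the problem because each validation then has a single fixed message.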