multimeric / PandasSchema

A validation library for Pandas data frames using user-friendly schemas
https://multimeric.github.io/PandasSchema/
GNU General Public License v3.0

DateFormatValidation should have a toggle for allowing NaN values (empty cells in a CSV) #47

Open Abhisek1994Roy opened 3 years ago

Abhisek1994Roy commented 3 years ago

I have created a custom validation class to solve this:

import datetime

import pandas as pd
from pandas_schema.validation import _SeriesValidation


class CustomDateFormatValidation(_SeriesValidation):
    def __init__(self, date_format: str, nullable: bool = False, **kwargs):
        self.date_format = date_format
        self.nullable = nullable
        super().__init__(**kwargs)

    @property
    def default_message(self):
        return 'does not match the date format string "{}"'.format(self.date_format)

    def valid_date(self, val):
        # validate() casts the series to str, so missing cells arrive as 'nan'
        if self.nullable and val == 'nan':
            return True
        try:
            datetime.datetime.strptime(val, self.date_format)
            return True
        except (ValueError, TypeError):
            # strptime raises ValueError on a format mismatch
            return False

    def validate(self, series: pd.Series) -> pd.Series:
        return series.astype(str).apply(self.valid_date)

Should I make this change to the existing DateFormatValidation class and open a pull request?
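
For anyone who wants to try the nullable check without installing anything, here is a minimal standard-library sketch of the same logic. The helper name is_valid_date is hypothetical and not part of PandasSchema. It assumes, as in the class above, that each cell is converted to a string before checking, so a missing value shows up as the text 'nan'.

```python
import datetime


def is_valid_date(val, date_format: str, nullable: bool = False) -> bool:
    """Hypothetical helper mirroring valid_date above, without pandas."""
    text = str(val)  # per-cell equivalent of casting the series to str
    if nullable and text == 'nan':
        # A missing cell (float NaN) stringifies to 'nan'
        return True
    try:
        datetime.datetime.strptime(text, date_format)
        return True
    except ValueError:
        # Raised when the text does not match the format string
        return False


# One valid date, one missing cell, one date in the wrong format
cells = ['2020-01-31', float('nan'), '31/01/2020']
strict = [is_valid_date(c, '%Y-%m-%d') for c in cells]
lenient = [is_valid_date(c, '%Y-%m-%d', nullable=True) for c in cells]
print(strict)   # [True, False, False]
print(lenient)  # [True, True, False]
```

One caveat with this approach: a cell that literally contains the text "nan" is also accepted when nullable is True, because after the string cast it cannot be told apart from a missing value.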

multimeric commented 3 years ago

Can you please check if allow_empty helps, and if not, can you test using this PR? https://github.com/TMiguelT/PandasSchema/pull/44