reflectivity / orsopy

http://www.reflectometry.org/orsopy/
MIT License

MAINT: added separate column validation #127

Open andyfaff opened 9 months ago

andyfaff commented 9 months ago

Added a separate validation for column properties. Eventually this validation should simplify the schema creation.

Instead of raising a jsonschema.exceptions.ValidationError I decided to raise a ValueError - the validation of the column properties is not being done by jsonschema.
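To illustrate the idea of validating column properties with plain `ValueError`s rather than jsonschema, here is a minimal standalone sketch. The function name `check_columns` and the specific keys checked are hypothetical, not the actual orsopy implementation:

```python
def check_columns(columns):
    """Hypothetical sketch: validate column-description dicts outside jsonschema.

    Raises ValueError (not jsonschema.exceptions.ValidationError) on problems.
    """
    if not columns:
        raise ValueError("At least one column must be defined")
    for i, col in enumerate(columns):
        # Each column description is assumed to be a dict with a 'name' entry.
        if "name" not in col:
            raise ValueError(f"Column {i} is missing a 'name' entry")
        if not isinstance(col["name"], str):
            raise ValueError(f"Column {i} 'name' must be a string")


# A well-formed column description passes silently:
check_columns([{"name": "Qz", "unit": "1/angstrom"}])
```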

Should there be extra checks?

aglavic commented 9 months ago

Do we check anywhere that the shapes of the column data are equal? If not, should this be mandatory?

andyfaff commented 9 months ago

The number of columns needs to be congruent with the data. I would say that the number of data points should be the same for all columns.

For an ORB file a column has the potential to be multidimensional. For an ORT file each column is expected to have ndim=1. So for concatenated (datapoint, column) data you would expect a shape of (ndata, ncol). A question is how to hold it internally: as a tuple/list of columns, or as a 2D array. I believe we were aiming for the former, as it makes holding ORB data (i.e. columns with ndim>=1) easier.

But yes, easy to add the check on load.
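The trade-off between the two internal representations can be sketched with NumPy (illustrative data only; the 4-point, 3-column shapes are made up for the example):

```python
import numpy as np

# Option 1: a single 2D array of shape (ndata, ncol).
# This forces every column to be one-dimensional (the ORT case).
as_2d = np.zeros((4, 3))

# Option 2: a list of per-column arrays.
# This allows a multidimensional column (e.g. a resolution kernel of
# width 5) alongside 1D columns, as an ORB file permits.
as_list = [np.zeros(4), np.zeros(4), np.zeros((4, 5))]

# Either way, all columns agree on the number of data points along
# their first dimension, which is the check being discussed here.
assert all(len(c) == 4 for c in as_list)
assert as_2d.shape[0] == 4
```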

bmaranville commented 9 months ago

In the __post_init__ for OrsoDataset we're checking that the number of columns in the data matches the number of columns defined in the header: https://github.com/reflectivity/orsopy/blob/main/orsopy/fileio/orso.py#L156. But we're not checking the first dimension of the columns, which should also match.

What about this type of check?

    def __post_init__(self):
        # Header and data must agree on the number of columns.
        if len(self.data) != len(self.info.columns):
            raise ValueError("Data has to have the same number of columns as header")
        # All columns must share the same length along the first dimension.
        column_lengths = set(len(c) for c in self.data)
        if len(column_lengths) > 1:
            raise ValueError("Columns must all have the same length in first dimension")
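As a standalone sketch of how the proposed check behaves (`validate_columns` and the sample arrays are illustrative, not part of orsopy):

```python
import numpy as np

def validate_columns(data, n_header_columns):
    # Standalone version of the checks proposed for __post_init__.
    if len(data) != n_header_columns:
        raise ValueError("Data has to have the same number of columns as header")
    column_lengths = set(len(c) for c in data)
    if len(column_lengths) > 1:
        raise ValueError("Columns must all have the same length in first dimension")

good = [np.arange(5), np.arange(5), np.arange(5)]
bad = [np.arange(5), np.arange(4), np.arange(5)]

validate_columns(good, 3)  # passes: 3 columns, all of length 5
try:
    validate_columns(bad, 3)
except ValueError as exc:
    print(exc)  # Columns must all have the same length in first dimension
```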