andyfaff opened 9 months ago
Do we check anywhere that the shapes of the column data are equal? If not, is this mandatory?
The number of columns needs to be consistent with the data. I would say that the number of data points should be the same for all columns.
For an ORB file a column can be multidimensional. For an ORT file each column is expected to have ndim=1. So for concatenated (datapoint, column) data you would expect a shape of (ndata, ncol). A question is how to hold it internally: as a tuple/list of columns, or as a 2D array. I believe we were aiming for the former, as it would make holding ORB data (i.e. columns with ndim >= 1) easier.
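To illustrate the trade-off, here is a minimal numpy sketch (the column contents and the 32-channel ORB column are purely illustrative, not from orsopy):

```python
import numpy as np

ndata = 50

# Representation 1: a list of per-column arrays. An ORT column is 1D,
# while an ORB column may itself be 2D, so the columns need not share
# a trailing shape, only the first dimension.
columns = [
    np.zeros(ndata),         # e.g. Qz, ndim=1
    np.zeros(ndata),         # e.g. R, ndim=1
    np.zeros((ndata, 32)),   # hypothetical ORB column with ndim=2
]

# Representation 2: a single 2D array of shape (ndata, ncol). This only
# works when every column is 1D, i.e. the ORT case.
data_2d = np.column_stack(columns[:2])
assert data_2d.shape == (ndata, 2)
```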
But yes, it's easy to add the check on load.
In the `__post_init__` for `OrsoDataset`, we're checking that the number of columns in the data matches the number of columns defined in the header: https://github.com/reflectivity/orsopy/blob/main/orsopy/fileio/orso.py#L156
But we're not checking the first dimension of the columns, which should match.
What about this type of check?
```python
def __post_init__(self):
    if len(self.data) != len(self.info.columns):
        raise ValueError("Data has to have the same number of columns as header")
    column_lengths = {len(c) for c in self.data}
    if len(column_lengths) > 1:
        raise ValueError("Columns must all have the same length in first dimension")
```
Added a separate validation for column properties. Eventually this validation should simplify the schema creation.
Instead of raising a `jsonschema.exceptions.ValidationError`, I decided to raise a `ValueError`, as the validation of the column properties is not being done by `jsonschema`. Should there be extra checks?
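As a rough sketch of the kind of plain-`ValueError` validation described (the `_validate_columns` helper and the rules it enforces are hypothetical, not orsopy's actual implementation):

```python
# Hypothetical standalone check, not orsopy's code: verify that every
# column has a name and that names are unique, raising a plain
# ValueError rather than a jsonschema ValidationError.
def _validate_columns(columns):
    names = [getattr(c, "name", None) for c in columns]
    if any(n is None for n in names):
        raise ValueError("Every column must have a name")
    if len(set(names)) != len(names):
        raise ValueError("Column names must be unique")
```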