sandialabs / pyttb

Python Tensor Toolbox
https://pyttb.readthedocs.io
BSD 2-Clause "Simplified" License

import data sptensor: check dimensions on import #207

Closed DeepBlockDeepak closed 1 year ago

DeepBlockDeepak commented 1 year ago

Tensor data with indices that are invalid relative to the declared shape, in file sptensor2.tns:

sptensor
3 
3 3 1 
3
1 1 1 1
1 3 2 22
2 2 2 3

New behavior when loading the faulty tensor data:

$ python
>>> import pyttb as ttb
>>> S = ttb.import_data('sptensor2.tns')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Users/jmeds/code/pyttb/pyttb/import_data.py", line 52, in import_data
    return ttb.sptensor.from_data(subs, vals, shape)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/jmeds/code/pyttb/pyttb/sptensor.py", line 167, in from_data
    raise ValueError(f"Index out of bounds: {idx} for dimension: {i}")
ValueError: Index out of bounds: 1 for dimension: 2

:books: Documentation preview :books:: https://pyttb--207.org.readthedocs.build/en/207/

ntjohnson1 commented 1 year ago

Can we use one of our existing validation utilities?
https://github.com/sandialabs/pyttb/blob/9c150cb44f37136282b55106ddaf06cad9417641/pyttb/pyttb_utils.py#L688

Can we add a test to make sure we maintain this behavior? Also, does this catch the situation where the shape provided has the wrong number of dimensions?

sptensor
3
3 3 1
2
1 1 1 1 1
2 2 2 2 2

Additional note: iterating in Python is slow. The reason we use the unvalidated from_data is speed. from_aggregation already does validation and adjustment to account for accidental duplicates, etc. Instead of looping over every sub (row) and then index (col), you should be able to compare all the subs against the shape via a broadcast in a single numpy call and then check whether anything fails. Here is where we do something similar in from_aggregation to find the minimum acceptable shape to fit the provided entries: https://github.com/sandialabs/pyttb/blob/9c150cb44f37136282b55106ddaf06cad9417641/pyttb/sptensor.py#L318
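For illustration, a minimal sketch of that broadcast check might look like the following (the function name and the subs/shape arguments are hypothetical, not the actual pyttb code; it also covers the wrong-number-of-dimensions case above):

import numpy as np

def check_subs_against_shape(subs: np.ndarray, shape: tuple) -> None:
    # Sketch: validate 0-based subscripts against a declared shape with one broadcast.
    if subs.size == 0:
        return
    # Wrong number of dimensions: each subscript row must have len(shape) entries.
    if subs.shape[1] != len(shape):
        raise ValueError(
            f"Subscripts have {subs.shape[1]} dimensions but shape has {len(shape)}"
        )
    # Single numpy broadcast instead of a Python loop over rows and columns.
    out_of_bounds = (subs < 0) | (subs >= np.array(shape))
    if out_of_bounds.any():
        bad_rows = np.unique(np.nonzero(out_of_bounds)[0])
        raise ValueError(f"Subscripts out of bounds in rows: {bad_rows.tolist()}")

For the file in the PR description, this would report the offending row without any per-entry Python loop.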

dmdunla commented 1 year ago

This only checks the shape and type of the subs index array: https://github.com/sandialabs/pyttb/blob/9c150cb44f37136282b55106ddaf06cad9417641/pyttb/pyttb_utils.py#L688

Nick's idea of using vectorized methods would be an improvement. Something like:

all(a <= b for a, b in zip(tuple(np.max(subs, axis=0) + 1), shape))

So, a combination of tt_subscheck and a comparison of the input shape and the shape derived from the subs array of indices would be a more complete check.
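A rough sketch of that combination, assuming tt_subscheck is the utility linked above and returns a bool for a well-formed subscript array (the wrapper function name here is illustrative only):

import numpy as np
from pyttb.pyttb_utils import tt_subscheck  # utility linked above

def complete_check(subs: np.ndarray, shape: tuple) -> bool:
    # Structural check on the subs array (integer type, 2-D layout).
    if not tt_subscheck(subs):
        return False
    # Bound check: the shape derived from the subs (max per dimension + 1)
    # must have the same number of dimensions as, and fit within, the input shape.
    derived_shape = tuple(np.max(subs, axis=0) + 1)
    return len(derived_shape) == len(shape) and all(
        a <= b for a, b in zip(derived_shape, shape)
    )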

dmdunla commented 1 year ago

This seems to be broken. Any updates?

DeepBlockDeepak commented 1 year ago

Nothing so far, but I am planning on taking a look soon.

dmdunla commented 1 year ago

Waiting to land #213 before this, as this is a small change to address error checking and will be impacted by the major refactor in #213.

dmdunla commented 1 year ago

@DeepBlockDeepak Can you update this now that the default sptensor constructor has moved to __init__?

dmdunla commented 1 year ago

It now looks like there is no change from main. I'm not sure what happened. Is this still an issue? I no longer see the changes from @DeepBlockDeepak.

ntjohnson1 commented 1 year ago

It doesn't look like this PR ever added a test for the expected behavior. I don't think I fixed anything in #213, so I assume this is still an issue (we could check with the repro in the PR description), and the non-change is just a bad conflict resolution.

ntjohnson1 commented 1 year ago

@dmdunla @DeepBlockDeepak It didn't look like this was moving, so I added the original test described above, confirmed that it failed, adapted the code that reports the error, and pushed that to this branch. I also snuck in a few minor cleanups I hit after merging main onto this branch.

dmdunla commented 1 year ago

@ntjohnson1 Is this ready for review? There was a lingering review request from @DeepBlockDeepak, so it wasn't clear to me if your changes are ready for the merge now.

ntjohnson1 commented 1 year ago

@dmdunla Yes, it is ready. There were no changes on the branch after the bad merge. I mostly just mentioned him since he opened the PR initially.