sandialabs / pyttb

Python Tensor Toolbox
https://pyttb.readthedocs.io
BSD 2-Clause "Simplified" License

import data sptensor: check dimensions on import #207

Closed DeepBlockDeepak closed 1 year ago

DeepBlockDeepak commented 1 year ago

Tensor data with indices that are invalid relative to the declared shape, in file sptensor2.tns:

sptensor
3 
3 3 1 
3
1 1 1 1
1 3 2 22
2 2 2 3

New behavior when loading the faulty tensor data:

$ python
>>> import pyttb as ttb
>>> S = ttb.import_data('sptensor2.tns')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Users/jmeds/code/pyttb/pyttb/import_data.py", line 52, in import_data
    return ttb.sptensor.from_data(subs, vals, shape)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/jmeds/code/pyttb/pyttb/sptensor.py", line 167, in from_data
    raise ValueError(f"Index out of bounds: {idx} for dimension: {i}")
ValueError: Index out of bounds: 1 for dimension: 2

:books: Documentation preview :books:: https://pyttb--207.org.readthedocs.build/en/207/

ntjohnson1 commented 1 year ago

Can we use one of our existing validation utilities?
https://github.com/sandialabs/pyttb/blob/9c150cb44f37136282b55106ddaf06cad9417641/pyttb/pyttb_utils.py#L688

Can we add a test to make sure we maintain this behavior? Also, does this catch the situation where the shape provided has the wrong number of dimensions?

sptensor
3
3 3 1
2
1 1 1 1 1
2 2 2 2 2

Additional note: iterating in Python is slow. The reason we use the unvalidated from_data is speed. from_aggregation already does validation and adjustment to account for accidental duplicates, etc. Instead of looping over every sub (row) and then index (col), you should be able to compare all the subs against the shape via a broadcast in a single numpy call and then check whether anything fails. Here is where we do something similar in from_aggregation to find the minimum acceptable shape to fit the provided entries: https://github.com/sandialabs/pyttb/blob/9c150cb44f37136282b55106ddaf06cad9417641/pyttb/sptensor.py#L318
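For illustration, a minimal sketch of that broadcast check might look like the following (the function name and the subs/shape arguments are hypothetical, not the actual pyttb code; it also covers the wrong-number-of-dimensions case above):

import numpy as np

def check_subs_against_shape(subs: np.ndarray, shape: tuple) -> None:
    # Sketch: validate 0-based subscripts against a declared shape with one broadcast.
    if subs.size == 0:
        return
    # Wrong number of dimensions: each subscript row must have len(shape) entries.
    if subs.shape[1] != len(shape):
        raise ValueError(
            f"Subscripts have {subs.shape[1]} dimensions but shape has {len(shape)}"
        )
    # Single numpy broadcast instead of a Python loop over rows and columns.
    out_of_bounds = (subs < 0) | (subs >= np.array(shape))
    if out_of_bounds.any():
        bad_rows = np.unique(np.nonzero(out_of_bounds)[0])
        raise ValueError(f"Subscripts out of bounds in rows: {bad_rows.tolist()}")

For the file in the PR description, this would report the offending row without any per-entry Python loop.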

dmdunla commented 1 year ago

This only checks the shape and type of the subs index array: https://github.com/sandialabs/pyttb/blob/9c150cb44f37136282b55106ddaf06cad9417641/pyttb/pyttb_utils.py#L688

Nick's idea of using vectorized methods would be an improvement. Something like:

all(a <= b for a, b in zip(tuple(np.max(subs, axis=0) + 1), shape))

So, a combination of tt_subscheck and a comparison of the input shape and the shape derived from the subs array of indices would be a more complete check.
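A rough sketch of that combination, assuming tt_subscheck is the utility linked above and returns a bool for a well-formed subscript array (the wrapper function name here is illustrative only):

import numpy as np
from pyttb.pyttb_utils import tt_subscheck  # utility linked above

def complete_check(subs: np.ndarray, shape: tuple) -> bool:
    # Structural check on the subs array (integer type, 2-D layout).
    if not tt_subscheck(subs):
        return False
    # Bound check: the shape derived from the subs (max per dimension + 1)
    # must have the same number of dimensions as, and fit within, the input shape.
    derived_shape = tuple(np.max(subs, axis=0) + 1)
    return len(derived_shape) == len(shape) and all(
        a <= b for a, b in zip(derived_shape, shape)
    )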

dmdunla commented 1 year ago

This seems to be broken. Any updates?

DeepBlockDeepak commented 1 year ago

Nothing so far, but I am planning on taking a look soon.

dmdunla commented 1 year ago

Waiting to land #213 before this, as this is a small change to address error checking and will be impacted by the major refactor in #213.

dmdunla commented 1 year ago

@DeepBlockDeepak Can you update this now that the default sptensor constructor has moved to __init__?

dmdunla commented 1 year ago

It now looks like there is no change from main. I'm not sure what happened. Is this still an issue? I no longer see the changes from @DeepBlockDeepak.

ntjohnson1 commented 1 year ago

It doesn't look like this PR ever added a test for the expected behavior. I don't think I fixed anything in #213, so I assume this is still an issue (we could check with the repro in the PR description), and the non-change is just a bad conflict resolution.

ntjohnson1 commented 1 year ago

@dmdunla @DeepBlockDeepak It didn't look like this was moving, so I added the original test described above, confirmed that it failed, adapted the code that reports the error, and pushed that to this branch. I also snuck in a few minor cleanups I hit after merging main onto this branch.

dmdunla commented 1 year ago

@ntjohnson1 Is this ready for review? There was a lingering review request from @DeepBlockDeepak, so it wasn't clear to me if your changes are ready for the merge now.

ntjohnson1 commented 1 year ago

@dmdunla Yes, it is ready. There were no changes on the branch after the bad merge. I mostly just mentioned him since he opened the PR initially.