Type conversion fixes in `VisibilityGraph`

fkuehlein commented 1 year ago

checked all of pyunicorn.timeseries for similar problems with the ADJ semantic type alias
resolves issue #163
TODO: implement test(s) using parametric test fixtures to cover similar type conversion issues elsewhere and in the future

ntfrgl commented 1 year ago

Thank you!

Before merging this PR, it would be good software engineering practice to add at least a single simple test for all known function calls that would have triggered the bug before it was fixed. The easiest way to do that would be to create 2 modified versions of test_timeseries.testVisibility(), using the same synthetic data generation procedure, but then to call vg.retarded_local_clustering() and vg.advanced_local_clustering(), respectively, and to check the type and shape of the result.

In subsequent work, this could then be refined to verify numerical properties of the result, to parametrise the test, to extend it to other similar functions etc.

fkuehlein commented 1 year ago

Thanks for the feedback, I added two such tests now. I hope they're about what you had in mind.

Also, I merged my changes regarding test_timeseries.testVisibility() into this branch now. Thanks for your comment, I do see your points:

_visibility(): The core of this function is a loop with a 1-bit accumulator, and should not be allocating an array and calling several NumPy methods on it.

I changed that function to match other functions that calculate the visibility. BUT:

testVisibility(): A test which essentially replicates the code under test is almost useless.

I had suspected that.. which makes testVisibilityHorizontal just as useless. Not only does it replicate _visibility(), but _visibility() just replicates single entrances of the adjacency matrix, which at that point has been fully calculated anyway when an instance of VisibilityGraph was created.

Therefore, if I'm not missing something, I'd argue to drop some functions once added in 08b1f11 completely. My suggestion:

drop the methods
- VisibilityGraph.visibility() (including _visibility()) and
- VisibilityGraph.visibility_horizontal(), as they are only used for testing, but could just call VisibilityGraph.adjacency[i, j] instead.
drop the methods
- VisibilityGraph.visibility_single() and
- VisibilityGraph.visibility_horizontal_single(), as they are not used at all, and could just call VisibilityGraph.adjacency[i, :] instead.
alternatively, one might keep them, but just make them return VisibilityGraph.adjacency[]

I would then just substitute test_timeseries.testVisibility() (and test_timeseries.testVisibilityHorizontal()) by a most basic version which just checks for type and shape of the adjacency matrix. This could serve as a new basis to subsequently be refined, together with testRetardedLocalClustering and testAdvancedLocalClustering.

What do you think? And you, @jdonges?

(EDIT: just noticed that testVisibility is still failing, don't understand why, but will fix that)

ntfrgl commented 1 year ago

I agree with your suggestions.

Comments on your most recent commits:

The way you treat v: int now in _visibility() is problematic. You might want to review the semantics of function calls in Python and Cython.
For diagnosing the failure of a test, it is usually more helpful to have assert A; assert B rather assert A and B.

fkuehlein commented 1 year ago

Thank you for the hints, I'm learning a lot here.

Due to my poor understanding of passing by reference in cython, vg.visibility() always returned False, which is bad. Although, testVisibility() was giving wrong visibility results too. Lucky enough, vg.adjacency() still returns what it is supposed to, so I guess we can just drop the other two for now.

I added a new commit, including my above suggestions and your hints. I kept the previous version of testVisibility() but commented it out to be able build on that lateron. Also, as we noted in the related issue #163, at some point one might want to change the dtype of vg.adjacency from np.int16/np.int32 to ADJ right away. But this would require checking the entire Network class and inheriting classes for possible conflicts and then removing all the to_cy(A, ADJ) type casts again.

ntfrgl commented 1 year ago

Great to hear.

The new tests should not call the method under test multiple times, unless they are checking some nondeterministic behaviour. And you might want to create a new issue for a more consistent representation of adjacency matrices, linking to the discussion in #163.

Otherwise, this PR and #163 should be good to go.

fkuehlein commented 12 months ago

Agreed. I changed the tests accordingly and will open a new issue for the adjacency matrices.

pik-copan / pyunicorn

Type conversion fixes in `VisibilityGraph` #183