pachterlab / SpatialFeatureExperiment

Extension of SpatialExperiment with sf
https://pachterlab.github.io/SpatialFeatureExperiment
Artistic License 2.0
36 stars 7 forks source link

Forthcoming changes in spdep break tests #17

Closed rsbivand closed 10 months ago

rsbivand commented 10 months ago

@lambdamoses https://github.com/r-spatial/spdep/commit/e159de922c61713529a4075b0dfc2966eb8f9ad6 adds an attribute to listw objects to record the zero.policy setting at their creation. Your test at https://github.com/pachterlab/SpatialFeatureExperiment/blob/e4948c8c8e7c5caa0710aeceecac9997d9d2c060/tests/testthat/test-graph_wrappers.R#L81-L96 could not exclude any new attributes, but probably should have specified which to include rather than failing because of upstream changes. Please do not restrict my freedom to maintain my package in this way, change the test to either drop all attributes, or only include those that you know you want to test. Check log: 00check.log, test failure: testthat.Rout.zip

rsbivand commented 10 months ago

Release of spdep impending.

lambdamoses commented 10 months ago

I apologize if I ended up making you feel restricted in how you maintain your package. This issue was fixed.

rsbivand commented 10 months ago

@lambdamoses The problem is with common use of testthat with expect failures not distinguishing between failures caused upstream and failures caused by changes in the software being tested. If expect warned when an upstream change caused the failure but only error-failed when changes in the software being tested caused the failure, that would be more sensible.

lambdamoses commented 10 months ago

expect failures not distinguishing between failures caused upstream and failures caused by changes in the software being tested.

That's precisely the point of the tests of interest here, since I need to make sure that the BiocNeighbors knn and dnn methods give the same results as the spdep versions which can also be chosen here, to make sure that the different implementations of knn and dnn won't affect the results (at least for exact methods for knn). In our field single cell genomics, it's common for different implementations of ostensibly the same methods to give different results. I need to check that the neighborhoods and edge weights are the same, though I don't care about the attributes since the parameters logged in those attributes are logged elsewhere in my package.

You can consider using BiocNeighbors methods for knn and dnn in spdep since some of them are much faster than dbscan for larger datasets. Furthermore, in spdep, the distance based weighting is very slow for larger datasets because the distances from dbscan are not saved, so st_distance needs to be used to re-compute the distances. In my package, I saved the distances from BiocNeighbors, drastically improving performance.

rsbivand commented 10 months ago

Ok, so maybe don't test the attributes that do not matter for your cases. Testing the whole object is limiting how upstream can advance S3 objects, so ignore_attr=TRUE is more sensible that running under the assumption that upstream object attribute names never change.

Thanks for the suggestions, but I probably will not implement them, as handling distances on the sphere is much more important, and distance-based neighbours are only partly sensible in spatial statistics, where a Markov-like contiguity make more sense (touching polygons, triagulation, etc.). Often the points used for knn/dnn are changes of support.

lambdamoses commented 10 months ago

I see. It makes sense to use st_distance when you need spherical distances then.

rsbivand commented 10 months ago

Yes, in fact using the s2 package using Google's s2geometry library for a spherical spatial index. We've found that since the neighbour objects are computed seldom and used frequently, other places are more rewarding when looking for efficiencies. I'll look at BiocNeighbors for planar geometries, but not straight away.

lambdamoses commented 10 months ago

I can PR if you're interested. I'm also looking into ways to speed up Moran's I and so on for large number of sites (like sometimes hundreds of thousands but more often thousands to tens of thousands) and features, which are more common in our field. If I develop a much faster implementation I might also PR if you're interested.

rsbivand commented 10 months ago

Yes, I'd be interested. I have looked at re-representing spatial weights as a class (or classes) built on sparse matrices from the Matrix package, but spdep::lag.listw is now in C and objects are row-major but Matrix works best for column-major objects, so coercion from row-major to column-major is an extra cost for asymmetric neighbours (like knn). I've also looked at igraph without concluding. Some of this is described in https://cran.r-project.org/web/packages/spatialreg/vignettes/nb_igraph.html.

With regard to Moran's I, I've always been concerned about what it measures (and I've known about it since Cliff & Ord 1969, and found printing errors in the 1973 book). I think I've settled somewhere like the descriptions in ch 14-15 (and especially the exercises) in https://r-spatial.org/book/, based on McMillen (2003) and the note in Schabenberger & Gotway (2005), that is that global Moran detects many mis-specifications, not just plain spatial autocorrelation (as does the semi-variogram, or point processes where the underlying model is mis-specified). I have very little grasp of the use of these measures in bioinformatics, I'm afraid, I should do something about that.