Closed rsbivand closed 10 months ago
Release of spdep
impending.
I apologize if I ended up making you feel restricted in how you maintain your package. This issue was fixed.
@lambdamoses The problem is with common use of testthat
with expect
failures not distinguishing between failures caused upstream and failures caused by changes in the software being tested. If expect
warned when an upstream change caused the failure but only error-failed when changes in the software being tested caused the failure, that would be more sensible.
expect failures not distinguishing between failures caused upstream and failures caused by changes in the software being tested.
That's precisely the point of the tests of interest here, since I need to make sure that the BiocNeighbors
knn and dnn methods give the same results as the spdep
versions which can also be chosen here, to make sure that the different implementations of knn and dnn won't affect the results (at least for exact methods for knn). In our field single cell genomics, it's common for different implementations of ostensibly the same methods to give different results. I need to check that the neighborhoods and edge weights are the same, though I don't care about the attributes since the parameters logged in those attributes are logged elsewhere in my package.
You can consider using BiocNeighbors
methods for knn and dnn in spdep
since some of them are much faster than dbscan
for larger datasets. Furthermore, in spdep
, the distance based weighting is very slow for larger datasets because the distances from dbscan
are not saved, so st_distance
needs to be used to re-compute the distances. In my package, I saved the distances from BiocNeighbors
, drastically improving performance.
Ok, so maybe don't test the attributes that do not matter for your cases. Testing the whole object is limiting how upstream can advance S3 objects, so ignore_attr=TRUE
is more sensible that running under the assumption that upstream object attribute names never change.
Thanks for the suggestions, but I probably will not implement them, as handling distances on the sphere is much more important, and distance-based neighbours are only partly sensible in spatial statistics, where a Markov-like contiguity make more sense (touching polygons, triagulation, etc.). Often the points used for knn/dnn are changes of support.
I see. It makes sense to use st_distance
when you need spherical distances then.
Yes, in fact using the s2
package using Google's s2geometry library for a spherical spatial index. We've found that since the neighbour objects are computed seldom and used frequently, other places are more rewarding when looking for efficiencies. I'll look at BiocNeighbors for planar geometries, but not straight away.
I can PR if you're interested. I'm also looking into ways to speed up Moran's I and so on for large number of sites (like sometimes hundreds of thousands but more often thousands to tens of thousands) and features, which are more common in our field. If I develop a much faster implementation I might also PR if you're interested.
Yes, I'd be interested. I have looked at re-representing spatial weights as a class (or classes) built on sparse matrices from the Matrix package, but spdep::lag.listw
is now in C and objects are row-major but Matrix works best for column-major objects, so coercion from row-major to column-major is an extra cost for asymmetric neighbours (like knn). I've also looked at igraph
without concluding. Some of this is described in https://cran.r-project.org/web/packages/spatialreg/vignettes/nb_igraph.html.
With regard to Moran's I, I've always been concerned about what it measures (and I've known about it since Cliff & Ord 1969, and found printing errors in the 1973 book). I think I've settled somewhere like the descriptions in ch 14-15 (and especially the exercises) in https://r-spatial.org/book/, based on McMillen (2003) and the note in Schabenberger & Gotway (2005), that is that global Moran detects many mis-specifications, not just plain spatial autocorrelation (as does the semi-variogram, or point processes where the underlying model is mis-specified). I have very little grasp of the use of these measures in bioinformatics, I'm afraid, I should do something about that.
@lambdamoses https://github.com/r-spatial/spdep/commit/e159de922c61713529a4075b0dfc2966eb8f9ad6 adds an attribute to
listw
objects to record thezero.policy
setting at their creation. Your test at https://github.com/pachterlab/SpatialFeatureExperiment/blob/e4948c8c8e7c5caa0710aeceecac9997d9d2c060/tests/testthat/test-graph_wrappers.R#L81-L96 could not exclude any new attributes, but probably should have specified which to include rather than failing because of upstream changes. Please do not restrict my freedom to maintain my package in this way, change the test to either drop all attributes, or only include those that you know you want to test. Check log: 00check.log, test failure: testthat.Rout.zip