Closed sthawinke closed 9 months ago
I'm surprised that you're finding this line takes a long time to compute. Thanks for drawing attention to this.
I have rewritten this line more efficiently in nncross.ppp
and also accelerated the code for diameter.owin
for the case of rectangles. Please try it in the development version spatstat.geom 3.2-8.001
.
I am reluctant to add an argument dmax
unless it is really necessary. Let's first see if the code is now fast enough for your needs.
Thanks for the update! It has indeed practically eliminated the computation time for boundingbox
and diameter
. Yet for repeated calls to nncross.ppp
on small point patterns, the as.rectangle
function still takes most of the time. I used the following benchmark:
library(profvis)
library(spatstat)
bdry <- list(x=c(0.1,0.3,0.7,0.4,0.2), y=c(0.1,0.1,0.5,0.7,0.3))
w <- owin(poly=bdry)
X = runifpoint(1e2, w)
Y = runifpoint(1e2, w)
profvis(replicate(nncross(X, Y, what = "dist"), n = 1e4))
This scenario is quite common in genomics data where many feature combinations are tested in the same point pattern. My code is still much faster if I can bypass the dmax calculation. I have tried to call the underlying C-function directly, but I am struggling to incorporate this cleanly in a package (see https://stackoverflow.com/questions/77881302/c-function-from-spatstat-geom-package-not-available-in-r-although-it-is-expo?noredirect=1#comment137299936_77881302). Do you think this is a viable alternative?
Do you have your own C-code in a package and want to call the spatstat C-code from your own C-code? Or are you only talking about calling the spatstat C-code as a function call from R? I think the former is difficult, but the latter should be doable. As I understand your post on StackOverflow the problem is that you need to use :::
to access the function which is not allowed in a (CRAN) package. Is that correct? I don't know right now if it is a viable solution for us to export the C-function.
Could you please try the code again in the new update spatstat.geom 3.2-8.002
?
If as.rectangle
is taking a substantial amount of time, then that is a big concern for the whole of spatstat
, which relies heavily on this low level function. So I would really like to get to the bottom of this. In the latest update to spatstat.geom
I have accelerated owin
and as.rectangle
and I hope that this will resolve the problem.
In any case I will investigate whether we can add the dmax
argument as you requested.
I've also added a dmax
argument to nncross.ppp
in spatstat.geom 3.2-8.004
as an experiment.
I can confirm that the new version of spatstat.geom
cuts computation time of the original example in half from ~4.2s to ~2.1s on my computer. Adding a dmax
-value cuts another ~.5s for a total time of 1.6s. Finally, sorting X
and Y
and using the sorting arguments saves almost another second for a final computation time of ~0.6s (not counting the time to do the sorting once).
I have accelerated the code further, by effectively setting dmax=sqrt(.Machine$double.xmax)
which is the largest possible distance computable in the C code. The function nncross.ppp
does not need an argument dmax
any more, so it has been removed. This should be faster than the versions described above. This is in spatstat.geom 3.2-8.006
I can confirm that the new code without an dmax
argument, is equally fast as a user-provided dmax
in the previous version, so I think the problem is solved as well as we can, and the issue should be closed. Are there any loose ends @sthawinke ?
I have run a quick benchmark on my machine with the code I gave above:
Cran version: 11.4 s Latest GitHub version: 1.7s Latest GitHub version with presorting 0.32s
Issue solved indeed, thanks for tackling it thoroughly
Hello,
I appreciate the fast implementation in C of the nncross.ppp function, but unfortunately the call
dmax <- diameter(boundingbox(as.rectangle(X), as.rectangle(Y)))
is consuming most of the time when calling this function repeatedly on the same point pattern for different features. If I could precalculate dmax that would really speed up things. Any chance you could adddmax = diameter(boundingbox(as.rectangle(X), as.rectangle(Y)))
as default argument to the header so I can override it?Many thanks,
Stijn