r-lib / rray

Simple Arrays
https://rray.r-lib.org
GNU General Public License v3.0

`str.rray()` is slow #267

Open eliocamp opened 2 years ago

eliocamp commented 2 years ago

Running str() on an rray takes much longer than running it on the equivalent base array.

library(rray)
array <- array(rnorm(24*2*25*165*128), dim = c(128, 24, 2, 25, 165))
rray <- as_rray(array)

system.time(
  capture.output(str(array))
)
#    user  system elapsed 
#   0.001   0.000   0.001

system.time(
  capture.output(str(rray))
)
#    user  system elapsed 
#  71.805   0.259  72.158

eliocamp commented 2 years ago

I dug into this a little and found that the bottleneck is here:

https://github.com/r-lib/rray/blob/467ed4bc80d1caeae4024224e511ba886dd7ca31/R/print.R#L74

The problem is that inline_list() processes the whole object. That is slow when the object is large, and it is unnecessary, because almost all of the result is truncated in the end anyway. Changing that line to

cat_line(inline_list(title, format(out[1:100]), width = width))

solves the issue, since only the first 100 elements need to be converted. That is still more than will ever be displayed, but it is cheap. As an added bonus, the formatting is more useful.
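One caveat with indexing by `1:100`, shown here as a minimal sketch in base R with made-up data: if the object has fewer than 100 elements, `[1:100]` pads the result with `NA`, whereas `head(x, 100)` returns at most the elements that exist. Whether that matters depends on what `out` can be at that point in print.R, which I am only assuming here.

```r
# A short vector standing in for a small array (illustrative data only).
short <- c(1.5, 2.5, 3.5)

# Indexing past the end pads with NA, so 100 strings come back.
padded <- format(short[1:100])

# head() never returns more elements than the input has.
safe <- format(head(short, 100))

length(padded)
length(safe)
```

With `head()`, large objects are still cut to 100 elements and small ones are left as they are.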

With the old code I'd get something like this:

#  rray[,24,2,165] [,24,2,165][128]  1.330147e+00, -8.757626e-02, -2.1630...

Scientific notation is not needed to display those first few numbers. It appears only because some other number further down the array requires it, and format() applies one common format to everything it is given. With the new code and the same rray I get:

# rray[,24,2,165] [,24,2,165][128]  1.33014677, -0.08757626, -0.21630801...

Which is much friendlier.
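The effect can be reproduced with base R alone. This is a small sketch with made-up numbers: the first three values mimic the output above, and the tiny fourth value is my own addition to stand in for "some other number down the line".

```r
# Three ordinary values plus one very small value (illustrative data only).
x <- c(1.33014677, -0.08757626, -0.21630801, 1e-20)

# format() picks one common representation for every element it receives,
# so the tiny value pushes the whole vector into scientific notation.
whole <- format(x)

# Formatting only the leading elements lets them use fixed notation.
first3 <- format(head(x, 3))

print(whole)
print(first3)
```

So truncating before calling format() changes not only the speed but also which notation the displayed elements get.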

The ideal approach would be to calculate how many numbers actually fit in the available width, but I think that truncating to a reasonably large fixed number is good enough.
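For reference, a rough sketch of what "calculating how many numbers are needed" could look like. The helper names `max_elements_shown()` and `format_head()` are hypothetical and not part of rray. It assumes elements are joined with ", " and that each element takes at least one character, which gives a cheap upper bound rather than an exact count.

```r
# Hypothetical helper: upper bound on the number of elements that can appear
# in `width` characters. Each element needs at least one character plus the
# separator; one extra element is allowed for a partially displayed last entry.
max_elements_shown <- function(width, sep = ", ") {
  ceiling(width / (1 + nchar(sep))) + 1
}

# Hypothetical helper: format only as many leading elements as could be shown.
format_head <- function(x, width = getOption("width")) {
  n <- min(length(x), max_elements_shown(width))
  format(x[seq_len(n)])
}

x <- rnorm(1e6)
shown <- format_head(x, width = 80)
length(shown)
```

For a width of 80 this formats 28 elements instead of a million, and `seq_len()` keeps short inputs from being padded.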

I can make a PR with this change if you'd like.