igraph / rigraph

igraph R package
https://r.igraph.org
532 stars 200 forks source link

Use `as.numeric(seq_len(n))` in favor of `seq(1, n)` #1427

Open krlmlr opened 5 days ago

krlmlr commented 5 days ago

What happens, and what did you expect instead?

https://github.com/igraph/rigraph/pull/1330#discussion_r1665753951

We also need the by argument to ensure that the output is a numeric.

Perhaps add a function seq_numeric() that has a special case for zero-length output?

To reproduce

seq(by = 1) seems to be the fastest, look at the units. Maybe it's worth arguing with R core to support seq(start, start - by, by) without error and to return an empty vector in this case?

x <- seq(1, 1, 1)
class(x)
#> [1] "numeric"
x
#> [1] 1
seq(1, 0, 1)
#> Error in seq.default(1, 0, 1): wrong sign in 'by' argument

bench::mark(
  sum(seq(1, 100000, 1)),
  sum(as.numeric(rlang::seq2(1, 100000))),
  sum(as.numeric(seq_len(100000)))
)
#> # A tibble: 3 × 6
#>   expression                             min median `itr/sec` mem_alloc `gc/sec`
#>   <bch:expr>                           <bch> <bch:>     <dbl> <bch:byt>    <dbl>
#> 1 sum(seq(1, 1e+05, 1))                628µs  791µs     1233.    1.91MB     46.8
#> 2 sum(as.numeric(rlang::seq2(1, 1e+05… 861ns  984ns   898993.        0B      0  
#> 3 sum(as.numeric(seq_len(1e+05)))      164ns  246ns  3139792.        0B    314.

Created on 2024-07-05 with reprex v2.1.0

System information

No response

maelle commented 4 days ago

I wasn't able to get to this today :cry:

clpippel commented 1 day ago

seq(1, vcount(graph)) does not work for empty vectors. Alternatives are: seq_along, seq_len.

library(bench)
n <- 1
graph <- make_ring(n)
bench:: mark(
  "seq"   = seq(1, vcount(graph)),
  "along" = seq_along(V(graph)),
  "len"   = seq_len(vcount(graph))
)

The above fails if n = 0. vcount() is faster then V(g).

krlmlr commented 1 day ago

Yeah, my seq_numeric() would special-case for the n == 0 edge case.

clpippel commented 1 day ago

What is the point of a special function seq_numeric()? Isn't it simpler to replace seq(1, x) with seq_len(x)?

clpippel commented 1 day ago

I also wonder: how bad is it if we omit the test index_is_natural_sequence()? How likely is it that the test does indeed pass?

krlmlr commented 1 day ago

seq(1, x) is much faster, as the benchmark shows.

Because the natural sequence is the default argument for many input functions, I suspect that this case is pretty frequent and worth optimizing for, but I haven't done any measurements.

szhorvat commented 1 day ago

Your benchmark seems to show that seq_len (246 ns) is much faster than seq (791 µs), exactly the opposite of what you say above @krlmlr. What am I missing?

krlmlr commented 1 day ago

🤦 🤦

You're right, of course. But the test isn't representative either:

bench::mark(
  sum(seq(1, 10000000, 1)),
  sum(as.numeric(rlang::seq2(1, 10000000))),
  sum(as.numeric(seq_len(10000000)))
)
#> Warning: Some expressions had a GC in every iteration; so filtering is
#> disabled.
#> # A tibble: 3 × 6
#>   expression                           min   median `itr/sec` mem_alloc `gc/sec`
#>   <bch:expr>                       <bch:t> <bch:tm>     <dbl> <bch:byt>    <dbl>
#> 1 sum(seq(1, 1e+07, 1))             77.6ms  93.34ms      10.6     191MB     16.0
#> 2 sum(as.numeric(rlang::seq2(1, 1…   902ns   1.02µs  800320.         0B      0  
#> 3 sum(as.numeric(seq_len(1e+07)))    164ns 246.04ns 2861548.         0B      0

Created on 2024-07-08 with reprex v2.1.0

The second and third example probably uses Gauss's formula. Let's try var() :

bench::mark(
  var(seq(1, 100000, 1)),
  var(as.numeric(rlang::seq2(1, 100000))),
  var(as.numeric(seq_len(100000)))
)
#> # A tibble: 3 × 6
#>   expression                             min median `itr/sec` mem_alloc `gc/sec`
#>   <bch:expr>                           <bch> <bch:>     <dbl> <bch:byt>    <dbl>
#> 1 var(seq(1, 1e+05, 1))                818µs  960µs      883.    1.92MB     32.0
#> 2 var(as.numeric(rlang::seq2(1, 1e+05… 434µs  496µs     2002.   781.3KB     26.9
#> 3 var(as.numeric(seq_len(1e+05)))      432µs  495µs     2016.   781.3KB     29.4

Created on 2024-07-08 with reprex v2.1.0

This is more realistic but also indicates that as.numeric(seq_len()) is faster. I stand corrected.

clpippel commented 1 day ago

A thought: avoid seq() in any case.

clpippel commented 1 day ago

I tried make_ring(1E5). Optimization saves ca. 50%, 3.72ms v.s. 6.42ms.