I will also benchmark the function with and without return() lol. Currently reading up on the bench::mark() you recommended.
Cool. Maybe it won't change much, but it's always good practice to remove the final return() call from functions (it also makes the code lighter to read). As for the multiplications, if it doesn't change the computation time much, then your way of writing it would probably be better, as it is (i) more concise and (ii) easier for a human to read. Good to see what the benchmarking says here.
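To be concrete, a toy illustration of what I mean by removing the final return() (not your actual model code):

```r
## Toy example only (not the actual model code): in R the value of the last
## evaluated expression is returned automatically, so the explicit return()
## call at the end of a function can simply be dropped.
f_return <- function(x) {
  y <- x^2 + 1
  return(y)
}

f_no_return <- function(x) {
  x^2 + 1
}

f_return(2) == f_no_return(2)  # TRUE: same result either way
```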
You were absolutely right! Using return() is a bit slower, and fewer repeated multiplications makes it faster by around 0.3-0.4 ms. I also learned about garbage collection while doing this. For the multiplications, do you think that is much of a difference?
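For reference, I set the comparison up roughly like this (a sketch only: sir_return(), sir_no_return(), sir_less_mult(), initial_values and parameters are placeholder names standing in for the actual model functions and inputs, which are not shown here):

```r
library(deSolve)
library(bench)

time_values <- seq(0, 20)  # output every day (see below for the 0.1-day version)

bench::mark(
  return               = ode(y = initial_values, times = time_values,
                             func = sir_return,    parms = parameters),
  no_return            = ode(y = initial_values, times = time_values,
                             func = sir_no_return, parms = parameters),
  less_multiplications = ode(y = initial_values, times = time_values,
                             func = sir_less_mult, parms = parameters),
  check = FALSE  # don't require strictly identical return objects
)
```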
These are the results when I run ode() and display output every day (time_values <- seq(0, 20)):
expression min median `itr/sec` mem_alloc `gc/sec` n_itr n_gc total_time
<bch:expr> <bch:tm> <bch:tm> <dbl> <bch:byt> <dbl> <int> <dbl> <bch:tm>
1 return 3.59ms 3.85ms 256. 190KB 11.2 114 5 446ms
2 no_return 3.48ms 3.73ms 263. 189KB 11.1 118 5 449ms
3 less_multiplications 3.18ms 3.44ms 284. 108KB 11.2 127 5 447ms
This is when I have it display output every 0.1 day (time_values <- seq(0, 20, 0.1)):
expression min median `itr/sec` mem_alloc `gc/sec` n_itr n_gc total_time
<bch:expr> <bch:tm> <bch:tm> <dbl> <bch:byt> <dbl> <int> <dbl> <bch:tm>
1 return 6.52ms 6.79ms 144. 234KB 11.6 62 5 430ms
2 no_return 6.37ms 6.83ms 144. 232KB 11.4 63 5 437ms
3 less_multiplications 5.87ms 6.48ms 152. 152KB 11.5 66 5 435ms
Wow! Impressive! I actually didn't think it would necessarily make much of a difference, but I was wrong! Yes, then, never forget that any function call takes time and that multiplications take time as well (more than additions, which is why it is sometimes advantageous to log-transform in order to turn multiplications into additions). Do I understand correctly that garbage collection is time consuming? Also, I'm not sure I understand what the second plot corresponds to.
Yeah, when memory reaches some threshold, R automatically frees up memory by deleting the "garbage" (values stored in memory that no variable references any more). First it deletes the youngest generation of values (level 0); after 20 level-0 collections it deletes the two youngest generations (level 1); and after 5 level-1 collections it deletes all generations (level 2) [ref here]. We can call the gc() function to trigger a level-2 garbage collection ourselves if we need to. I think this process is time consuming because R has to scan through memory to find the garbage and free it, but it happens automatically, and the conditions that trigger garbage collection depend on the command-line options --min-nsize and --min-vsize set when R is started, so we can't do much about it [ref here].
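For what it's worth, a minimal sketch of the base R tools involved:

```r
## Minimal sketch of the base R tools mentioned above.
gc(verbose = TRUE)  # report memory in use and force a full (level-2) collection
gcinfo(TRUE)        # print a message every time the collector runs automatically
## bench::mark() also records collections per expression, which is where the
## `gc/sec` and `n_gc` columns in the tables above come from.
```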
Both plots are generated by the bench package and show the time taken by every iteration when it reruns a function. The first plot is when I set the time vector to c(1, 2, 3, ..., 20) for ode(): bench::mark() reran each function about 120 times, and each run took around 3-4 ms. The second plot is when I set the time vector to c(1, 1.1, 1.2, ..., 19.9, 20) for ode() in deSolve, to display output at 0.1-day intervals, and it takes about double the time to run ode() compared to the 1-day interval.
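The plots themselves come from bench's autoplot() method, roughly like this (a sketch; `results` stands for the bench::mark() output shown above):

```r
library(bench)
library(ggplot2)  # bench registers an autoplot() method for bench_mark objects

## results <- bench::mark(...)  as above
autoplot(results)  # one point per iteration, coloured by garbage-collection level
```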
So, if you force garbage collection, it takes more time, right? Yes, garbage collection is normally not something you should do yourself, contrary to lower-level languages. It's only in very rare cases that you may need to do it.
Yeah, but when I worked with the vaccine registry data on the cluster with Duc, we sometimes had to force garbage collection, otherwise we would run out of memory lol.
It only takes twice as long, whereas with 10 times as many output points we could have expected it to take 10 times as long, right?
Yes, I'm really curious about this. They seem to have a very efficient algorithm. I'm reading about algorithmic complexity, and I guess this would be called O(log n) time complexity? If we expected it to take 10 times as long, it would be an O(n) algorithm.
Yes, I guess big data is typically a case where you need to force garbage collection. The rest of the time (99% of cases), you don't.
One thing to know about R is that memory usage depends on the size of the objects, and this may be responsible for nonlinearities of execution time as a function of problem size. Another possibility is a fixed, incompressible running time (overhead), which necessarily accounts for a higher proportion of the total as the running time decreases (another possible cause of nonlinearity). What you say is also possible. But you don't necessarily have to dig into the intricacies of the algorithms; what's important is to characterize the scaling empirically (whether it's n, n^x, x^n, log(n), etc.), as sketched below. This said, I have a cool and easy-to-read popular science book on algorithms in case you are interested. Let me know.
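For instance, a rough (hypothetical) sketch: time ode() for an increasing number of output points and fit a log-log regression. Here initial_values, parameters and sir_less_mult again stand in for the real objects:

```r
library(deSolve)
library(bench)

ns <- c(21, 51, 101, 201, 501, 1001)  # number of output time points

timings <- vapply(ns, function(n) {
  time_values <- seq(0, 20, length.out = n)
  bm <- bench::mark(
    ode(y = initial_values, times = time_values,
        func = sir_less_mult, parms = parameters)
  )
  as.numeric(bm$median)  # median run time in seconds
}, numeric(1))

## On a log-log scale a power law t ~ n^k is a straight line of slope k;
## k close to 0 suggests mostly fixed overhead, k close to 1 linear scaling.
fit <- lm(log(timings) ~ log(ns))
coef(fit)[["log(ns)"]]  # estimated exponent k
```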
Just thinking here: I'm wondering whether [first expression, not shown] would be faster than [second expression, not shown] (7 multiplications instead of 14, and multiplications are time consuming). Maybe worth doing a little benchmarking here, at least out of curiosity?
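A generic illustration of this kind of refactoring (a sketch only: the beta, gamma, S, I, R names below are a made-up SIR-style example, not the actual expressions being discussed):

```r
## Generic illustration only, with made-up values.
beta  <- 0.004
gamma <- 0.5
S <- 999; I <- 1; R <- 0

## Version A: the products beta * S * I and gamma * I are each computed twice.
dS_a <- -beta * S * I
dI_a <-  beta * S * I - gamma * I
dR_a <-                 gamma * I

## Version B: compute each product once and reuse it (half the multiplications).
infection <- beta * S * I
recovery  <- gamma * I
dS_b <- -infection
dI_b <-  infection - recovery
dR_b <-              recovery

## Quick benchmark of the two styles.
bench::mark(
  A = c(-beta * S * I, beta * S * I - gamma * I, gamma * I),
  B = {inf <- beta * S * I; rec <- gamma * I; c(-inf, inf - rec, rec)}
)
```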