Open jorainer opened 3 years ago
Indeed, I suppose using the more complex data structure comes at a cost when the simpler `data.frame` would work too. Adding columns in a loop might simply not be the way ahead here. If possible, `cbind`ing two `DataFrame`s would be the sensible option here:
```r
> DF10 <- DataFrame(x1 = 1:100, x2 = 1, x3 = 1, x4 = 1, x5 = 1, x6 = 1, x7 = 1, x8 = 1, x9 = 1, x10 = 1)
> microbenchmark(cbind(DF, x = 5), cbind(DF, DF), cbind(DF, DF10))
Unit: microseconds
             expr      min        lq      mean   median       uq      max neval
 cbind(DF, x = 5) 1106.832 1119.0675 1157.4790 1131.755 1160.222 2144.252   100
    cbind(DF, DF)  841.286  847.5270  876.2291  853.081  871.633 1913.417   100
  cbind(DF, DF10)  867.546  873.4325  900.7079  879.722  906.423 1891.465   100
```
Not sure if that helps though.
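To make the suggestion concrete, here is a minimal base-R sketch of the same pattern (column names are hypothetical; the idea carries over to S4Vectors' `DataFrame`): assemble all new columns as one object up front, then do a single `cbind` instead of one per column.

```r
## Start from a table with one existing column
df <- data.frame(x1 = 1:100)

## Assemble every new column in one object, mirroring DF10 above;
## length-1 values are recycled to the number of rows on cbind
extra <- data.frame(x2 = 1, x3 = 1, x4 = 1, x5 = 1,
                    x6 = 1, x7 = 1, x8 = 1, x9 = 1, x10 = 1)

## One cbind instead of nine separate ones
df <- cbind(df, extra)
```

The single bind avoids re-validating and re-copying the object once per added column.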
bioc-devel has been very helpful in such situations.
I also have performance issues here, so following this.
Depending on your application, there's also `joinSpectraData()`, which I use regularly, albeit not yet with very large data sets.
Yep, I'm using that to join the quant MS data with the ID MS data - it's very good, but still slow.
Have you tried `cbind()`, where you have taken care of matching/subsetting the data?

Faster than `cbind()`, definitely. I'll do some profiling.
Using a `DataFrame` in most backends to hold the data also brings some performance losses. Subsetting is not ideal, but adding or replacing columns is even worse: `$` on a `DataFrame` is very slow, `cbind` is already better, but nothing beats the plain `data.frame`. This becomes a real bottleneck if we're e.g. adding columns in a loop, so we should check if there is a better way to add or replace data in a `DataFrame`.

Pinging @lgatto @sgibb - maybe you already have a solution for this?
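To illustrate the loop bottleneck in base-R terms (hypothetical column names; the `DataFrame` timings discussed above would additionally need S4Vectors): assigning a column inside the loop re-copies the table on every iteration, whereas collecting the columns in a list and binding once touches the table a single time. Both produce the same result.

```r
n <- 100

## Anti-pattern: grow the table one column at a time inside the loop;
## each assignment copies df_loop
df_loop <- data.frame(x1 = seq_len(n))
for (i in 2:10) {
  df_loop[[paste0("x", i)]] <- rep(1, n)
}

## Alternative: collect the columns first, then bind once at the end
cols <- setNames(rep(list(rep(1, n)), 9), paste0("x", 2:10))
df_once <- cbind(data.frame(x1 = seq_len(n)), as.data.frame(cols))
```

The same "build once, bind once" shape is what the `cbind(DF, DF10)` benchmark above exploits.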