pik-piam / magclass

R package | Data Class and Tools for Handling Spatial-Temporal Data
GNU Lesser General Public License v3.0

mbind() grows super-exponentially on magpie object with duplicated dimnames #157

Closed 0UmfHxcvx5J7JoaOhFSs5mncnisTJJ6q closed 1 year ago

0UmfHxcvx5J7JoaOhFSs5mncnisTJJ6q commented 1 year ago

In each call to mbind() whose first argument is a magpie object with duplicated dimnames (even if it is the only argument), the count of each duplicated dimname is squared in the result:

x <- as.magpie(setNames(c(1, 2, 10, 20, 30), 
                        c('foo', 'foo', 'bar', 'bar', 'bar')))

as.data.frame(x)[,c('Data1', 'Value')]
  Data1 Value
1   foo     1
2   foo     2
3   bar    10
4   bar    20
5   bar    30
x <- mbind(x)

as.data.frame(x)[,c('Data1', 'Value')]
   Data1 Value
1    foo     1
2    foo     2
3    foo     1
4    foo     2
5    bar    10
6    bar    20
7    bar    30
8    bar    10
9    bar    20
10   bar    30
11   bar    10
12   bar    20
13   bar    30

which causes memory use to explode

x <- mbind(lapply(1:3, function(x) { as.magpie(setNames(x, 'foo')) }))
cat('    count      size\n')
for (i in 0:4) {
    if (0 < i)
        x <- mbind(x)
    cat(sprintf('%i   %8i   %12i bytes\n',
                i, max(rle(sort(getNames(x)))[['lengths']]), object.size(x)))
}
    count      size
0          3           1384 bytes
1          9           1512 bytes
2         81           2312 bytes
3       6561          80072 bytes
4   43046721      516561992 bytes

and R to crash (see []) at some point unrelated to where the duplicates were introduced (usually four mbind() calls further on).
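The counts in the table above are consistent with the duplicate count being squared on every call: starting from n duplicates, k calls to mbind() give n^(2^k). A minimal base-R sketch of that arithmetic (the variable names are illustrative only and it does not call magclass):

```r
# Duplicate count after k calls to mbind(), assuming each call squares
# the count of a duplicated name (as observed in the table above).
n0 <- 3          # initial number of 'foo' entries
k  <- 0:4        # number of mbind() calls applied afterwards

counts <- n0^(2^k)
print(data.frame(calls = k, count = counts))
# counts: 3, 9, 81, 6561, 43046721

# One further call would square the count again, far beyond what
# fits in memory:
next_count <- n0^(2^5)
print(format(next_count, big.mark = ",", scientific = FALSE))
```

This doubly exponential growth would explain why the crash only shows up a few calls after the duplicates were introduced.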

Since we can't have a check against introducing duplicates (#151), maybe we can have mbind() not do this?
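Until then, a caller-side check can at least flag duplicates before they reach mbind(). The following is only a sketch: duplicate_counts() is a hypothetical helper, not part of magclass, and it works on a plain character vector of names.

```r
# Hypothetical helper (not part of magclass): return the names that
# occur more than once, together with how often they occur.
duplicate_counts <- function(nms) {
  counts <- table(nms)
  counts[counts > 1]
}

nms  <- c("foo", "foo", "bar", "bar", "bar", "baz")
dups <- duplicate_counts(nms)

if (length(dups) > 0) {
  message("duplicated names: ",
          paste(sprintf("%s (%i)", names(dups), as.integer(dups)),
                collapse = ", "))
}
```

With a magpie object x, the names could presumably be obtained via getNames(x), as in the reproduction above, and passed to the helper before calling mbind().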