statistikat / STATcubeR

R interface for the STATcube REST API and data.statistik.gv.at
https://statistikat.github.io/STATcubeR/
GNU General Public License v2.0

special handling of national accounts cubes #14

Closed: GregorDeCillia closed this issue 2 years ago

GregorDeCillia commented 3 years ago

National accounts cubes such as sc_example("foreign_trade") do not provide values for total codes.

https://github.com/statistikat/STATcubeR/blob/8416f6325e2cb80098b2b19826cda6a5fed81ec2/inst/json_examples/foreign_trade.json#L2

Therefore, they should be aggregated directly in $tabulate() because otherwise the result would be a table filled with NAs in all measure columns.

library(STATcubeR)
library(magrittr)  # provides the %>% and %$% pipes
sc_example("foreign_trade") %>%
  sc_table() %$%
  tabulate("Reference year")
# A STATcubeR tibble: 11 x 5
   `Reference year` `Import, number… `Import, value … `Export, number… `Export, value …
 * <date>                      <dbl>            <dbl>            <dbl>            <dbl>
 1 2008-01-01                     NA               NA               NA               NA
 2 2009-01-01                     NA               NA               NA               NA
 3 2010-01-01                     NA               NA               NA               NA
 4 2011-01-01                     NA               NA               NA               NA
 5 2012-01-01                     NA               NA               NA               NA
 6 2013-01-01                     NA               NA               NA               NA
 7 2014-01-01                     NA               NA               NA               NA
 8 2015-01-01                     NA               NA               NA               NA
 9 2016-01-01                     NA               NA               NA               NA
10 2017-01-01                     NA               NA               NA               NA
11 2018-01-01                     NA               NA               NA               NA

In one of our internal projects, we currently use the condition

"T" %in% table$annotation_legend$annotation

to determine whether a direct aggregation via rowsum() should be applied.
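A minimal base-R sketch of that logic follows. It assumes the detail rows are available as a plain data frame and the annotation legend as a data frame with an `annotation` column. The helpers `needs_direct_aggregation()` and `aggregate_by_rowsum()` and the toy data are hypothetical illustrations, not STATcubeR's internal implementation.

```r
# Hypothetical helper: TRUE if the annotation legend contains the code "T",
# mirroring the condition quoted above.
needs_direct_aggregation <- function(annotation_legend) {
  "T" %in% annotation_legend$annotation
}

# Hypothetical helper: aggregate detail rows by one grouping column using
# base R's rowsum(), which sums the rows of a numeric matrix per group.
aggregate_by_rowsum <- function(data, by, measures) {
  sums <- rowsum(as.matrix(data[measures]), group = data[[by]])
  # rowsum() returns one row per sorted unique group, labelled by rownames
  result <- data.frame(group = rownames(sums), sums, row.names = NULL,
                       check.names = FALSE)
  names(result)[1] <- by
  result
}

# Toy data (invented for illustration): two years, two partner countries each
detail <- data.frame(
  year    = c("2017", "2017", "2018", "2018"),
  country = c("A", "B", "A", "B"),
  import  = c(10, 20, 30, 40),
  export  = c(1, 2, 3, 4)
)
legend <- data.frame(annotation = c("T", "X"))

if (needs_direct_aggregation(legend)) {
  print(aggregate_by_rowsum(detail, by = "year", measures = c("import", "export")))
}
```

Because `rowsum()` sums plain rows per group, this kind of fallback is only meaningful for additive measures, which is the same caveat raised later in this thread.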

GregorDeCillia commented 2 years ago

Update: this problem also occurs in the following cube from the environmental accounts:

x <- STATcubeR::sc_table_custom(
  db = 'str:database:deeehh02',
  measures = "str:statfn:deeehh02:F-DATA:F-EEGJ:SUM", 
  dimensions = c(
    "str:field:deeehh02:F-DATA:C-ENEETRAEG0-0",
    "str:field:deeehh02:F-DATA:C-C57-0",
    "str:field:deeehh02:F-DATA:C-ENEZEIT-0",
    "str:field:deeehh02:F-DATA:C-ENEVERW-0"
  )
)
GregorDeCillia commented 2 years ago

For now, it is probably best to use the following workaround:

This makes sure that $tabulate() falls back to aggregating via sums. It resolves the issue above, provided that unweighted sums are an appropriate way to aggregate the data.

It would be useful if STATcubeR applied this fallback automatically in certain situations. The danger, however, is that "real missings" could be replaced by sums in situations where this is not meaningful.
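The risk can be illustrated with base R's `rowsum()` alone: its `na.rm` argument decides whether a genuine missing value in a detail row propagates to the sum or is silently dropped. This is a small sketch with invented values; it only shows the two behaviours and does not suggest which one STATcubeR should pick.

```r
# Invented detail values for one group; the second value is a "real missing"
values <- matrix(c(5, NA, 7), ncol = 1, dimnames = list(NULL, "value"))
group  <- c("2018", "2018", "2018")

# na.rm = FALSE (the default): the missing value makes the total NA
print(rowsum(values, group))

# na.rm = TRUE: the missing value is skipped, so the total (12) looks complete
print(rowsum(values, group, na.rm = TRUE))
```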