marberts / piar

Price index aggregation in R
https://marberts.github.io/piar/
Other
4 stars 0 forks source link

Unexpected function results #7

Closed schneiderpy closed 5 months ago

schneiderpy commented 5 months ago

@marberts Following the example in you JOSS paper I have encountered some unexpect results:

Example: a) relatives <- with(ms_prices, price_relative(price, period, product)) gives the result as shown in the paper. It would be nice to add head(relatives) in the paper, so that the user can see what he should expect. However, if I change the order of the attributes I get the following result:

relative <- with(ms_prices, price_relative(period, product, price)) 
head(relative)
1.14 <NA> 6.09 6.23 8.61  6.4 
   1   NA   NA   NA   NA   NA 

b) uninformative error msg

relatives <- with(ms_prices, price_relative(price, product))
Error in price_relative(price, product) : 
  argument "product" is missing, with no default

c)

elementals <- with(ms_prices, elemental_index(relatives, business, period, na.rm = TRUE))
head(elementals)
Period-over-period price index for 4 levels over 4 time periods 
              B1       B2       B3       B4
202001 1.0000000 1.000000 1.000000      NaN
202002 0.8949097      NaN 2.020004      NaN
202003 0.3342939      NaN 1.635335      NaN
202004       NaN 2.770456 0.537996 4.576286

still gives correct computation, however, in a different form However:

elementals <- with(ms_prices, elemental_index(relatives, period, na.rm = TRUE))
head(elementals)
Period-over-period price index for 1 levels over 4 time periods 
  202001   202002  202003  202004
1      1 1.716475 1.19045 1.49501

I suggest more defensive code if input of attributes and order of attributes are important for correct computations.

d) head() and tail() functions for your object "pias" (see paper) doesn't result in a dataframe. Probably, because it is not of class dataframe.

class(pias)
[1] "piar_aggregation_structure"

This functionality would be nice if one has larger data sets.

As part of JOSS review

marberts commented 5 months ago

Thanks, @schneiderpy.

Argument matching

Are items a-c issues with theprice_relative() and elemental_index() functions, or just how they're shown in the example? Most functions in R are sensitive to the presence and position of the arguments; e.g.,

x = c(1:4, NA)
y = 5:1
all.equal(weighted.mean(x, y, na.rm = TRUE), weighted.mean(y, x, na.rm = TRUE))
#> [1] "'is.NA' value mismatch: 1 in current 0 in target"

y = mtcars
merge(y)
#> Error in merge.data.frame(y) : argument "y" is missing, with no default

Perhaps it would be clearer if I write the argument names in the function call. For example, in your example c it would then be clear that the business labels are taken as time periods and the time periods as elemental aggregates

elementals <- with(
  ms_prices,
  elemental_index(
    relatives,
    period = business,
    ea = period,
    na.rm = TRUE
  )
)

or that there are no elemental aggregates supplied (so the default applies, grouping all price relatives into a single elemental aggregate as documented)

elementals <- with(
  ms_prices,
  elemental_index(
    relatives,
    period = period,
    na.rm = TRUE
  )
)

As for more defensive code, the price_relative() and elemental_index() functions each coerce their inputs to the requires types (as documented) and check the length of the vector inputs early in the body of the function.

price_relative(1:5, period = c(1, 1, 2, 2, 3), product = mtcars)
#> Error in xtfrm.data.frame(x) : cannot xtfrm data frames

elemental_index(1:5, period = c(1, 1, 2, 2))
#> Error in elemental_index.numeric(1:5, period = c(1, 1, 2, 2)) : input vectors must be the same length

(This is why your example a works. If ms_prices$period were formatted like 2020-01 instead of 202001 it wouldn't work.) I'm happy to consider any suggestions for a different approach or if there are cases where the argument checking is insufficient.

Printing with head/tail

Your suggestion to include head(relatives) is good. I think it would be best to add it to the data frame to make it easier to see how it transforms prices.

ms_prices$relative <- with(ms_prices, price_relative(price, period, product))

head(subset(ms_prices, business == "B1"))
#>    period business product price  relative
#> 1  202001       B1       1  1.14 1.0000000
#> 2  202001       B1       2    NA        NA
#> 3  202001       B1       3  6.09 1.0000000
#> 11 202002       B1       2  6.94        NA
#> 12 202002       B1       3  5.45 0.8949097
#> 20 202003       B1       2  2.32 0.3342939

# ... make elemental indexes

It's a bit trickier for head(pias). Although you're right that this would be helpful for larger aggregation structures, both head(x)and tail(x) should return an object of the same type as x. Right now I have no subscripting method for aggregation structures as I'm not really sure what such a method should do (if anything). I usually do

print(pias, max = 12)
#> Aggregation structure for 5 elemental aggregates with 2 levels above the elemental aggregates 
#>  level1 level2 ea weight
#> 1      1     11 B1    553
#> 2      1     11 B2    646
#> 3      1     11 B3    312
 [ reached 'max' / getOption("max.print") -- omitted 2 rows ]

to peek into an aggregation structure (just as for a data frame or matrix ). One could also do head(as.data.frame(pias)) to get back a data frame.

schneiderpy commented 5 months ago

@marberts The unexpected results just concern the price_relative() and elemental_index() functions using your example in the paper

relative <- with(ms_prices, price_relative(price, period, product)) 
head(relative)
 1  2  3  4  5  6 
 1 NA  1  1  1  1

relative2 <- with(ms_prices, price_relative(period, product, price)) 
head(relative2)
1.14 <NA> 6.09 6.23 8.61  6.4 
   1   NA   NA   NA   NA   NA 
expect_true(expect_equal(relative, relative2))
Error: `relative` not equal to `relative2`.
Names: 'is.NA' value mismatch: 6 in current 0 in target

Both objects, "relative" and "relative2", are the computational results of price_relative(). Both objects might be subject of further computation in another function which leads to erronous results if the user does not respect the order/number of the attributes. My point is, that there is no error message popping up to alert the user of the missuse of the function.

relatives <- with(ms_prices, price_relative(price, product))
Error in price_relative(price, product) : 
argument "product" is missing, with no default

.. argument "product" is not missing in this example (perhaps an error message saying " ... one argument is missing" would be more informative)

The same arguments (for defensive coding) hold for elemental_index()

One could also do head(as.data.frame(pias)) to get back a data frame.

Is it possible to change the class(pias) to data.frame or as.data.frame? That would probably include a gain in flexibility since the user can apply futher packages e.g. ggplot, knitr or flextable.

As part of the JOSS review

marberts commented 5 months ago

Thanks for the clarification @schneiderpy.

I'm not sure how I could validate the inputs to price_relative() for your first point (but please suggest any ideas). I would need to know which columns in a table contain the correct variables for each of price_relative()'s arguments. The function makes sure that the first argument can be represented as a numeric and that the second two arguments can be represented as factors, but I'm not sure how I would check the content of these variables to determine if the inputs are valid or not.

As for with(ms_prices, price_relative(price, product)), that error message comes from R, not me. Indeed, we can expand the call to see that the product argument has not been supplied and that instead the values in the product column of ms_prices are used to denote time periods.

print_call = function(f) {
  b = as.list(body(f))
  body(f) = as.call(c(b[1], quote(print(match.call())), b[-1]))
  f
}

price_relative = print_call(price_relative)

with(ms_prices, price_relative(price, product))
#> price_relative(x = price, period = product)
#> Error in price_relative(price, product): argument "product" is missing, with no default

It feels like you're expecting with() to match column names to argument names like do.call(). This brings me back to my suggestion to name the arguments to price_relative() and elemental_index() in the paper to make the calls to these functions explicit and not dependent on the order of the arguments. Another idea is to use a formula interface for non-standard evaluation instead of wrapping things in with(). Elemental indexes could then be made like

elementals = elemental_index(
  ms_prices,
  price_relative(price, period, product) ~ period + business,
  na.rm = TRUE
)

# Or, with pipes.
elementals = ms_prices |>
  mutate(relative = price_relative(price, period = period, product = product)) |>
  elemental_index(relative ~ period + product, na.rm = TRUE)

It would be easy enough to add this functionality, and I'm happy to do so if you think it would make the presentation easier.

(A third option would be to dispense with NSE and write out price_relative(ms_prices$price, period = ms_prices$period, product = ms_prices$product), but I find that a bit too verbose.)

As for your second point, changing the class of pias would be tricky. It's not really a tabular object. There are methods to coerce to/from a data frame, but the current data structure I think makes the most sense.

One option is to keep the aggregation data as a table rather than explicitly construct the aggregation structure object. Aggregating with a data frame that represents an aggregation structure works just as well (as documented).

ms_weights[c("level1", "level2")] = expand_classification(ms_weights$classification)

pias = ms_weights[c("level1", "level2", "business", "weight")]
pias
#>   level1 level2 business weight
#> 1      1     11       B1    553
#> 2      1     11       B2    646
#> 3      1     11       B3    312
#> 4      1     12       B4    622
#> 5      1     12       B5    330

ms_index = aggregate(elementals, pias, na.rm = TRUE)

Again, let me know if you think this would make the example in the paper clearer.

schneiderpy commented 5 months ago

@marberts You can try and implement something simple to test for function arguments, e.g.

attributes(price_relative())
Error in price_relative() : argument "x" is missing, with no default
> attributes(price_relative(ms_prices$price))
Error in price_relative(ms_prices$price) : 
  argument "period" is missing, with no default
> attributes(price_relative(ms_prices$price, ms_prices$period))
Error in price_relative(ms_prices$price, ms_prices$period) : 
  argument "product" is missing, with no default

or, as you have already mentioned, something like this:

test <- function(a,b,c) {
  defined <- ls()
  passed <- names(as.list(match.call())[-1])

  if (any(!defined %in% passed)) {
    stop(paste("missing values for", paste(setdiff(defined, passed), collapse=", ")))
  }
  a+b+c
}

As part of the JOSS review

marberts commented 5 months ago

Thanks for the suggestion. I'm going to mark this issue as needing further thought. I've never contemplated changing the default behavior when function arguments are missing, so I need to look into it more.

I'm going to record a couple thoughts here so I don't forget.

  1. Functions in tidyverse seem to rely on the default (obviously the base functions do as well).
dplyr::nth()  # Style guide uses nth() as an example for error messages.
#> Error in dplyr::nth() : argument "x" is missing, with no default

purrr::accumulate()
#> Error in purrr::accumulate() : argument ".f" is missing, with no default

tidyr::replace_na()
#> Error in replace_na.default() : argument "data" is missing, with no default

# Default for S3 dispatch with no default.
dplyr::mutate()
#> Error in UseMethod("mutate") : no applicable method for 'mutate' applied to an object of class "NULL"
  1. Functions that change the default seem to do so when missing arguments are allowed (I don't use that approach in piar).
fixest::feols()
#> Error in fixest::feols() : You must provide the argument 'data' (currently it is missing).
  1. Is there a package out there that automates this? Or did I need to write a factory/decorator for each function?
  2. How do I deal with default values?
test <- function(a, b, c = 1) {
  defined <- ls()
  passed <- names(as.list(match.call())[-1])

  if (any(!defined %in% passed)) {
    stop(paste("missing values for",paste(setdiff(defined, passed), collapse = ", ")))
  }

  a + b + c
}

test(1, 2)
#> Error in test(1, 2) : missing values for c
marberts commented 5 months ago

I've added an example of head(relatives) in the paper in a4774b5.

marberts commented 5 months ago

I decided to merge in some work I had adding NSE to elemental_index() in 96012b2. It doesn't change the example in the paper, but I feel it can make the function a bit easier to use.

schneiderpy commented 5 months ago

@marberts since argument order matters in price_relative() you should probably name your arguments, e.g. price_relative(price = price, period = period, product = product) However, I don't know if this works, since you are calling gpindex::back_period(period, product)for the final step.

I do agree that ms_prices$relative is probably uncessary.

head(pias) still doesn't work as expected

As part of the review

marberts commented 5 months ago

Thanks for the suggestions. I ended up re-writing the code snippets in the example to try and incorporate your comments and make them easier to read (the results are unchanged).

The period and product arguments are now named for price_relative(). The first argument to price_relative() is x, not price, so I leave it unnamed as two out of three named arguments means the call is unambiguous. (I use x instead of price so that I have the option to easily make this function generic in the future.) gpindex::back_period() won't cause any issues here.

Instead of explicitly making an aggregation structure object, pias is now a data frame. This means head(pias) works as expected, as does any other function expecting a data frame. aggregate() converts this data frame to an aggregation structure object in the background.

Lastly, I've removed the use of with() in the example. Instead I use the data frame method for elemental_index() and specify the inputs with a formula. This is similar to using tapply() or lm(). I think this is a bit easier to read, especially if there's no need to show ms_prices$relative.

Hopefully this addresses your comments in this issue, but please let me know if there's still something outstanding.

schneiderpy commented 5 months ago

Trying to replicate your examples in the re-written paper (not in the documentation which are still the same, however, in my opinion, should work with named function arguments)

The first example gives the following error:

elementals <- ms_prices |>
 transform(
    relative = price_relative(price, period = period, product = product)
   ) |>
   elemental_index(relative ~ period + business, na.rm = TRUE)
Error in elemental_index(as.numeric(x), ...) : 
  'list' object cannot be coerced to type 'double'

Now, I'll use part of the chunk to "test" for price_relative()

elementals <- ms_prices |>
   transform(
     relative = price_relative(price, period = period, product = product)
   )
 head(elementals)
  period business product price relative
1 202001       B1       1  1.14        1
2 202001       B1       2    NA       NA
3 202001       B1       3  6.09        1
4 202001       B2       4  6.23        1
5 202001       B2       5  8.61        1
6 202001       B2       6  6.40        1

.. and with different order of the function arguments

elementals2 <- ms_prices |>
   transform(
     relative = price_relative(price, product = product, period = period)
   )
 head(elementals2)
  period business product price relative
1 202001       B1       1  1.14        1
2 202001       B1       2    NA       NA
3 202001       B1       3  6.09        1
4 202001       B2       4  6.23        1
5 202001       B2       5  8.61        1
6 202001       B2       6  6.40        1

worked fine

Now, I'll use the example in the package documentation (and previous example in the paper which led us to this issue) (with correct order)

elementals3 <- with(
   ms_prices, 
   elemental_index(
     price_relative(price, period, product), 
     period, business, na.rm = TRUE
   )
 )
 head(elementals3)
Period-over-period price index for 4 levels over 4 time periods 
   202001    202002    202003   202004
B1      1 0.8949097 0.3342939      NaN
B2      1       NaN       NaN 2.770456
B3      1 2.0200036 1.6353355 0.537996
B4    NaN       NaN       NaN 4.576286

with changed order

elementals4 <- with(
   ms_prices, 
   elemental_index(
     price_relative(price, product, period), 
     period, business, na.rm = TRUE
   )
 )
 head(elementals4)
Period-over-period price index for 4 levels over 4 time periods 
      202001    202002    202003    202004
B1 1.0000000 0.7853026       NaN       NaN
B2 1.0166877       NaN 0.3633362 1.3831875
B3 0.9466638 0.6067904 2.1869572 0.8454208
B4       NaN       NaN 0.2242169 1.7651714

It might be the case that you didn't change the function itself (in the script)

head(pias) now returns a data frame which might be useful if one might plot historical price index evolutions

Additional question: what do the first three rows indicate in chain(ms_index)?

part of the review

marberts commented 5 months ago

Documentation

...not in the documentation which are still the same, however, in my opinion, should work with named function arguments

You're right, that should change. (It seems I didn't resolve the conflicts when I merged that code into the joss branch :disappointed: .) I've made these changes in 57addc7.

Error

Which version of piar are you using to get this error?

elementals <- ms_prices |>
 transform(
    relative = price_relative(price, period = period, product = product)
  ) |>
  elemental_index(relative ~ period + business, na.rm = TRUE)

#> Error in elemental_index(as.numeric(x), ...) : 
#>   'list' object cannot be coerced to type 'double'

Version 0.8.0 doesn't have the functionality I added the other day. You'll need to use version 0.8.0.9005 which is in the joss branch. You can install it with

# You'll need to have {pak} installed.
# install.packages("pak")
pak::pak("marberts/piar@joss")

Old example

I've not made any changes to the price_relative() function. I simply updated the examples to name the arguments to this function and remove the use of with()to make the function call clearer. Indeed,

elementals3 <- with(
  ms_prices, 
  elemental_index(
    price_relative(price, period = period,  product = product), 
    period, business, na.rm = TRUE
  )
)

elementals4 <- with(
  ms_prices, 
  elemental_index(
    price_relative(price, product = product, period = period), 
    period, business, na.rm = TRUE
  )
)

all.equal(elementals3, elementals4)
#> [1] TRUE

Additional question

#> Fixed-base price index for 8 levels over 4 time periods 
#>    202001    202002    202003    202004
#> 1       1 1.3007239 1.3827662 3.7815355
#> 11      1 1.3007239 1.3827662 2.1771866
#> 12      1 1.3007239 1.3827662 6.3279338
#> B1      1 0.8949097 0.2991629 0.4710366
#> B2      1 1.3007239 1.3827662 3.8308934
#> B3      1 2.0200036 3.3033836 1.7772072
#> B4      1 1.3007239 1.3827662 6.3279338
#> B5      1 1.3007239 1.3827662 6.3279338

The first three rows are the aggregated indexes and this is what aggregate(elementals, pias) is computing. In this example businesses B1, B2, and B3 belong to industry 11, and B4 and B5 belong to industry 12. All five of these businesses belong to industry 1. So the index for industry 1 gives the change in price in this industry, which is constructed as the arithmetic index of the change in price for all five businesses. In practice this is usually a Lowe or Young index, but that depends on how the weights in pias are made.

schneiderpy commented 5 months ago

Thank you for the explication of the additional question. It might be a good idea to document this in the package documentation and paper.

Well, the unexpected function results still persist. Now tested with the new version

 packageVersion("piar")
[1] ‘0.8.0.9005’
 elementals <- ms_prices |>
   transform(
     relative = price_relative(price, period = period, product = product)
   ) |>
   elemental_index(relative ~ period + business, na.rm = TRUE)

 head(elementals)
Period-over-period price index for 4 levels over 4 time periods 
   202001    202002    202003   202004
B1      1 0.8949097 0.3342939      NaN
B2      1       NaN       NaN 2.770456
B3      1 2.0200036 1.6353355 0.537996
B4    NaN       NaN       NaN 4.576286
elementals2 <- ms_prices |>
   transform(
     relative = price_relative(price, product, period)
   ) |>
   elemental_index(relative ~ period + business, na.rm = TRUE)
 head(elementals2)
Period-over-period price index for 4 levels over 4 time periods 
      202001    202002    202003    202004
B1 1.0000000 0.7853026       NaN       NaN
B2 1.0166877       NaN 0.3633362 1.3831875
B3 0.9466638 0.6067904 2.1869572 0.8454208
B4       NaN       NaN 0.2242169 1.7651714
 all.equal(elementals, elementals2)
[1] "Component “index”: Component 1: Mean relative difference: 0.03501192"            
[2] "Component “index”: Component 2: Mean relative difference: 0.5224239"             
[3] "Component “index”: Component 3: 'is.NA' value mismatch: 1 in current 2 in target"
[4] "Component “index”: Component 4: Mean relative difference: 0.5714595"    

And the "standalone" function price_relative()

relative1 <- price_relative(ms_prices$price, ms_prices$period, ms_prices$product)
 relative2 <- price_relative(ms_prices$price, ms_prices$product, ms_prices$period)
 all.equal(relative1, relative2)
[1] "Names: 40 string mismatches"                        "'is.NA' value mismatch: 12 in current 12 in target"

I still think that you should name the arguments in the function(s), since we have already discovered that argument order is important and avoids erroneous results if no error message is present.

as part of the review

marberts commented 5 months ago

It might be a good idea to document this in the package documentation and paper.

Good suggestion; I've made it clear in the paper and readme (bd5bc5b86829d773f9765b89f63e7773a1a6a202).

marberts commented 5 months ago

I must admit that I'm confused about the unexpected result with price_relative() as I thought this was an issue about how the function is presented in the paper and documentation.

There's no ambiguity when calling price_relative() if the period and product arguments are named, as you demonstrated in your comment yesterday, and I thought your comment was about naming these arguments so that it's clear what the function expects as an input. (Version 0.8.0.9005 should have much more consistent use of named arguments through the documentation to make the function calls clearer.)

Changing the order of the period and product argument when they're unnamed shouldn't give the same result because of how R matches arguments in a function call: named arguments are matched first and then the remaining arguments are matched in the order they appear. I'm not really sure what's unexpected here.

I still think that you should name the arguments in the function(s), since we have already discovered that argument order is important and avoids erroneous results if no error message is present.

Can you elaborate on this? Do you mean changing the signature of price_relative() to be

price_relative = function(x, ..., period, product) {
  ...
}

so that period and product have to be named?

schneiderpy commented 5 months ago

Well, maybe my English is worse than I thought ... However, I'm surprised that you don't see the difference of the computational results of theprice_relative()

elementals2 <- ms_prices |>
   transform(
     relative = price_relative(price, period, product)
   ) |>
   elemental_index(relative ~ period + business, na.rm = TRUE)
> head(elementals2)
Period-over-period price index for 4 levels over 4 time periods 
   202001    202002    202003   202004
B1      1 0.8949097 0.3342939      NaN
B2      1       NaN       NaN 2.770456
B3      1 2.0200036 1.6353355 0.537996
B4    NaN       NaN       NaN 4.576286
 elementals3 <- ms_prices |>
   transform(
     relative = price_relative(price, product, period)
   ) |>
   elemental_index(relative ~ period + business, na.rm = TRUE)
> head(elementals3)
Period-over-period price index for 4 levels over 4 time periods 
      202001    202002    202003    202004
B1 1.0000000 0.7853026       NaN       NaN
B2 1.0166877       NaN 0.3633362 1.3831875
B3 0.9466638 0.6067904 2.1869572 0.8454208
B4       NaN       NaN 0.2242169 1.7651714

Changing the order of the period and product argument when they're unnamed shouldn't give the same result because of how R matches arguments in a function call: named arguments are matched first and then the remaining arguments are matched in the order they appear. I'm not really sure what's unexpected here.

Of course that is true and this is what an R programmer would expect.

However, I call elementals3 an unexpected result, since you would expect elementals2 . The point still is, that you don't have defensive coding built in (error/warning message) to alert the user to the misuse of your function. Moreover, the erroneous result of elementals3 is further used as input in other computations, which leads to erroneous results and conclusions. You should either a) include defensive coding with an error message b) change price_relative() to

price_relative = function(x, ..., period = period, product = product) {
  ...
}

which hopefully will solve the issue.

part of the JOSS review

marberts commented 5 months ago

Thanks for taking the time to explain the issue again, @schneiderpy. I think I understand now.

The issue isn't that price_relative(price, period, product) should equal price_relative(price, product, period), but rather that it is easy to inadvertently swap the period and product arguments in price_relative() (for which the function gives no warning) and this can ruin all subsequent calculations.

I'm not sure there is a way to correctly check the period and product arguments. Anything that can be coerced to a factor is acceptable for either argument, so I don't think it's possible to raise a warning if they've been reversed. (But please submit a PR if you have a method.) I think the way forward is to add the dot-dot-dots so that period and product need to be named. This is also a good opportunity to make price_relative() generic.

While we've been discussing price_relative(), there are a few more functions that would benefit from this approach.

Please let me know if I understand the issue correctly, and I'll get started making the necessary changes.

schneiderpy commented 5 months ago

The issue isn't that price_relative(price, period, product) should equal price_relative(price, product, period), but rather that it is easy to inadvertently swap the period and product arguments in price_relative() (for which the function gives no warning) and this can ruin all subsequent calculations.

That is exactly what is was trying to say, however, swaped function arguments should give an error not warning, since everything subsequent (result) is not worthwile ... However, it is up to you, if you would like to give the user the possibility to swap the function arguments and you solve this internally in you function (or in the function that will do the final computation) or if you will restrict the input to the correct order of the arguments and error if otherwise.

Of course you can try something like price_relative(x, ...), however, I'm not sure that ellipses argument will change anything. They are usually used to give the possibility to add more arguments.

Question: Have you tryied to implement the named argument as mentioned in the previous commit?

While we've been discussing price_relative(), there are a few more functions that would benefit from this approach.

That is true, but I don't wanted to scare you ;)

As part of the JOSS review

marberts commented 5 months ago

I've updated price_relative() in 9abb448 so that period and product need to be named and cannot be passed to the function by position. (This is on the nse branch, not the joss branch yet,)

elementals2 <- ms_prices |>
  transform(
    relative = price_relative(price, period = period, product = product)
  ) |>
  elemental_index(relative ~ period + business, na.rm = TRUE)

elementals3 <- ms_prices |>
  transform(
    relative = price_relative(price, product = product, period = period)
  ) |>
  elemental_index(relative ~ period + business, na.rm = TRUE)

all.equal(elementals2, elementals3)
#> [1] TRUE

Trying to use the arguments unnamed is an error.

ms_prices |>
  transform(
    relative = price_relative(price, product, period)
 ) 

#> Error in price_relative.default(price, product, period) : argument "period" is missing, with no default

I don't believe there is anything else I can do on this issue. There is no way to further validate the period and product arguments to give an error if they're swapped. (Is price_relative(1:4, c(1, 2, 1, 2), c(3, 3, 4, 4)) making price relatives for products 1 and 2 in periods 3 and 4, or products 3 and 4 in periods 1 and 2? Either interpretation could be right.) Adding the dot-dot-dots is the only way I know of to force arguments to be named.

I think the carry_forward(), carry_backwards(), shadow_price(), and elemental_index() are the only other functions that need to be updated in the same way as price_relative().

Please let me know if these changes address your comment and are sufficient to close this issue, as it will be a bit of work to implement. They also introduce non-backwards compatible changes that will break at least some code that I know is in use.

schneiderpy commented 5 months ago

Perhaps you can try something like this:

price_relative <- function(x, ...)
UseMethod("price_relative")
 # you have the above part already
price_relative.default <- function(x, ...) {
  # extract arguments
  a <- list(...) 
  # define correct order for parameters
  params <- c("price", "product") # or as.factor
  # check that all mandatory parameters are specified
  stopifnot(all(params %in% names(a)))
  # check order of mandatory parameters
  stopifnot(identical(names(a)[names(a) %in% params], params))
  # do calculations
 res <- x / x[gpindex::back_period(period, product)]
  names(res) <- as.character(product)
  res
}

Perhaps you want to change stopifnot with assertthat_ for more informative error messages

As part of the JOSS review

marberts commented 5 months ago

I tried your suggestion but it doesn't pass R CMD check because the signature of the function doesn't have arguments period and product but the documentation does. These arguments clearly need to be documented because the function still expects them.

❯ checking Rd \usage sections ... WARNING
  Documented arguments not in \usage in Rd file 'price_relative.Rd':
    ‘period’ ‘product’

  Functions with \usage entries need to have the appropriate \alias
  entries, and all their arguments documented.
  The \usage entries must correspond to syntactically valid R code.
  See chapter ‘Writing R documentation files’ in the ‘Writing R
  Extensions’ manual.

Adding these arguments to price_relative() makes the extra checks redundant as omitting them will signal an error (e.g., argument "period" is missing, with no default) and there is no need to check the order because they must be passed by name, not position.

The check also raises a note about no visible bindings, but this can be fixed.

❯ checking R code for possible problems ... NOTE
  price_relative.default: no visible binding for global variable ‘period’
  price_relative.default: no visible binding for global variable
    ‘product’
  Undefined global functions or variables:
    period product
schneiderpy commented 5 months ago

Well, you are right, documenting ellipses is difficult ..

Have you tried something like price_relative <- function(x, price = price, product = product, ...)?

The problem is to include the mandatory arguments price and product and to control for the order of both. Since price_relative() is "just" evaluating function input, while gpindex::back_period() is computing the final result, what about spliting the task, say, price_relative()evaluates the input, while gpindex::back_period() (or gpindex::offset_period()) would control for the right order of the arguments? Would that be feasable?

As part of the JOSS review

marberts commented 5 months ago

Adding the default arguments like you suggest makes a circular reference and gives a weird error if an argument is missing.

# Add default arguments.
price_relative <- function(x, period = period, product = product, ...) {
  x <- as.numeric(x)
  period <- as.factor(period)
  product <- as.factor(product)

  if (different_length(x, period, product)) {
    stop("'x', 'period', and 'product' must be the same length")
  }

  res <- x / x[gpindex::back_period(period, product)]
  names(res) <- as.character(product)
  res
}

price_relative(ms_prices$price, ms_prices$period)
#> Error in is.factor(x) : 
#>   promise already under evaluation: recursive default argument reference or earlier problems?

These could be renamed to something else, but then the function can behave in weird ways a use variables in the global environment as defaults. If both period and product are supplied then the function behaves the same as if there were no defaults. Your original point about inadvertently swapping these arguments in the function call still applies.

For your second question, there's no way to make gpindex::back_period() enforce how price_relative() matches arguments. The arguments passed to this function are evaluated in the local scope of the price_relative() function, after they are matched in the function call.

schneiderpy commented 5 months ago

Let's go back to the first idea, trying to implente

price_relative <- function(x, ...)
UseMethod("price_relative")
 # you have the above part already
price_relative.default <- function(x, ...) {
  # extract arguments
  a <- list(...) 
  # define correct order for parameters
  params <- c("price", "product") # or as.factor
  # check that all mandatory parameters are specified
  stopifnot(all(params %in% names(a)))
  # check order of mandatory parameters
  stopifnot(identical(names(a)[names(a) %in% params], params))
  # do calculations
 res <- x / x[gpindex::back_period(period, product)]
  names(res) <- as.character(product)
  res
}

To pass CRAN check you could document the mandatory arguments within @param ... or in a new section called @section Additional parameters:

As part of the JOSS review

marberts commented 5 months ago

Thanks for the suggestion, but I'm not a fan of this idea for a couple reasons.

  1. Overloading the documentation for ... to accommodate two arguments or adding a custom section for parameters is not idiomatic and likely to be confusing. It feels like a hack to get around R CMD check.
  2. I'm not sure what purpose it serves to enforce the order of named arguments: price_relative(x, period = a, product = b) and price_relative(x, product = b, period = a) should both work. There is no scope to accidentally swap the period and product arguments when they're named. It's only a problem when passing arguments by position, i.e., the difference between price_relative(x, a, b) and price_relative(x, b, a) requires the user to remember that the second argument in period and the third argument is product.

I understand your comment here: the period and product arguments are sufficiently similar that it is easy to swap them in a call to price_relative() without raising an error. Unless you're looking at the documentation for the function, this may be an easy mistake to make. I think that's a fair comment, which is why I updated the examples in the paper/readme/help files to use named arguments to make it clear how the function works.

If a fix to the documentation is not sufficient, the only resolution I can see for this issues is to disable matching the period and product arguments by position in price_relative(). The idiomatic way to disable matching arguments by position in R is to force these arguments to be named by having ... appear before them in the function's list of formal arguments. I already do this with a number of generic functions, so it wouldn't be out of place (and would be a good opportunity to make price_relative() generic). I realize there are well-known reasons why the ... can be problematic in R, and I'd prefer to use something like Python's keyword only arguments, but that's just how R works. I'm happy to make this change if it will address your concern and close the issue.

Right now I feel that we're just going back and forth and not really agreeing with the way R matches arguments in a function call. I don't think trying to change the semantics of the language is a way forward; this will just make the function harder to use.

In any case, let me know if making period and product "keyword only" for price_relative() addresses your comment. If so, I can make the change for this function and the others I listed above with similar arguments. If not, while I appreciate your comments and the discussion we've had on this topic, I don't think there's any further changes I can make based on this issue.

schneiderpy commented 5 months ago

My suggested solution is just an idea. Any solution that resolves this isssue is welcome

However, I do not agree in the case where the functionality of a function results in erronous results, without adverting the user, and, furthermore, are used in subsequent computations.

marberts commented 5 months ago

I've updated price_relative() in d4ba0a5 to require period and product to be named. This means that the order of these arguments doesn't matter, so they can't be easily mixed up, and the default error messages are clearer.

all.equal(
  with(ms_prices, price_relative(price, period = period, product = product)),
  with(ms_prices, price_relative(price, product = product, period = period))
)
#> [1] TRUE

with(ms_prices, price_relative(price, period = period))
#> Error in price_relative.default(price, period = period) : argument "product" is missing, with no default

with(ms_prices, price_relative(price, product = product))
#> Error in price_relative.default(price, product = product) : argument "period" is missing, with no default

I've done the same thing for all the other functions with similar arguments, namely elemental_index(), carry_forward(), carry_backward(), and shadow_price(). These functions are also all generic now, and having these argument be named will make it easier to extend these functions in the future.