r-quantities / units

Measurement units for R
https://r-quantities.github.io/units
175 stars 28 forks source link

squared unit behaves differently in numerator and denominator #221

Open bergsmat opened 4 years ago

bergsmat commented 4 years ago

Thanks for a great package.

I notice that both m^2 and m2 share a common internal representation when used in a numerator, but not for a denominator. Should behaviors match?

> as_units('m^2')
 1 [m^2] 
> as_units('m2') 
1 [m^2] 
> as_units('kg/m^2') 
1 [kg/m^2] 
> as_units('kg/m2') 
1 [kg/m2] 
> str(as_units('m^2')) 
Object of class units:  num 1  - attr(*, "units")=List of 2  
 ..$ numerator  : chr [1:2] "m" "m"  
 ..$ denominator: chr(0)   
 ..- attr(*, "class")= chr "symbolic_units" 
> str(as_units('m2')) 
Object of class units:  num 1  - attr(*, "units")=List of 2  
 ..$ numerator  : chr [1:2] "m" "m"  
 ..$ denominator: chr(0)   
 ..- attr(*, "class")= chr "symbolic_units" 
> str(as_units('kg/m^2')) 
Object of class units:  num 1  - attr(*, "units")=List of 2   
..$ numerator  : chr "kg"   
..$ denominator: chr [1:2] "m" "m"   
..- attr(*, "class")= chr "symbolic_units" 
> str(as_units('kg/m2')) 
Object of class units:  num 1  - attr(*, "units")=List of 2   
..$ numerator  : chr "kg"   
..$ denominator: chr "m2"  
 ..- attr(*, "class")= chr "symbolic_units"
Enchufa2 commented 4 years ago

I think they should. Thanks for the report!

edzer commented 4 years ago

I disagree: the notations either use ^ and /, or none of them and use space as multiplication; it should be

> as_units('kg m-2')
1 [kg/m^2]

the kg/m2 is a mix of both, where the 2 is not interpreted as square.

Enchufa2 commented 4 years ago

So still there's a mismatch, because it is interpreted as a square in the numerator.

> as_units('m2') 
1 [m^2]

And it should be too in the denominator because anyway udunits interprets m2 as a square:

> as_units('m^2') * as_units('kg/m2')
1 [kg*m^2/m2]
> set_units(as_units('m^2') * as_units('kg/m2'), kg)
1 [kg]
edzer commented 4 years ago

We can try to have the case as_units('kg/m2') generate a helpful error.

Enchufa2 commented 4 years ago

But why if udunits supports this?

> units:::R_ut_format(units:::R_ut_parse("m^2"))
[1] "m²"
> units:::R_ut_format(units:::R_ut_parse("m2"))
[1] "m²"
> units:::R_ut_format(units:::R_ut_parse("kg/m^2"))
[1] "m⁻²·kg"
> units:::R_ut_format(units:::R_ut_parse("kg/m2"))
[1] "m⁻²·kg"
edzer commented 4 years ago

You have a point there. So we either should support it too, or not support it and generate an error. Maybe second option on the short term, first on the long-term?

I can see where this comes from: we're using R's expression evaluation mechanism for things involving / or ^, and parse ourselves if not. R's expression evaluation sees m2 as a symbol, different from m^2.

Enchufa2 commented 4 years ago

The only thing that is "broken" is the internal representation ("m2" in the denominator instead of c("m", "m")), and therefore the formatting. But given that uduntis supports "m2" as well, the behaviour is correct: it's working just fine, as the examples above demonstrate. So, in my opinion, raising an error now breaks backwards compatibility: there may be scripts out there using the "m2" notation both in numerators and denominators (disclaimer: I always use this notation, I'm lazy enough to avoid the "^" as much as possible).

Enchufa2 commented 3 years ago

@t-kalinowski Maybe you can take a look at this. The problem is here, which git blame attributes to you: :)

https://github.com/r-quantities/units/blob/b88ecdd19a0c95d19039332b9f9df9ce99053715/R/make_units.R#L285-L289

For as_units("m2"), are_exponents_implicit returns TRUE, and thus m2 is converted to m^(2) below. But for as_units("m2/kg"), returns FALSE, and m2 is not converted. I think we always want to perform those conversions, right?, but convert_implicit_to_explicit_exponents doesn't want either * or /.

t-kalinowski commented 3 years ago

I'm not fully remembering the intricacies of this, maybe you can help me remember why this is a dumb idea: can we just always insert a "^" anywhere a numeric is adjacent to non numeric that's not an operator? E.g., some kind of regex that does:

make_all_exponents_explicit <- function(x) 
  gsub("([^\\d^*/]+)\\s*\\^?\\s*([\\d+\\.]?\\d+)", "\\1^\\2", x)

sapply(c("m^2",  "m2", "kg/m^2", "kg/m2"),
       make_all_exponents_explicit)
#>      m^2       m2   kg/m^2    kg/m2 
#>    "m^2"    "m^2" "kg/m^2" "kg/m^2"

Created on 2021-02-20 by the reprex package (v1.0.0)

Still need to handle negative exponents somehow.

Enchufa2 commented 3 years ago

If you don't remember, we have a problem. :grimacing: :dancers:

Enchufa2 commented 3 years ago

This is problematic:

as_units("m-2/g")
#> Error: cannot convert 1/g into m
#> Did you try to supply a value in a context where a bare expression was expected?
as_units("m-2/g", implicit_exponents=TRUE)
#> Error in convert_implicit_to_explicit_exponents(x) : 
#>   If 'implicit_exponents = TRUE', strings cannot contain `*' or `/'

No way forward with the current implementation.

t-kalinowski commented 3 years ago

Good thing this project has nice suite of unit tests! :relieved:

It's coming back to me. There used to be two completely different code paths for implicit and explicit exponents. Over time those converged quite a bit, and perhaps it's time to converge them even further. How about something like this:

make_all_exponents_explicit <- function(x) 
  gsub("([^\\d^*-/]+)\\s*\\^?\\s*(-?[\\d+\\.]?\\d+)", "((\\1)^(\\2))", x)

sapply(c("m^2",  "m2", "kg/m^2", "kg/m2", "m-2/g", "g/m-2"),
       make_all_exponents_explicit)
#>            m^2             m2         kg/m^2          kg/m2          m-2/g 
#>    "((m)^(2))"    "((m)^(2))" "kg/((m)^(2))" "kg/((m)^(2))" "((m)^(-2))/g" 
#>          g/m-2 
#> "g/((m)^(-2))"

and the idea being that every supplied unit string has make_all_exponents_explicit called on it before it is parsed.

Created on 2021-02-20 by the reprex package (v1.0.0)

t-kalinowski commented 3 years ago

I'm happy to submit a PR for this, but it might be about 2-3 weeks before I'll have time to sit down and actually work on this.

Enchufa2 commented 3 years ago

It's coming back to me. There used to be two completely different code paths for implicit and explicit exponents. Over time those converged quite a bit, and perhaps it's time to converge them even further.

Convergence sounds good to me. :)

How about something like this:

make_all_exponents_explicit <- function(x) 
  gsub("([^\\d^*-/]+)\\s*\\^?\\s*(-?[\\d+\\.]?\\d+)", "((\\1)^(\\2))", x)

sapply(c("m^2",  "m2", "kg/m^2", "kg/m2", "m-2/g", "g/m-2"),
       make_all_exponents_explicit)
#>            m^2             m2         kg/m^2          kg/m2          m-2/g 
#>    "((m)^(2))"    "((m)^(2))" "kg/((m)^(2))" "kg/((m)^(2))" "((m)^(-2))/g" 
#>          g/m-2 
#> "g/((m)^(-2))"

and the idea being that every supplied unit string has make_all_exponents_explicit called on it before it is parsed.

But I don't think that udunits2 is able to parse so many parentheses.

I'm happy to submit a PR for this, but it might be about 2-3 weeks before I'll have time to sit down and actually work on this.

That would be great, thanks. We are preparing a release due to CRAN errors, but this needs to be properly considered and tested, so no hurries.

t-kalinowski commented 3 years ago

I think it's R that does the first parsing of these parenthesis, not udunits (unless this is something that's changed recently)