vubiostat / r-yaml

R package for converting objects to and from YAML
http://biostat.app.vumc.org/wiki/Main/YamlR

Array of 1 not represented as list in R #69

Open svdwoude opened 5 years ago

svdwoude commented 5 years ago

When we parse the example below with yaml.load:

key: value
array1:
  - item
array2:
  - item1
  - item2

we get

str(yaml::yaml.load("
key: value
array1:
  - item
array2:
  - item1
  - item2
"))
#> List of 3
#>  $ key   : chr "value"
#>  $ array1: chr "item"
#>  $ array2: chr [1:2] "item1" "item2"

I would expect an option that keeps array1 as a list of length 1 in this case, reflecting that it was a sequence of length 1 in the YAML.

For reference, this is what I get when I do the same thing for JSON with jsonlite:

str(jsonlite::fromJSON('
{
  "key": "value",
  "array1": ["item"],
  "array2": ["item1", "item2"]
}
', simplifyVector = FALSE))
#> List of 3
#>  $ key   : chr "value"
#>  $ array1:List of 1
#>   ..$ : chr "item"
#>  $ array2:List of 2
#>   ..$ : chr "item1"
#>   ..$ : chr "item2"

Note: in jsonlite::read_json, the simplifyVector argument is set to FALSE by default.

Could we add a similar simplifyVector = FALSE option to force the R object to represent the original structure?

(applies to trestletech/plumber#390)

viking commented 5 years ago

The easiest way to accomplish what you want at this moment is to use a combination of a custom handler and setting the as.named.list parameter to FALSE:

str(yaml.load("{ test: [123, 456] }", handlers = list(seq = function(x) x), as.named.list=FALSE))
#> List of 1
#>  $ :List of 2
#>  ..$ : int 123
#>  ..$ : int 456
#>  - attr(*, "keys")=List of 1
#>  ..$ : chr "test"

It might be a good idea to add a more user-friendly parameter, though.

viking commented 5 years ago

Check out the documentation for as.named.list, since you might not need it. By default, yaml.load will coerce keys to strings, but there's no requirement that a map key has to be a string. If you turn off as.named.list, coercion is not performed.

hantonita commented 5 years ago

str(yaml.load("{ test: [123, 456] }", handlers = list(seq = function(x) x)))

is a good solution for this issue (I do still need the list names), but as mentioned above, it would be nice to have an option similar to jsonlite’s simplifyVector = FALSE that inserts this handler behind the scenes. That would make this feature much more intuitive to use. A good place for it could be read_yaml(), since that is a convenience function with the same objective as yaml.load_file(). @viking, would you accept a PR implementing this feature?
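As a minimal sketch, here is the seq handler from the comments above applied to the YAML from the original report, with as.named.list left at its default so the names are kept. The names doc and keep_seq are only illustrative, and the comments describe the behaviour reported in this thread rather than documented guarantees.

```r
library(yaml)

# YAML document from the original report.
doc <- "
key: value
array1:
  - item
array2:
  - item1
  - item2
"

# Pass-through handler: returns each sequence exactly as yaml.load() hands
# it over, so sequences are not collapsed into atomic vectors.
keep_seq <- list(seq = function(x) x)

simplified <- yaml.load(doc)                       # default behaviour
preserved  <- yaml.load(doc, handlers = keep_seq)  # sequences kept as lists

str(simplified$array1)  # the character vector shown in the original report
str(preserved$array1)   # per the workaround above, a list of length 1
names(preserved)        # names are retained because as.named.list stays TRUE
```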

viking commented 5 years ago

I don't like the idea of only implementing it in read_yaml. That function only exists as a wrapper for those who wish to have a readr-like interface. I'm also not sure about the parameter name simplifyVector. There's more than just simplification of vectors going on. This feature idea is more complex than it seems on the surface.

The above solution works by disabling coercion of YAML sequences from lists to vectors by way of a custom handler that does nothing. Having a user-friendly option just for disabling coercion of YAML sequences seems a bit too limited in scope to deserve its own parameter. I think having a more general way to disable default handlers would be more appropriate. Maybe something like:

yaml.load("{ test: [123, 456] }", pristine = c("seq"))

The pristine parameter (or something similar) could be used to disable default handlers for the specified YAML types.
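Until something like this exists, the idea can be approximated in user code. The sketch below is hypothetical: yaml_load_pristine and its pristine argument are made up for illustration and are not part of the yaml package. It simply builds one pass-through handler per requested type and forwards everything to yaml.load. Only "seq" has been exercised in this thread; what a pass-through handler does for other YAML types is not verified here.

```r
library(yaml)

# Hypothetical wrapper: `pristine` is NOT an argument of yaml.load().
# It names the YAML types whose default handling should be bypassed.
yaml_load_pristine <- function(string, pristine = character(),
                               handlers = list(), ...) {
  # One pass-through (identity) handler per requested type.
  passthrough <- stats::setNames(
    lapply(pristine, function(type) function(x) x),
    pristine
  )
  # Handlers supplied explicitly by the caller take precedence.
  merged <- utils::modifyList(passthrough, as.list(handlers))
  # Fall back to yaml.load()'s own default when nothing was requested.
  if (length(merged) == 0) merged <- NULL
  yaml.load(string, handlers = merged, ...)
}

res <- yaml_load_pristine("{ test: [123, 456] }", pristine = c("seq"))
str(res)
```

Merging with caller-supplied handlers is a design choice of this sketch, so that a custom seq handler is not silently overwritten.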

spgarbet commented 2 years ago

I can't see much of anything going on in the default handlers except for "seq". I feel like I'm missing something when reading the code.

salim-b commented 1 year ago

I think it would be important for the default settings to let a YAML input survive a yaml::read_yaml() -> yaml::write_yaml() round trip unharmed.

Currently, length-1 YAML sequences get "simplified" during a round trip, which is probably not what most people expect:

tmp_file <- tempfile(fileext = ".yml")

cat("key: value
array1:
  - item
array2:
  - item1
  - item2
",
file = tmp_file)

# before yaml round trip
readLines(con = tmp_file) |> cat(sep = "\n")
#> key: value
#> array1:
#>   - item
#> array2:
#>   - item1
#>   - item2

yaml::yaml.load_file(input = tmp_file) |> yaml::write_yaml(file = tmp_file)

# after yaml round trip
readLines(con = tmp_file) |> cat(sep = "\n")
#> key: value
#> array1: item
#> array2:
#> - item1
#> - item2

Created on 2023-06-18 with reprex v2.0.2
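For comparison, a sketch of the same round trip with the pass-through seq handler suggested earlier in the thread. It assumes that yaml.load_file forwards handlers to yaml.load and that write_yaml emits unnamed lists as YAML sequences; keep_seq is an illustrative name, not part of the package.

```r
library(yaml)

tmp_file <- tempfile(fileext = ".yml")
writeLines(
  c("key: value", "array1:", "  - item", "array2:", "  - item1", "  - item2"),
  tmp_file
)

# Pass-through handler so sequences stay lists instead of becoming vectors.
keep_seq <- list(seq = function(x) x)

# Read, write back, and read again with the same handler.
before <- yaml.load_file(tmp_file, handlers = keep_seq)
write_yaml(before, file = tmp_file)
after <- yaml.load_file(tmp_file, handlers = keep_seq)

# TRUE if the parsed structure survived the round trip.
identical(before, after)

# Inspect the rewritten file.
cat(readLines(tmp_file), sep = "\n")
```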