vubiostat / r-yaml

R package for converting objects to and from YAML
http://biostat.app.vumc.org/wiki/Main/YamlR
Other
162 stars 39 forks source link

yaml 1.2 support #76

Open lorenzwalthert opened 5 years ago

lorenzwalthert commented 5 years ago

It would be great if that package could support yaml 1.2, which apparently the definite version and is already 10 years old now (https://yaml.org/spec/). Changes are nicely summarized here: https://perlpunk.github.io/slides.tpcig2018/yamlpp/slide022.html. Unexpected behavior as in https://github.com/viking/r-yaml/issues/5 (see the many cross-references) would wanish.

This would imply to turn away from libyaml because it does not seem they are going to implement that spec soon, an issue pending for more than two years: https://github.com/yaml/libyaml/issues/20. A popular alternative to libyaml in C++ would be https://github.com/jbeder/yaml-cpp, but there are many, also in python.

viking commented 5 years ago

You're right in that it appears that the libyaml team seems unmotivated to add 1.2 support. It may be worth switching over to a different library. I'd want to think carefully about the benefits of doing so, however.

arthurldp commented 3 years ago

Up

It would simplify any automated workflow if YAML header (%YAML 1.2) could be read (or even just ignored) by read_yaml() Like a for read.csv(file, skip=2)

viking commented 3 years ago

Upgrading to YAML 1.2 is a pretty big undertaking. It basically means rewriting most of the package, since it will require changing the underlying YAML library. If I were to do that, I'd probably create a new package called 'yaml12' or something like that. I'm slowly edging my way to doing that though.

spgarbet commented 3 years ago

The libyaml folks are tracking tickets on github now: https://github.com/yaml/libyaml/issues/20

spgarbet commented 2 years ago

I've taken over maintenance for this package. I'm considering a fork since it looks like libyaml supports 1.2. The package name proposal is yaml12. Thoughts?

hadley commented 2 years ago

@spgarbet if you do make a new package, I'd love to have the opportunity to review before you release to CRAN so we could ensure that we close a few security risks (e.g.) ensuring that eval.expr = FALSE by default.

spgarbet commented 2 years ago

@hadley Jeremy has moved on to private industry and I'm picking this up as maintainer. I'll look into the eval.expr=FALSE issue.

spgarbet commented 2 years ago

@hadley I pushed up a new version with eval.expr=FALSE by default. I'd like to fix the UTF issue before pushing to CRAN.

hadley commented 2 years ago

@spgarbet you're changing this package? I'm not sure that's a good idea because it has the potential to break existing code. I was only suggesting changing the default if you do yaml12.

spgarbet commented 2 years ago

The plan was always to at some point change the default to FALSE. The current CRAN version spits out a warning that this will happen in a future version. If it's too disruptive I can defer this patch.

hadley commented 2 years ago

Ah in that case you're probably ok.

spgarbet commented 2 years ago

It's that difficult intersection of security vs. disruption. I'd prefer to avoid disruption at the highest levels, conversely security needs must be addressed.

spgarbet commented 2 years ago

There is another library with almost the same interface call, libfyaml. It fully supports 1.2 and is prepping for full 1.3 support. Work has begun to see what it would take to implement that as the core dependency. It's not looking too bad. The existing library is actually close to being 1.2 compliant.

spgarbet commented 2 years ago

Oddly this request is quite close. There is only a couple issue left in the libyaml library before it has full 1.2 support.

spgarbet commented 1 year ago

I've split the project over to https://github.com/vubiostat/yaml12

spgarbet commented 1 year ago

This project seems as close as it's ever going to get to 1.2 using the current libyaml.

salim-b commented 1 year ago

@spgarbet What is the current state of YAML 1.2 support in this package? I'm asking because latest stable version 2.3.7 still does write_yaml() logicals as yes/no instead of true/false.

(I know I can use a custom handler for logicals to work around this but I think the default should be compliant to the latest YAML spec out-of-the-box.)

spgarbet commented 1 year ago

The library depends on an upstream project libyaml. That author has stated he's done what he could to get it as close to 1.2 as possible. I submitted a couple PR and he wasn't interested in them, so I quit trying. I do have a shell of this project and at one point was trying to create a version that depended on libfyaml, which is faster and support 1.2. I've not had time recently to work on it. If you'd like a go, I can give you the code and we could get a project going.

salim-b commented 1 year ago

Thanks for the answer!

I do have a shell of this project and at one point was trying to create a version that depended on libfyaml, which is faster and support 1.2. I've not had time recently to work on it.

You're referring to this repo, I suppose?

If you'd like a go, I can give you the code and we could get a project going.

I don't know if I'd be of much use... I don't really have experience in C.

perlpunk commented 1 year ago

The library depends on an upstream project libyaml. That author has stated he's done what he could to get it as close to 1.2 as possible.

Who do you mean here? The original author @xitology?

I submitted a couple PR and he wasn't interested in them, so I quit trying.

I can't find any PRs from you that are related to YAML 1.2 support. And again, who do you mean with "he" here?

perlpunk commented 1 year ago

Note that the handling of yes/no, true/false etc. is nothing which is done by libyaml. libyaml is just a parser and emitter, the scalar values are all strings.

perlpunk commented 1 year ago

and at one point was trying to create a version that depended on libfyaml, which is faster and support 1.2.

I'm currently writing a libfyaml binding for perl. The library is passing all 1.2 tests, and it hides its internals and provides a stable API, and the author is still around and developing it, so yes, it is a good idea to use it. Visit us on matrix if you have questions about libfyaml. I think you were already there at some point.

spgarbet commented 1 year ago

Well the PR's weren't directly related to 1.2, so but I didn't see it as fruitful to make much change.

I didn't realize the yes/no was on this side (I inherited maintenance of this code and haven't fully explored it). If that's the case I can change it quickly. I'm out of town till June. However, making a change like this is huge in downstream dependencies.

Suggestion has been to move to another package yaml12 as the name. The data.table team has a set of test cases that deeply test the package. If we wish to switch this over to the main package name yaml then we'd have to get approval from Hadley Wickham and the data.table team.

spgarbet commented 1 year ago

I pushed up what seems like a really simple change (DOH!) to make this work, the test cases are now failing.

@salim-b Can you install the branch origin/issue-75-yaml12 and tell me if it works for you? I can get the tests changed for this pretty quickly.

spgarbet commented 1 year ago

@perlpunk I'm still lurking there. Just been very busy with a more pressing couple projects.

perlpunk commented 1 year ago

Note that for full support of the YAML 1.2 schema you need a bit more than just the yes/no boolean thing. Here you can find a pretty complete test data set: https://github.com/perlpunk/yaml-test-schema 1.1/1.2 Comparison: https://perlpunk.github.io/yaml-test-schema/schemas.html Test data visualized: https://perlpunk.github.io/yaml-test-schema/data.html

salim-b commented 1 year ago

I pushed up what seems like a really simple change (DOH!) to make this work, the test cases are now failing.

@salim-b Can you install the branch origin/issue-75-yaml12 and tell me if it works for you? I can get the tests changed for this pretty quickly.

Nope, same result.

remotes::install_github("vubiostat/r-yaml@issue-75-yaml12")
#> Using github PAT from envvar GITHUB_PAT
#> Skipping install of 'yaml' from a github remote, the SHA1 (1aa405f3) has not changed since last install.
#>   Use `force = TRUE` to force installation
    yaml::as.yaml(list(a = TRUE, b = FALSE))
#> [1] "a: yes\nb: no\n"

Created on 2023-05-29 with reprex v2.0.2

spgarbet commented 1 year ago

Alright, I'm off the road and back in town. I can solve this here in a bit. It's doable.

I'm thinking for rollout, it could be controlled via a flag, i.e. "yaml12=FALSE" by default. Then it's one library and folks can switch over as needed.

spgarbet commented 1 year ago

I made a quick attempt at this. The core parse tree specified by "re" I altered. This is a scary change for such a used package. I don't see and easy way to make it flag dependent.

spgarbet commented 1 year ago
> library(yaml)
> yaml::as.yaml(list(a = TRUE, b = FALSE))
[1] "a: true\nb: false\n"
> yaml::as.yaml(list(a = "yes", b = FALSE))
[1] "a: yes\nb: false\n"
jimjam-slam commented 1 year ago

My workaround solution for emitting true or false instead of yes or no was to use a verbatim class:

verbatim_logical <- function(x) {
  verbatim_logical <- tolower(as.logical(x))
  class(verbatim_logical) <- "verbatim"
  return(verbatim_logical)
}

> verbatim_logical(FALSE)
[1] "false"
attr(,"class")
[1] "verbatim"
> verbatim_logical(TRUE)
[1] "true"
attr(,"class")
[1] "verbatim"
> verbatim_logical(52)
[1] "true"
attr(,"class")
[1] "verbatim"
> verbatim_logical(0)
[1] "false"
attr(,"class")
[1] "verbatim"
> verbatim_logical(NA)
[1] NA
attr(,"class")
[1] "verbatim"

That emits:

section:
  key: false

Instead of:

section:
  key: no

or:

section:
  key: 'false'

In the absence of full YAML 1.2 support, even a small wrapper like the above to convert booleans to these could be useful, just as a signal to folks who aren't aware of the problem 🙂

spgarbet commented 1 year ago

This is a really attractive idea that doesn't involve getting into the guts of the lexer/compiler.

jimjam-slam commented 1 year ago

Can also confirm it works fine with length > 1:

list(
  section = list(
    fruit = verbatim_logical(c(T, T, T, T, F)))) |>
  write_yaml('test.yml')

Produces:

section:
  fruit:
  - true
  - true
  - true
  - true
  - false

(idk why you'd want an array of bools, but 🤷🏻‍♂️)

salim-b commented 2 months ago

JFTR: verbatim_logical() is part of the package since https://github.com/vubiostat/r-yaml/commit/c8c5165f3e6b985f0a67fc42286bafa4fc7dc705, included in yaml 2.3.8+, so we can now use:

list(a = TRUE,
     b = FALSE) |>
  yaml::as.yaml(handlers = list(logical = yaml::verbatim_logical))
#> [1] "a: true\nb: false\n"

Created on 2024-07-15 with reprex v2.1.1

IMHO, yaml::verbatim_logical() should be made the default handler for logicals in a future release (maybe accompanied by a major version bump).