Open lorenzwalthert opened 5 years ago
You're right in that it appears that the libyaml team seems unmotivated to add 1.2 support. It may be worth switching over to a different library. I'd want to think carefully about the benefits of doing so, however.
Up
It would simplify any automated workflow if YAML header (%YAML 1.2
) could be read (or even just ignored) by read_yaml()
Like a for read.csv(file, skip=2)
Upgrading to YAML 1.2 is a pretty big undertaking. It basically means rewriting most of the package, since it will require changing the underlying YAML library. If I were to do that, I'd probably create a new package called 'yaml12' or something like that. I'm slowly edging my way to doing that though.
The libyaml folks are tracking tickets on github now: https://github.com/yaml/libyaml/issues/20
I've taken over maintenance for this package. I'm considering a fork since it looks like libyaml supports 1.2. The package name proposal is yaml12. Thoughts?
@spgarbet if you do make a new package, I'd love to have the opportunity to review before you release to CRAN so we could ensure that we close a few security risks (e.g.) ensuring that eval.expr = FALSE
by default.
@hadley Jeremy has moved on to private industry and I'm picking this up as maintainer. I'll look into the eval.expr=FALSE issue.
@hadley I pushed up a new version with eval.expr=FALSE by default. I'd like to fix the UTF issue before pushing to CRAN.
@spgarbet you're changing this package? I'm not sure that's a good idea because it has the potential to break existing code. I was only suggesting changing the default if you do yaml12.
The plan was always to at some point change the default to FALSE. The current CRAN version spits out a warning that this will happen in a future version. If it's too disruptive I can defer this patch.
Ah in that case you're probably ok.
It's that difficult intersection of security vs. disruption. I'd prefer to avoid disruption at the highest levels, conversely security needs must be addressed.
There is another library with almost the same interface call, libfyaml
. It fully supports 1.2 and is prepping for full 1.3 support. Work has begun to see what it would take to implement that as the core dependency. It's not looking too bad. The existing library is actually close to being 1.2 compliant.
Oddly this request is quite close. There is only a couple issue left in the libyaml library before it has full 1.2 support.
I've split the project over to https://github.com/vubiostat/yaml12
This project seems as close as it's ever going to get to 1.2 using the current libyaml.
@spgarbet What is the current state of YAML 1.2 support in this package? I'm asking because latest stable version 2.3.7 still does write_yaml()
logicals as yes
/no
instead of true
/false
.
(I know I can use a custom handler
for logical
s to work around this but I think the default should be compliant to the latest YAML spec out-of-the-box.)
The library depends on an upstream project libyaml. That author has stated he's done what he could to get it as close to 1.2 as possible. I submitted a couple PR and he wasn't interested in them, so I quit trying. I do have a shell of this project and at one point was trying to create a version that depended on libfyaml, which is faster and support 1.2. I've not had time recently to work on it. If you'd like a go, I can give you the code and we could get a project going.
Thanks for the answer!
I do have a shell of this project and at one point was trying to create a version that depended on libfyaml, which is faster and support 1.2. I've not had time recently to work on it.
You're referring to this repo, I suppose?
If you'd like a go, I can give you the code and we could get a project going.
I don't know if I'd be of much use... I don't really have experience in C.
The library depends on an upstream project libyaml. That author has stated he's done what he could to get it as close to 1.2 as possible.
Who do you mean here? The original author @xitology?
I submitted a couple PR and he wasn't interested in them, so I quit trying.
I can't find any PRs from you that are related to YAML 1.2 support. And again, who do you mean with "he" here?
Note that the handling of yes/no, true/false etc. is nothing which is done by libyaml. libyaml is just a parser and emitter, the scalar values are all strings.
and at one point was trying to create a version that depended on libfyaml, which is faster and support 1.2.
I'm currently writing a libfyaml binding for perl. The library is passing all 1.2 tests, and it hides its internals and provides a stable API, and the author is still around and developing it, so yes, it is a good idea to use it. Visit us on matrix if you have questions about libfyaml. I think you were already there at some point.
Well the PR's weren't directly related to 1.2, so but I didn't see it as fruitful to make much change.
I didn't realize the yes/no was on this side (I inherited maintenance of this code and haven't fully explored it). If that's the case I can change it quickly. I'm out of town till June. However, making a change like this is huge in downstream dependencies.
Suggestion has been to move to another package yaml12
as the name. The data.table
team has a set of test cases that deeply test the package. If we wish to switch this over to the main package name yaml
then we'd have to get approval from Hadley Wickham and the data.table team.
I pushed up what seems like a really simple change (DOH!) to make this work, the test cases are now failing.
@salim-b Can you install the branch origin/issue-75-yaml12
and tell me if it works for you? I can get the tests changed for this pretty quickly.
@perlpunk I'm still lurking there. Just been very busy with a more pressing couple projects.
Note that for full support of the YAML 1.2 schema you need a bit more than just the yes/no boolean thing. Here you can find a pretty complete test data set: https://github.com/perlpunk/yaml-test-schema 1.1/1.2 Comparison: https://perlpunk.github.io/yaml-test-schema/schemas.html Test data visualized: https://perlpunk.github.io/yaml-test-schema/data.html
I pushed up what seems like a really simple change (DOH!) to make this work, the test cases are now failing.
@salim-b Can you install the branch
origin/issue-75-yaml12
and tell me if it works for you? I can get the tests changed for this pretty quickly.
Nope, same result.
remotes::install_github("vubiostat/r-yaml@issue-75-yaml12")
#> Using github PAT from envvar GITHUB_PAT
#> Skipping install of 'yaml' from a github remote, the SHA1 (1aa405f3) has not changed since last install.
#> Use `force = TRUE` to force installation
yaml::as.yaml(list(a = TRUE, b = FALSE))
#> [1] "a: yes\nb: no\n"
Created on 2023-05-29 with reprex v2.0.2
Alright, I'm off the road and back in town. I can solve this here in a bit. It's doable.
I'm thinking for rollout, it could be controlled via a flag, i.e. "yaml12=FALSE" by default. Then it's one library and folks can switch over as needed.
I made a quick attempt at this. The core parse tree specified by "re" I altered. This is a scary change for such a used package. I don't see and easy way to make it flag dependent.
> library(yaml)
> yaml::as.yaml(list(a = TRUE, b = FALSE))
[1] "a: true\nb: false\n"
> yaml::as.yaml(list(a = "yes", b = FALSE))
[1] "a: yes\nb: false\n"
My workaround solution for emitting true
or false
instead of yes
or no
was to use a verbatim class:
verbatim_logical <- function(x) {
verbatim_logical <- tolower(as.logical(x))
class(verbatim_logical) <- "verbatim"
return(verbatim_logical)
}
> verbatim_logical(FALSE)
[1] "false"
attr(,"class")
[1] "verbatim"
> verbatim_logical(TRUE)
[1] "true"
attr(,"class")
[1] "verbatim"
> verbatim_logical(52)
[1] "true"
attr(,"class")
[1] "verbatim"
> verbatim_logical(0)
[1] "false"
attr(,"class")
[1] "verbatim"
> verbatim_logical(NA)
[1] NA
attr(,"class")
[1] "verbatim"
That emits:
section:
key: false
Instead of:
section:
key: no
or:
section:
key: 'false'
In the absence of full YAML 1.2 support, even a small wrapper like the above to convert booleans to these could be useful, just as a signal to folks who aren't aware of the problem 🙂
This is a really attractive idea that doesn't involve getting into the guts of the lexer/compiler.
Can also confirm it works fine with length > 1:
list(
section = list(
fruit = verbatim_logical(c(T, T, T, T, F)))) |>
write_yaml('test.yml')
Produces:
section:
fruit:
- true
- true
- true
- true
- false
(idk why you'd want an array of bools, but 🤷🏻♂️)
JFTR: verbatim_logical()
is part of the package since https://github.com/vubiostat/r-yaml/commit/c8c5165f3e6b985f0a67fc42286bafa4fc7dc705, included in yaml 2.3.8+, so we can now use:
list(a = TRUE,
b = FALSE) |>
yaml::as.yaml(handlers = list(logical = yaml::verbatim_logical))
#> [1] "a: true\nb: false\n"
Created on 2024-07-15 with reprex v2.1.1
IMHO, yaml::verbatim_logical()
should be made the default handler for logicals in a future release (maybe accompanied by a major version bump).
It would be great if that package could support yaml 1.2, which apparently the definite version and is already 10 years old now (https://yaml.org/spec/). Changes are nicely summarized here: https://perlpunk.github.io/slides.tpcig2018/yamlpp/slide022.html. Unexpected behavior as in https://github.com/viking/r-yaml/issues/5 (see the many cross-references) would wanish.
This would imply to turn away from libyaml because it does not seem they are going to implement that spec soon, an issue pending for more than two years: https://github.com/yaml/libyaml/issues/20. A popular alternative to libyaml in C++ would be https://github.com/jbeder/yaml-cpp, but there are many, also in python.