kislyuk / yq

Command-line YAML, XML, TOML processor - jq wrapper for YAML/XML/TOML documents
https://kislyuk.github.io/yq/
Apache License 2.0
2.57k stars 82 forks

Support for roundtripping YAML comments #106

Open paul-hammant opened 3 years ago

paul-hammant commented 3 years ago

YAML has comments; JSON doesn't. jq strips comments, and the -Y and -y flags don't bring them back.

kislyuk commented 3 years ago

Thanks for reporting - I agree this should be documented (or implemented in -Y). PRs are welcome.

jtagcat commented 3 years ago

Duplicate of #28

The easiest way to do this seems to be to move the comments into the JSON object.

# this is yaml
i_am_yaml: # here is comment on same line
  bugbug: false # an another one

{
  "yq_yaml_xml_toml_processor_metakey_making_this_long_preventing_keyname_collisions¤comment_singleline": "this is yaml",
  "i_am_yaml": {
    "bugbug": false,
    "yq_yaml_xml_toml_processor_metakey_making_this_long_preventing_keyname_collisions¤comment_sameline_bugbug": "an another one"
  },
  "yq_yaml_xml_toml_processor_metakey_making_this_long_preventing_keyname_collisions¤comment_sameline_i_am_yaml": "here is comment on same line"
}

There're two flaws I can think of:

There's probably a better way to solve this, but this is what came to mind.
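
The metakey idea above can be sketched in plain Python. This is only an illustration under loud assumptions, not how yq works: the `META` prefix and the function names are hypothetical, and the hand-rolled line scanner only understands nested `key: value` mappings, treats any ` #` as the start of a comment (so it breaks on `#` inside quoted strings), and knows nothing about lists, multi-line scalars, or anchors.

```python
import json

# Hypothetical metakey prefix (an assumption for this sketch, not a yq feature).
META = "__yq_comment__"


def split_comment(line):
    """Return (content, comment). Naive: does not handle '#' inside quotes."""
    stripped = line.lstrip()
    if stripped.startswith("#"):
        return "", stripped[1:].strip()
    idx = line.find(" #")  # a mid-line comment must be preceded by whitespace
    if idx == -1:
        return line.rstrip(), None
    return line[:idx].rstrip(), line[idx + 2:].strip()


def parse_scalar(text):
    """Convert a tiny subset of YAML scalars; everything else stays a string."""
    if text in ("true", "false"):
        return text == "true"
    try:
        return int(text)
    except ValueError:
        return text


def yaml_subset_to_dict(source):
    """Parse nested 'key: value' mappings, storing comments under metakeys."""
    root = {}
    stack = [(-1, root)]  # (indent of the owning key, mapping)
    for lineno, raw in enumerate(source.splitlines(), start=1):
        content, comment = split_comment(raw)
        if not content.strip():
            if comment is not None:
                # Full-line comment: attach it to the innermost open mapping
                # whose key is indented less than the comment itself.
                indent = len(raw) - len(raw.lstrip())
                owner = next(m for i, m in reversed(stack) if i < indent)
                owner[f"{META}line_{lineno}"] = comment
            continue
        indent = len(content) - len(content.lstrip())
        while stack[-1][0] >= indent:
            stack.pop()  # close mappings that this line is no longer inside
        parent = stack[-1][1]
        key, _, value = content.strip().partition(":")
        value = value.strip()
        if value == "":
            child = {}
            parent[key] = child
            stack.append((indent, child))
        else:
            parent[key] = parse_scalar(value)
        if comment is not None:
            parent[f"{META}sameline_{key}"] = comment
    return root


EXAMPLE = (
    "# this is yaml\n"
    "i_am_yaml: # here is comment on same line\n"
    "  bugbug: false # an another one\n"
)

result = yaml_subset_to_dict(EXAMPLE)
print(json.dumps(result, indent=2))
```

The resulting object could be passed through jq like any other JSON. A dumper would then need to recognise the metakeys and turn them back into comments, which is the part that has no obvious home in PyYAML.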

kislyuk commented 2 years ago

@andrewcrook your comment does not seem to add anything of use to the discussion. Please refrain from making dismissive content-free comments.

kislyuk commented 2 years ago

@jtagcat thanks for your suggestion. I agree with your approach and have previously tried to implement it. However, the issue I am facing is that there is no obvious place to hook into the PyYAML parser to retrieve comments from the input stream (they get discarded too early in the process), and no obvious data model for emitting the comments in the dumper.

I know ruamel.yaml has comment round-tripping, but I'm not sure if the parser/dumper modifications made there to accomplish this would be portable into our customizations of PyYAML. Ultimately this seems possible but might require the use of the slow pure-Python tokenizer in the parser, and might require extensive customization. With that said, PRs are welcome.

kislyuk commented 2 years ago

@andrewcrook thanks for expanding upon your original comment, it is now legible as a contribution to this discussion.

While I appreciate the pointer to JSONC, the comment you linked on the jq wiki outlines why jq will not support it. Rewriting, repackaging, or replacing jq is outside the scope of this project, so none of the suggestions related to that are productive. You can check the comments by @jtagcat above for pointers to the course of action that we consider most likely to be productive. PRs are welcome.

andry81 commented 2 years ago

A workaround for this issue using bash scripts based on a diff+patch approach: https://github.com/mikefarah/yq/issues/515#issuecomment-1207700251

kislyuk commented 3 months ago

For reference, this is where PyYAML throws out comments (this happens in the tokenizer, before any of the parsing or constructor steps that we hook into):

https://github.com/yaml/pyyaml/blob/48838a3c768e3d1bcab44197d800145cfd0719d6/lib/yaml/scanner.py#L778-L780
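
A quick way to observe this from the outside is to look at the token stream PyYAML produces. The snippet below is a minimal sketch that assumes PyYAML is installed; `DOC` and the variable names are made up for illustration. It uses the public `yaml.scan()` helper to list token types and checks that none of them carries the comment.

```python
import yaml  # PyYAML (third-party)

DOC = "# leading comment\nkey: value  # trailing comment\n"

# yaml.scan() yields the raw tokens produced by the scanner (tokenizer).
token_names = [type(token).__name__ for token in yaml.scan(DOC)]

# No token type represents a comment, so the parser and constructor
# stages that run after the scanner never get to see the comment text.
has_comment_token = any("Comment" in name for name in token_names)

# The loaded data therefore contains only the key and its value.
data = yaml.safe_load(DOC)

print(token_names)
print(has_comment_token, data)
```

Since the comment text is already gone by the time tokens are emitted, any comment-preserving hook would have to live in (or replace) the scanner itself.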