ingydotnet / yaml-libyaml-pm

Perl Binding to libyaml
http://search.cpan.org/dist/YAML-LibYAML/

YAML::XS generates wrapped text for long scalar value #26

Open djerius opened 9 years ago

djerius commented 9 years ago

Hi,

I've run into the same bug as in rt.cpan.org #71902. Is that bug still in the queue to be fixed, or has it been orphaned over at rt.cpan.org?

Thanks,

Diab

djerius commented 9 years ago

I've tracked it down to this:

in perl_libyaml.c:

532     /* Set up the emitter object and begin emitting */
[...]
535     yaml_emitter_set_width(&dumper.emitter, 2);

This sets the emitter's best width to 2. Then, in emitter.c:yaml_emitter_emit_stream_start:

    if (emitter->best_width >= 0
            && emitter->best_width <= emitter->best_indent*2) {
        emitter->best_width = 80;
    }

    if (emitter->best_width < 0) {
        emitter->best_width = INT_MAX;
    }

Since 2 is not greater than best_indent*2, the first condition holds and emitter->best_width is reset to 80. Everything then funnels down to yaml_emitter_write_plain_scalar, which folds the output:
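To make the clamping concrete, here is a minimal standalone sketch of that check, outside libyaml. The function name effective_width is made up for illustration; only the two conditions come from the excerpt above, and the caller supplies best_indent (whatever libyaml does to best_indent beforehand is not reproduced here).

```c
#include <limits.h>

/* Hypothetical helper, not part of libyaml: mirrors the two checks
 * quoted from yaml_emitter_emit_stream_start. */
int effective_width(int best_width, int best_indent)
{
    /* A non-negative width that is no larger than twice the indent
     * is replaced by 80. */
    if (best_width >= 0 && best_width <= best_indent * 2) {
        best_width = 80;
    }

    /* A negative width is treated as "no limit". */
    if (best_width < 0) {
        best_width = INT_MAX;
    }

    return best_width;
}
```

Assuming an indent of 2, effective_width(2, 2) comes out as 80, which matches what YAML::XS ends up with, while effective_width(-1, 2) comes out as INT_MAX, so no column can ever exceed it.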

    if (IS_SPACE(string))
    {
        if (allow_breaks && !spaces
                && emitter->column > emitter->best_width
                && !IS_SPACE_AT(string, 1)) {
            if (!yaml_emitter_write_indent(emitter)) return 0;
            MOVE(string);
        }
        else {
            if (!WRITE(emitter, string)) return 0;
        }
        spaces = 1;
    }
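The effect of that branch can be sketched in isolation. The following is a simplified stand-in, not libyaml code: fold_plain is a hypothetical name, allow_breaks is assumed to be always true, the break is written as a bare newline with no indentation after it, and only the space-handling path is modelled.

```c
#include <stddef.h>

/* Hypothetical, simplified model of the space branch in
 * yaml_emitter_write_plain_scalar. Copies `in` to `out`, replacing a
 * single space with a newline once the current column has passed
 * best_width. Returns the number of characters written (excluding the
 * terminating NUL). */
size_t fold_plain(const char *in, int best_width, char *out, size_t out_size)
{
    size_t n = 0;      /* characters written so far */
    size_t i;
    int column = 0;    /* current output column */
    int spaces = 0;    /* was the previous character a space? */

    if (out_size == 0) return 0;

    for (i = 0; in[i] != '\0' && n + 1 < out_size; i++) {
        if (in[i] == ' ') {
            if (!spaces && column > best_width && in[i + 1] != ' ') {
                /* Past the preferred width: break the line here
                 * instead of emitting the space. */
                out[n++] = '\n';
                column = 0;
            } else {
                out[n++] = ' ';
                column++;
            }
            spaces = 1;
        } else {
            out[n++] = in[i];
            column++;
            spaces = 0;
        }
    }

    out[n] = '\0';
    return n;
}
```

With best_width set to 3, the input "aaaa bbbb cccc" comes out on three lines, because each space is reached after the column has already passed 3. With best_width set to 80 (or INT_MAX, as after passing -1), the same input is copied through unchanged.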

The allow_breaks flag has a complicated history, so I'm not sure of exactly how it gets set.

In any case, simply changing line 535 in perl_libyaml.c from

    535     yaml_emitter_set_width(&dumper.emitter, 2);

to

    535     yaml_emitter_set_width(&dumper.emitter, -1);

stops the folding, which makes it possible for both YAML::Tiny and YAML to read this particular output from YAML::XS. With the folded output, YAML::Tiny fails with:

   YAML::Tiny found bad indenting in line

while YAML gives:

   Code: YAML_PARSE_ERR_INCONSISTENT_INDENTATION

However, perhaps that's the wrong solution, and the solution is for those modules to support folded scalars?

perlpunk commented 7 years ago

See also my comment in #29. This is intended behavior: it's generating valid YAML, and yes, it would be nice if it were configurable in YAML::XS. On the other hand, I think YAML.pm should be fixed (or replaced; there is some work going on), and the authors of YAML::Tiny should decide whether wrapped plain scalars should be supported.

iafan commented 5 years ago

I'm not sure whether it makes sense to revive this old thread or to create a new issue, but I see that the ability to control the width was considered a nice-to-have option, yet the ticket was immediately closed.

I understand that generated YAML with wrapped strings is perfectly valid, and most users won't need to tweak this behavior, but there are certain cases where it is beneficial to disable wrapping entirely (which can be achieved by setting best_width to some very large value).

A use case I spotted in the wild is that YAML is being used to store long strings (paragraphs of text; for localization purposes), and then there's some Git integration running that needs to do something with the strings that were recently modified. With wrapping, git diff shows changes somewhere in the middle of changed strings (because each logical string now spans multiple lines), whereas for single-line strings diff shows entire strings that were affected.

This, of course, is an edge case and there are workarounds for this, but it would be great for the Perl wrapper to actually expose the functionality that already exists in the underlying library.