Fixed CND parsing under Windows OS

phpcr / phpcr-utils

A set of helper classes and command line commands you want to use with phpcr, but that are not part of the API itself

phpcr.github.io

Other

72 stars 30 forks source link

Fixed CND parsing under Windows OS #48

Closed dinamic closed 11 years ago

dinamic commented 11 years ago

CND data was not correctly parsed while using Windows OS. The issue was caused by the different line endings between the operating systems.

I came up with a fix and updated the unit tests to test both Windows and Unix line endings.

I've found some discrepancies inside the FileReader class, so it is updated as well.

This commit fixes: https://github.com/symfony-cmf/symfony-cmf-standard/issues/21

lsmith77 commented 11 years ago

thank you! can you have a look at the failing test on Travis? @dbu should have a look at you method rename

dinamic commented 11 years ago

Sorry, didn't noticed the failing test.

dinamic commented 11 years ago

While checking the failed tests I realized there's an issue with my current implementation. It has to use look-ahead to determine the new line sequence, rather than look-behind.

dbu commented 11 years ago

thanks for this PR. i commented on some naming things. can you solve the look-ahead issue you have or do you need input on that?

is the reason you ran into this issue in the first place that your git client converted the line endings? i fear that we still can see problems after this fix. what would happen if we always check for both types of line endings, and maybe add the apple OS endings as well? might be the most flexible approach.

btw, impressive that after 30 years of PC we still see this stupid problem :-(

dinamic commented 11 years ago

The look-ahead is solved.

The issue I came upon was invoked because of the hardcoded EOL. It was set like "\n" and for Windows is "\r\n". Later on the process of parsing there was the assumption that the EOL is a single character, which obviously is not true, so the code failed to parse the string.

I'm struggling at the GenericScanner now. There are some naming wtfs that are pretty confusing.

Once I'm done with the GenericScanner, I believe we would be able to handle whatever EOL we like. LF, CR, CR+LF, LF+CR, you name it. IMHO, it is a design fail that caused the issue.

I had the idea of converting the EOL within the reader constructor. Any thoughts on that?

dbu commented 11 years ago

maybe we could simply string replace all possible formats to \n and then do the char comparison on \n rather than the php constant. the LF char is not valid anywhere else in the cnd document anyway than as end of line character... this could make the code as simple as it was before. we load the whole string anyways, and cnd files are not expected to be huge, so doing the replace upfront sounds save to me from a memory perspective.

dbu commented 11 years ago

the constructor then would be

$buffer = str_replace("\r\n", "\n", $buffer);
$buffer = str_replace("\r", "\n", $buffer);

dinamic commented 11 years ago

Exactly my point. I was not aware whether the CRLF would be used anywhere inside the document and couldn't decide by myself to do such conversion.

dbu commented 11 years ago

i don't expect any problems, lets convert everything to the one-char \n in the constructor of BufferReader, and simplify the code again (we won't need peek and multi-char EOL anymore then)

dbu commented 11 years ago

@dinamic can you do the cleanup and convert all cnd files to unix line endings in the constructor, before further handling them?

relo-san commented 11 years ago

@dbu In Windows (at least Windows 7) in PHP global constant PHP_EOL by default equal CRLF, not LF. Accordingly, any comparison between currentChar() and PHP_EOL not work, because PHP_EOL contains 2 chars. In this case, it does not matter what kind of line breaks in the cnd files.

relo-san commented 11 years ago

I see the most reasonable solution to convert all line breaks in cnd to the LF and replace PHP_EOL in all comparisons also to LF. P.S. Sorry, I casually read discussion, it's an already known.

dbu commented 11 years ago

yes, thats exactly what we should do. i hope dinamic can do it - if you don't want to wait you could fork his repo and finish it and create a new PR

lsmith77 commented 11 years ago

superseded by #51

dinamic commented 11 years ago

Thanks for contributing @relo-san. I didn't had the time to finish my fix.