vitaly-t / decomment

:hammer: Removes comments from JSON, JavaScript, CSS, HTML, etc.
75 stars 21 forks source link

Shouldn't assume files use OS line endings #4

Closed itslenny closed 8 years ago

itslenny commented 8 years ago

Currently the library is using os.EOL to determine the line endings, but it is common (especially on windows) to run file operations on files that use unix line endings.

I discovered this by using gulp-decomment on a file with unix style line endings on windows. The result was a completely blank file.

Maybe use a line ending regex and the exec function?

var EOLRegExp = /\r\n|\n|\r/gm;
var lbFind = EOLRegExp.exec(code);

This would allow support for mixed line endings and be OS independent.

vitaly-t commented 8 years ago

I don't see how trying to process lines on an OS for files with the wrong line breaks could ever work. You cannot make an assumption where it is the correct/complete line or not, especially when it comes to trimming lines, which the library does (option trim).

The OS-dependent way is the only one that's consistent. You need to either fix the files encoding, or not use this type of processing on files with the wrong line breaks.

I don't know what type of IDE you are using, but for example in WebStorm it changes the line ends automatically.

vitaly-t commented 8 years ago

The only way one could improve this library is to add support for overriding line ends, which would work in your case, but would need to be used with care.

UPDATE: I have marked it as an enhancement, to be able to override the line breaks.

vitaly-t commented 8 years ago

Option platform has been added in version 0.8, which allows for any kind of scenario regarding the line breaks.

itslenny commented 8 years ago

This is a bad assumption. line endings do not relate to the platform the code is being run on they relate to the platform on which the file was saved (maybe). Really the code should be agnostic to line endings which is what my pull request accomplished, but with your latest update it doesn't work because of the new tests you added that have hard coded line endings.

itslenny commented 8 years ago

The latest change just digs further into the hole of assuming the platform's line endings have bearing on the files line endings. If you roll back your latest change I will submit this pull request again.

You can review it here before you make a decision. I added one test which fails without my addition and passes with my modifications to the parse.js file.

https://github.com/itslenny/decomment/tree/SupportMixedLineEndings

vitaly-t commented 8 years ago

This is testing the mode that you would be using:

describe("auto", function () {
        it("must always ignore solo \\r symbols", function () {
            expect(decomment("//comment\r\rtext", {trim: true, platform: 'auto'})).toBe("");
            expect(decomment("/*comment*/\r\rtext", {trim: true, platform: 'auto'})).toBe("\r\rtext");
        });
        it("must determine Unix and break on \\n", function () {
            expect(decomment("//comment\n\ntext", {trim: true, platform: 'auto'})).toBe("text");
            expect(decomment("/*comment*/\n\ntext", {trim: true, platform: 'auto'})).toBe("text");
        });
        it("must determine Windows and break on \\r\\n", function () {
            expect(decomment("//comment\r\n\r\ntext", {trim: true, platform: 'auto'})).toBe("text");
            expect(decomment("/*comment*/\r\n\r\ntext", {trim: true, platform: 'auto'})).toBe("text");
        });
    });

And it is agnostic. When you are saying that it doesn't work - is that the result of your testing or an assumption?

Please show an example of text where it doesn't work. All my tests confirm the opposite so far, that it does work reliably.

itslenny commented 8 years ago

You're right. This works.

It's not nearly as flexible as my proposed solution, and it would still choke on a file with inconsistent line endings (although that is certainly an edge case).

vitaly-t commented 8 years ago

I understand you are talking about a hypothetical broken file that uses neither Unix no Windows line endings, rather a mix. But what are the chances of encountering such a file, plus it would already have all the problems on its own, perhaps this library should not be used as a tool for resolving such issues.

Also, my implementation is much simpler and has much higher performance ;)

vitaly-t commented 8 years ago

@itslenny I think this concludes it.

Thank you for bringing this up, the library is now much more flexible.

:+1: