gulpjs / vinyl-fs

Vinyl adapter for the file system.
MIT License
966 stars 157 forks source link

Reconsider utf8 as default encoding #355

Open twolfson opened 2 months ago

twolfson commented 2 months ago

I was very surprised to encounter a bug report on gulp.spritesmith not working with Gulp 5.0.0 today.

https://github.com/twolfson/gulp.spritesmith/issues/159

After some sleuthing, we found it was tied to both src() and dest() assuming a default encoding of utf8 instead of none

I was quite shocked to find the source of this idea was from 2014 (10 years ago)

and there's nothing in the discussions about implications for how Gulp is being used currently, and what would break downstream as a result

https://github.com/gulpjs/vinyl-fs/issues/23

I appreciate the sentiment, assuming that Gulp is prob for text-based files, but that's simply not the only case and far from the default now

This decision broke for gulp.spritesmith both in the standalone image stream case, as well as common dest stream which outputs both a CSS spritesheet as well as the corresponding image

I've seen lots of other image based libraries have similar issues

Adoption for Gulp@5 seems quite low compared to Gulp@4 (200K downloads in past week vs 3.4M), so I encourage reconsidering reverting the default encoding, to avoid upending more downstream maintainers by surprise (or worse, getting more broken unmaintained libraries)

phated commented 2 months ago

We're considering it. Unfortunately, no one from the core team has time to work on anything because life has kicked us all while we're down... seems to be a common story in open source these days.

This will be a hard thing to change in a non-breaking way, since the change was literally marked as breaking.

twolfson commented 2 months ago

Yea, I hear that =/ =( Life goes on but our projects stay in place (I was lucky to have time to dig into fixing my repo as-is)

The simplest fix, albeit a very silly one, would be to re-release 4.0.2 at 6.0.0 (comparable to rolling back production to the latest known working version)

Then try again with the other fixes that rolled out in 5.0.0 as 7.0.0

Anyway, thanks for hearing me out, and best of luck with the decisions

I'll leave it up to you to mark this as closed or open

phated commented 2 months ago

Thanks for understanding.

We're not going to roll it back, but we can probably implement the default as a function that detects if content looks like utf8. This is similar (but not exactly) the same as the previous major.

erikkemperman commented 1 month ago

Hi, original author of the PR here, many years after the fact :-)

I am very sorry to hear that this change broke people's code. The choice of assuming utf-8 by default was simply based on the original (even older) issue #23, which you already linked above. Perhaps defaulting the new option to false would have been a safer bet, but text files were certainly far more common than binary...

On the other hand, this was clearly marked as a breaking change in a major release, and the instruction to set encoding:false for binary streams after upgrading to 5 was explicitly mentioned in the release blog.

twolfson commented 1 month ago

I had to look up the original PR, https://github.com/gulpjs/vinyl-fs/pull/287, that was opened in 2017 and landed in 2020.

2017 seems totally reasonable, I was still releasing occasionally on gulp.spritesmith as well

I should clarify that my complaint was not about having breaking changes in a major release.

It was about having breaking changes in the ecosystem.

i.e. A new Gulp user comes along and tries to use Gulp@5 with a library that has documentation for Gulp@4 is going to have a bad time -- and probably blame the library maintainer, not Gulp@5.

For rollouts like this, some better experiences have been with deprecation messages as well as emailing maintainers or opening GitHub issues.

Maybe there's something I could've done as well -- e.g. peerDependencies but it's been so long -- I just remember it being a more frustrating than beneficial experience

erikkemperman commented 1 month ago

I completely get it -- there ought to be a "radical" level in semver sitting just above "major" -- resulting in a separate package name to make sure that people have to deliberately and explicitly effect the change on their end.

But I didn't (and don't) think this particular change would have warranted that, folks were willing to jump from 3 to 4 which was in many ways much more deeply structural. And this, or something like this, was sorely needed -- writing a gulp plugin for text files is just so much simpler if you can just assume utf-8 as default for the up- and downstreams.

And I'm also not sure that somehow announcing anything at the time would have been very helpful, it was always going to be measured in years rather then months before the next major release, given everyone's dayjobs. People tend to ignore and forget about warnings on a long horizon :-)

At any rate, hopefully people running into this find these pages and add the encoding: false option for binaries. I suppose we could say that undefined gets you an auto-detect pass but that's actually pretty difficult to get right. Explicit is better than implicit.

marksteward commented 1 month ago

Got hit by this because dependabot suggested it as a security fix. Might be nice to throw an error instead of silently replacing characters above 0x7f with U+FFFD?