gulp-community / gulp-concat

Streaming concat middleware for gulp
MIT License

gulp-concat doesn't seem to support UTF-16 #101

Open billrawlinson opened 9 years ago

billrawlinson commented 9 years ago

When concatenating files that are UTF-16 Little Endian ("Unicode"), every other file gets munged a bit.

When concatenating files that are UTF-16 Big Endian, the same thing happens.

If you alternate files, where the first is UTF-16LE and the second is UTF-16BE, then just the very end of the second file gets munged.
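A minimal Node.js model of these symptoms, assuming (as suggested later in this thread) that the plugin joins the raw file buffers with a UTF-8 separator, and that the result is then read as UTF-16LE because the first file starts with an LE byte-order mark. This is not gulp-concat's actual code; `utf16leFile`, `utf16beFile` and `joinAndDecode` are hypothetical helpers. The one-byte `'\n'` pushes every second file onto an odd byte offset, which is consistent with the "every other file" pattern:

```javascript
// Simplified model (assumption, not gulp-concat's real code): raw file
// buffers are joined with a UTF-8 separator, then the whole result is read
// as UTF-16LE because the first file starts with an LE byte-order mark.
const SEP = Buffer.from('\n', 'utf8'); // one byte: 0x0A

// Hypothetical helpers that build UTF-16 "files" with a BOM.
function utf16leFile(text) {
  return Buffer.concat([Buffer.from([0xff, 0xfe]), Buffer.from(text, 'utf16le')]);
}

function utf16beFile(text) {
  // Node has no 'utf16be' codec, so encode as LE and swap each 2-byte unit.
  const body = Buffer.from(text, 'utf16le').swap16();
  return Buffer.concat([Buffer.from([0xfe, 0xff]), body]);
}

function joinAndDecode(files) {
  const parts = [];
  files.forEach((buf, i) => {
    if (i > 0) parts.push(SEP);
    parts.push(buf);
  });
  return Buffer.concat(parts).toString('utf16le');
}

// Three UTF-16LE files: the second starts at an odd offset and decodes as
// garbage; the second separator restores even alignment for the third.
const allLe = joinAndDecode([utf16leFile('alpha'), utf16leFile('bravo'), utf16leFile('charlie')]);
console.log(allLe.includes('alpha'), allLe.includes('bravo'), allLe.includes('charlie'));
// true false true

// UTF-16LE followed by UTF-16BE: shifted by one byte, the BE text happens
// to line up as LE code units, but its last byte is left dangling.
const mixed = joinAndDecode([utf16leFile('ab'), utf16beFile('cd')]);
console.log(mixed.includes('ab'), mixed.includes('cd'));
// true false
```

In the mixed case the second file's text comes through except for two junk characters where the separator meets its BOM and the final character, whose last byte has no partner.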

I have set up a demo project that illustrates this and has a bunch of notes that explain why I even tried these things. I don't know for certain that the problem is in gulp-concat; it could be in gulp itself, in gulp.src().

https://github.com/finalcut/gulp-concat-bug

yocontra commented 9 years ago

Is this with concat or gulp itself?

billrawlinson commented 9 years ago

It seems like it is concat to me, considering that the munged characters are interleaved (every other file). I figured I'd post the problem here first and see if you could confirm or rule out gulp-concat as the cause.

yocontra commented 9 years ago

@billrawlinson Can you try just piping src to dest a bunch of times and see if that causes the issue as well?

billrawlinson commented 9 years ago

Sure, I'll try Monday; I'm on the road now. If anyone else wants to know sooner, they can pull the demo project and try.

I figure the problem is either in the file read or in concat, since it shows up in the middle of the concatenated result, which should rule out the write operation.


billrawlinson commented 9 years ago

So I ran the tests where I just pipe the files from src to dest, and nothing funky happens to the files in the process.

I've updated the demo project so that it does both.

If you want to see the results, just pull the project and run it. Each test now puts its results in a folder named "results#", where # is the number of the test being run.

https://github.com/finalcut/gulp-concat-bug

yocontra commented 9 years ago

I'm guessing it has something to do with buffer conversions in concat-with-sourcemaps:

- https://github.com/floridoo/concat-with-sourcemaps/blob/master/index.js#L109
- https://github.com/floridoo/concat-with-sourcemaps/blob/master/index.js#L43-L46
- https://github.com/floridoo/concat-with-sourcemaps/blob/master/index.js#L15-L18

Probably mixing a bunch of encodings together using node's Buffer module is causing unexpected results.

billrawlinson commented 9 years ago

In test examples 2 (UTF-16LE) and 3 (UTF-16BE), the encodings are all the same. Tests 1 and 4, with mixed encodings, end up with better results (though still broken). Test 5 (UTF-8) is the only one that produces correct results.

yocontra commented 9 years ago

@billrawlinson I mean that the separator is treated as UTF-8, so combining that with some UTF-16 buffers might be yielding weird results
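To see why a UTF-8 separator hurts, compare byte lengths: `'\n'` is one byte in UTF-8 but two in UTF-16LE. A sketch with a hypothetical `joinBuffers` helper (gulp-concat's `newLine` option is just a string; the `encoding` parameter here is an assumption, not a plugin option) shows that encoding the separator with the files' own codec keeps every file on a 2-byte boundary:

```javascript
// Hypothetical helper: join buffers with a separator encoded in `encoding`.
function joinBuffers(buffers, newLine, encoding) {
  const sep = Buffer.from(newLine, encoding);
  const parts = [];
  buffers.forEach((buf, i) => {
    if (i > 0) parts.push(sep);
    parts.push(buf);
  });
  return Buffer.concat(parts);
}

// Three BOM-less UTF-16LE "files".
const files = ['alpha', 'bravo', 'charlie'].map((t) => Buffer.from(t, 'utf16le'));

console.log(Buffer.byteLength('\n', 'utf8'));    // 1
console.log(Buffer.byteLength('\n', 'utf16le')); // 2

const broken = joinBuffers(files, '\n', 'utf8').toString('utf16le');
const fixed = joinBuffers(files, '\n', 'utf16le').toString('utf16le');
console.log(fixed === 'alpha\nbravo\ncharlie');  // true
console.log(broken === 'alpha\nbravo\ncharlie'); // false
```

This only helps when all inputs share one known encoding; it does nothing for a mix of LE and BE files.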

billrawlinson commented 9 years ago

ah, that makes perfect sense.

billrawlinson commented 9 years ago

I assume that, due to the nature of gulp pipes, concat has no way of knowing the encoding of the various buffers coming into it from src?
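A plugin does receive each file's raw bytes (in gulp's default buffer mode, `file.contents` is a Buffer), so one partial answer is to sniff a byte-order mark. A minimal sketch with a hypothetical `sniffBom` helper; it can only recognize files saved with a BOM, and it would misread a UTF-32LE BOM (`FF FE 00 00`) as UTF-16LE:

```javascript
// Hypothetical helper: guess a file's encoding from its byte-order mark.
// Returns null when there is no recognizable BOM (encoding unknown).
function sniffBom(buf) {
  if (buf.length >= 3 && buf[0] === 0xef && buf[1] === 0xbb && buf[2] === 0xbf) {
    return 'utf8';
  }
  if (buf.length >= 2 && buf[0] === 0xff && buf[1] === 0xfe) return 'utf16le';
  if (buf.length >= 2 && buf[0] === 0xfe && buf[1] === 0xff) return 'utf16be';
  return null;
}

console.log(sniffBom(Buffer.from([0xff, 0xfe, 0x61, 0x00]))); // utf16le
console.log(sniffBom(Buffer.from('plain ascii')));            // null
```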

billrawlinson commented 9 years ago

You are correct; it is the separator character that is causing the problem. I set up the test as follows:

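var gulp = require('gulp');
var concat = require('gulp-concat');

function runConcatTest(d) {
  // newLine: '' stops gulp-concat from inserting its default '\n' separator
  var testResults = gulp.src(d.sources)
    .pipe(concat(d.outfile, { newLine: '' }))
    .pipe(gulp.dest(d.outpath));
  testResults.on('data', printToConsole); // printToConsole is defined elsewhere in the demo project
}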

Here I basically blanked out the newLine character, and tests 2 and 3 both work perfectly while tests 1 and 4 are all mucked up. If I don't override newLine, it is broken as before.
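The empty separator cannot rescue tests 1 and 4 because the LE and BE bytes still end up in one buffer, and decoding it with either byte order garbles the other part. One possible approach, sketched under the assumptions that every input starts with a BOM and that its encoding label is already known, is to transcode everything to UTF-16LE (Node's `buf.swap16()` reverses each 2-byte unit) and write a single BOM for the output. `toUtf16leBody`, `joinUtf16` and the `{ contents, encoding }` objects are hypothetical:

```javascript
// Sketch (assumption): every input starts with a BOM and its encoding label
// ('utf16le' or 'utf16be') is already known. Hypothetical helpers.
function toUtf16leBody(buf, encoding) {
  const body = buf.subarray(2); // drop the 2-byte BOM
  if (encoding === 'utf16le') return body;
  if (encoding === 'utf16be') {
    // Copy first so the caller's buffer is not mutated, then swap bytes.
    return Buffer.from(body).swap16();
  }
  throw new Error('unsupported encoding: ' + encoding);
}

function joinUtf16(files, newLine) {
  const sep = Buffer.from(newLine, 'utf16le'); // separator in the same codec
  const parts = [Buffer.from([0xff, 0xfe])];   // one BOM for the whole output
  files.forEach((f, i) => {
    if (i > 0) parts.push(sep);
    parts.push(toUtf16leBody(f.contents, f.encoding));
  });
  return Buffer.concat(parts);
}

const le = Buffer.concat([Buffer.from([0xff, 0xfe]), Buffer.from('ab', 'utf16le')]);
const be = Buffer.concat([Buffer.from([0xfe, 0xff]), Buffer.from('cd', 'utf16le').swap16()]);

// Byte concatenation with an empty separator still mixes byte orders.
const naive = Buffer.concat([le, be]).toString('utf16le');
console.log(naive.includes('cd')); // false

const joined = joinUtf16(
  [{ contents: le, encoding: 'utf16le' }, { contents: be, encoding: 'utf16be' }],
  '\n'
).toString('utf16le');
console.log(joined === '\uFEFFab\ncd'); // true
```

Buffer's `toString()` keeps the leading U+FEFF, which is why it appears in the expected string.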

Maybe, as a temporary solution, the README could be updated to tell people that if they are joining UTF-16 files, they should put their own newline at the end of each file and then override the join character to be an empty string.

UPDATE: I updated the demo project to show the working scenario, with test 2 using an empty string as the separator.

yocontra commented 9 years ago

@billrawlinson Hmm, trying to think up a solution here; going to dig into the Buffer docs and see if I can figure something out.

yocontra commented 9 years ago

https://nodejs.org/api/buffer.html#buffer_class_method_buffer_isencoding_encoding

We could emit a warning if the user mixes encodings (assuming we can't figure out a way to make it work).
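Note that `Buffer.isEncoding()` only reports whether Node recognizes an encoding label; it cannot detect which encoding a buffer is in. A warning along these lines would therefore need per-file labels from somewhere else, such as BOM sniffing. A minimal sketch with a hypothetical `encodingWarnings` helper:

```javascript
// Hypothetical pre-concat check. `files` is an array of { path, encoding }
// where each encoding label was determined elsewhere (assumption).
function encodingWarnings(files) {
  const warnings = [];
  const seen = new Set();
  for (const { path, encoding } of files) {
    // Buffer.isEncoding checks the label, not the file's bytes.
    if (!Buffer.isEncoding(encoding)) {
      warnings.push(`${path}: Node's Buffer does not support "${encoding}"`);
    }
    // Normalize spelling so 'utf-16le' and 'utf16le' count as one encoding.
    seen.add(encoding.toLowerCase().replace(/-/g, ''));
  }
  if (seen.size > 1) {
    warnings.push(`mixed encodings: ${[...seen].join(', ')}`);
  }
  return warnings;
}

const report = encodingWarnings([
  { path: 'a.txt', encoding: 'utf16le' },
  { path: 'b.txt', encoding: 'utf16be' },
]);
report.forEach((w) => console.warn(w));
// b.txt: Node's Buffer does not support "utf16be"
// mixed encodings: utf16le, utf16be
```

Node has no built-in UTF-16BE codec, so `Buffer.isEncoding('utf16be')` returns false and such a file is flagged on its own, in addition to the mixed-encoding warning.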

yocontra commented 8 years ago

I played around with this for a bit and it stumped me. @billrawlinson, did you figure anything out?

billrawlinson commented 8 years ago

I did not. I just resorted to not using UTF 16 :-1:

troydemonbreun commented 8 years ago

I have run into the same issue, and it turns out the files that end up munged are UTF-16.