Closed sttk closed 1 year ago
@sttk if this is the full example, you've forgotten to create a Writeable stream, so it never sinks and you hit the highWaterMark
Nvm, I see the writeable stream is fs
However, I'm very confused because I was 99% sure that you aren't allowed to use an object stream with fs writeStream. It's a text based stream.
@phated This cause is that glob
process multiple readdir only when a given glob string include globstar.
https://github.com/isaacs/node-glob/blob/3bfec21dd180ddf4672880176ad33af6296a167e/glob.js#L360
@phated
you aren't allowed to use an object stream with fs writeStream
In this Transform, I changed an object to a string.
In the above sample code, the Transform
convert an object to a string.
const ts = Transform({
objectMode: true,
transform: (f, enc, cb) => { ++count; cb(null, f.path + '\n'); },
});
The following is the logs of isGlobStar
here by the above sample code when a glob string is './in/*.txt'
:
$ node test.js
glob isGlobStar false
count = 16
$
And the logs when a glob string is './in/**/*.txt'
:
$ node test.js
glob isGlobStar true
glob isGlobStar false
glob isGlobStar false
glob isGlobStar true
glob isGlobStar false
...
glob isGlobStar false
glob isGlobStar true
count = 100
$
I noticed this issue when I run the test of vinyl-fs
. This test failed.
https://github.com/isaacs/node-glob/blob/3bfec21dd180ddf4672880176ad33af6296a167e/glob.js#L412-L432
To skip this part lists all files even if a glob string includes no globstar.
The following code which uses GlobStream
in readable.js
directly works well.
$ cat <<EOF > test2.js
const { pipeline, Transform } = require('stream');
const fs = require('fs');
const GlobStream = require('glob-stream/readable');
const gs = GlobStream('./in/*.txt', []);
let count = 0;
const ts = Transform({
objectMode: true,
transform: (f, enc, cb) => { ++count; cb(null, f.path + '\n'); },
});
const ws = fs.createWriteStream('./out/out.txt');
pipeline(gs, ts, ws, () => console.log('count =', count));
EOF
$ node test2.js
count = 100
The following code which also uses pumpify
and unique-stream
like index.js
works well.
$ cat <<EOF > test3.js
const { pipeline, Transform } = require('stream');
const fs = require('fs');
const pumpify = require('pumpify');
const GlobStream = require('glob-stream/readable');
const UniqueStream = require('unique-stream');
const gs = GlobStream('./in/*.txt', []);
const uniq = UniqueStream();
const pump = pumpify.obj(gs, uniq);
let count = 0;
const ts = Transform({
objectMode: true,
transform: (f, enc, cb) => { ++count; cb(null, f.path + '\n'); },
});
const ws = fs.createWriteStream('./out/out.txt');
pipeline(pump, ts, ws, () => console.log('count =', count));
EOF
$ node test3.js
count = 100
And the following code which also uses ordered-read-stream
like index.js
doesn't work well.
$ cat <<EOF > test4.js
const { pipeline, Transform } = require('stream');
const fs = require('fs');
const pumpify = require('pumpify');
const GlobStream = require('glob-stream/readable');
const UniqueStream = require('unique-stream');
const Combine = require('ordered-read-streams');
const gs = GlobStream('./in/*.txt', []);
const agg = new Combine(gs);
const uniq = UniqueStream();
const pump = pumpify.obj(agg, uniq);
let count = 0;
const ts = Transform({
objectMode: true,
transform: (f, enc, cb) => { ++count; cb(null, f.path + '\n'); },
});
const ws = fs.createWriteStream('./out/out.txt');
pipeline(pump, ts, ws, () => console.log('count =', count));
EOF
$ node test4.js
count = 16
@phated I suspect that this issue is due to ordered-read-streams
from the above results.
This issue is not due to ordered-read-streams
.
The cause is seemed that glob-stream/readable
uses highWaterMark
as limit of read count.
$ cat <<EOF > test3.js
const fs = require('fs');
const { Readable, pipeline } = require('stream');
const Combine = require('ordered-read-streams');
let cnt = 1;
let max = 100;
let read_count = 0;
const rs = new Readable({
highWaterMark: 20,
objectMode: true,
read(size) {
read_count++;
for (i = 0; i < size; i++, cnt++) {
if (cnt > max) {
this.push(null);
break;
}
this.push(`${cnt}\n`);
}
},
});
const cs = new Combine([rs]);
let drain_count = 0;
const ws = fs.createWriteStream('out.txt', { highWaterMark: 10 });
ws.on('drain', () => {
drain_count++;
});
ws.on('finish', () => {
console.log(`read_count = ${read_count}`);
console.log(`drain_count = ${drain_count}`);
});
pipeline(cs, ws);
EOF
$ node test5.js
read_count = 6
drain_count = 24
$ wc -l out.txt
100 out.txt
$
The cause is seemed that glob-stream/readable uses highWaterMark as limit of read count.
That means you aren't draining the streams correctly.
I'm having a similar issue when trying to create a simple writable stream with a listener to the 'data' event. Only 16 results are read before the process completes. Could you provide a simple example where a large directory is globbed, console.info
is called for each data
event while listening to all of the files that should be returned by the glob?
Here is an example where only the first 16 data events are called.
// 16 results
glob_stream(join(process.cwd(), '*'))
.on('data', data=>console.info(data))
.on('end', ()=>console.info('done'))
// 16 results
glob_stream(join(process.cwd(), '*')).pipe(new Writable({
objectMode: true,
write(chunk, _encoding, next) {
console.debug(chunk)
next()
}
}))
I also tried using the mississippi
library with .pipe
, as you do in your tests & cannot figure out how to have the entire glob complete.
Even piping to process.stdout throws an error since this library streams objects. This seems very basic, but I can't find any examples of consuming streaming objects properly for any library. Hopefully this thread to serve posterity by resulting with a simple example of properly consuming an object stream.
// TypeError [ERR_INVALID_ARG_TYPE]: The "chunk" argument must be of type string or an instance of Buffer or Uint8Array. Received an instance of Object
glob_stream(join(dir_path, '*')).pipe(process.stdout)
Note that spawn
& readline
work as expected.
import { spawn } from 'child_process'
import * as readline from 'readline'
// All of the files in `dir_path` are printed to the console
readline.createInterface({ input: spawn('ls', ['-1'], { cwd: dir_path }).stdout })
.on('line', $=>console.debug($))
I've added 2 tests for this at https://github.com/gulpjs/glob-stream/pull/119/commits/790c0cb5f1cd81ace1024b2cf0d41756c875db4c
What were you expecting to happen?
Lists more than 16 files in a directory.
What actually happened?
Lists only 16 files with a glob including no globstar.
Please give us a sample
Terminal output / screenshots
Please provide the following information:
node -v
): v16.15.0, v10.23.0npm -v
): 8.5.5Additional information
When replaces the above glob string to
'./in/**/*.txt'
, the result was as follows: