jonathanong closed this issue 5 years ago
Using generators/iterators in io.js where they make sense would be a good addition.
Does this depend on the work in joyent/libuv#1574 to be merged first? I don't think it's in libuv yet.
:+1:
in the future, the asynchronous version could return an observable:
for (dir on fs.readdir(__dirname)) {
}
:+1:
I'm all for the new named versions. We don't want to break compatibility with existing code by changing behaviour of existing functions, but new stuff could work. I'm not sure what the policy is yet on new interfaces that deviate from node.js though.
@iojs/tc Thoughts?
Where else does an interface like this make sense? Would this be a one-off, or would there be a flood of requests to add the same style of interface to other core APIs?
@rvagg I looked for any other APIs that would look like this but I couldn't find any. I think this may be a one-off.
I think the way forward on this is for someone here to propose a change in a PR and then we'll escalate that to a TC discussion because it would be good to cover the question of whether adopting a new style of API is desirable. If it's too much work to come up with an initial implementation then we could just elevate this issue, it'd just be more of a hypothetical discussion then.
I think the streams version makes sense. I would test the iterator design in userland though. It can apply to any sort of stream, really. And it's easy enough to abstract a stream as an iterator in userland.
The eventual observables thing sounds pretty rad.
@Qard how would you create an iterator from a stream? streams are async... iterators are not...
@cjb i don't think it's merged. i'm going to open an issue in libuv now
Just use co? (or something similar)
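For context, the co-style "promise trampoline" mentioned here can be reduced to a few lines. This is a hedged sketch, not co's actual implementation (real co also forwards errors back into the generator via gen.throw, handles thunks, etc.); the `run` name is hypothetical:

```javascript
// Step a generator, awaiting each yielded promise before resuming.
function run(genFn) {
  const gen = genFn();
  return new Promise((resolve, reject) => {
    function step(input) {
      let r;
      try { r = gen.next(input); } catch (e) { return reject(e); }
      if (r.done) return resolve(r.value);
      // treat each yielded value as a promise and resume when it settles
      Promise.resolve(r.value).then(step, reject);
    }
    step();
  });
}
```

Usage looks like synchronous code: `run(function* () { const x = yield Promise.resolve(1); return x + 1; })` resolves to 2.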
OK, a few things:
Iterators are synchronous: next() has to return an iteration result immediately, so they can't represent async operations. (The same goes for the .throw() and .return() methods; they don't make any sense either.)
There is a proposal for async observables with for-on syntax. I doubt that will make the ES2016 train since no implementers have yet expressed interest, but I could be wrong. Regardless, it's very tentative. (There's another competing proposal, async iterables, that might be a better fit for this use case in particular.)
So I don't think this should be a new style of API. Just an object mode stream is fine.
Whoa, yeah this is a totally bogus suggestion as iterators and generators can't be used for iterating over async. This tripped me up at first as well.
But note that, though it's not exactly what you had in mind, fs.readFileSync(file, 'utf8') is already iterable by way of a String being iterable, and soon fs.readFileSync(file) will be iterable too, as Buffers implement the iterable interface: https://github.com/iojs/io.js/pull/525
So re-reading the OP: readdirSync already returns an array which is iterable
It does, but readdirSync() is (too) eager: it slurps the directory in one pass. If you apply it to a directory with 1M+ entries, it will likely bring down the process.
An iterator that walks the entries piecemeal would be great but it requires a number of changes to libuv. @misterdjules already did some of the prep work but I recall that there were still some loose ends. I'm not even sure if we reached consensus that his approach was really the best way forward.
as iterators and generators can't be used for iterating over async
they can!
for (let promise of iterator) {
  let val = await promise;
}
@bnoordhuis I see, so you'd block on each call to .next(), instead of all at once.
@vkurchatkin yes, you can invent contracts on top of iterators, especially ones that only work in ES2016. But they won't necessarily be respected. Someone could "forget" an await, or a .then, or just call .next() twice in a row. Now what? How do you know if it's end-of-iterable or I/O error or ...?
@vkurchatkin essentially you are proposing async iterable as next() : -> Iteration<Promise<T>>, whereas a more coherent design is next() : -> Promise<Iteration<T>>.
For example, in your design the producer can't decide when to return or throw from the generator side, because it has to wait for the async process to complete before doing so, but the only way to signal completion or errors is a synchronous return/throw.
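The two shapes can be contrasted in a few lines. This is a hedged sketch with hypothetical names; design A mirrors the Iteration<Promise<T>> proposal, design B the Promise<Iteration<T>> one:

```javascript
// Design A: next() returns an Iteration whose value is a Promise.
// The producer must decide `done` synchronously, before the async
// work settles -- which is exactly the problem described above.
function designA(items) {
  let i = 0;
  return {
    next() {
      if (i >= items.length) return { value: undefined, done: true };
      return { value: Promise.resolve(items[i++]), done: false };
    }
  };
}

// Design B: next() returns a Promise of an Iteration, so the producer
// can settle the value and the `done` decision asynchronously, and
// reject the promise to signal an I/O error.
function designB(items) {
  let i = 0;
  return {
    next() {
      return Promise.resolve(
        i >= items.length ? { value: undefined, done: true }
                          : { value: items[i++], done: false });
    }
  };
}
```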
you'd block on each call to .next(), instead of all at once.
Yes, but I'm mostly concerned with memory usage; an iterator brings it down from O(n) to O(1). Currently, trying to scan a large directory exhausts the heap and terminates the process.
@domenic agreed, ugly and far from ideal.
a more coherent design is next() : -> Promise<Iteration<T>>
I can't see how this can be achieved using existing ES2015/2016 primitives. Thanks for the link; it probably has an explanation.
Yeah the iterative version in my head was a synchronous version that doesn't load the entire array in memory at once. But i care way more about the async version.
@domenic While generators on their own are sync, they can be used in an async way via co. Maybe you don't like the weird promise trampoline hack it uses, but it works and many people are already using it in that way. I suggested using co because I had already suggested doing the iterator thing in userland, which means those sort of tools are available. If you prefer some different async iterator module, you can write a streams abstraction on that.
I wouldn't mind this being in core, I just suggested a userland approach because I expected some members of the core team might disagree.
btw let's not do the iterator design. i realized that if something like this happens:
var i = 0;
for (name of fs.readdirIter(__dirname)) {
  if (i++ > 5) throw new Error('boom');
}
assuming fs.readdirIter() creates a file descriptor, the above may cause a file descriptor leak, as the iterator may remain in a paused state. this is assuming fs.readdirIter() keeps a file descriptor open until iteration completes. having users manually close the file descriptor may be too much to ask (not worth people complaining about leaks when they don't know about this).
so... let's just do the stream version for now :D
@jonathanong as long as the iterator implements iterator.return() that closes the FD, there will be no leak.
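Generators get this cleanup behavior for free via try/finally, since breaking out of for-of triggers IteratorClose, which calls the iterator's return(). A hedged sketch, where fs.readdirIter and the file descriptor are hypothetical and modelled by a plain `state` object:

```javascript
// A generator-backed "directory iterator" that releases its resource
// in a finally block, even when the consumer breaks out early.
function makeDirIter(entries, state) {
  return (function* () {
    try {
      for (const name of entries) yield name;
    } finally {
      state.closed = true; // stand-in for closing the file descriptor
    }
  })();
}

const state = { closed: false };
const seen = [];
for (const name of makeDirIter(['a', 'b', 'c'], state)) {
  seen.push(name);
  if (name === 'b') break; // IteratorClose calls iterator.return() here
}
// the finally block has run: the "descriptor" is released despite the break
```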
@domenic why isn't return here: https://people.mozilla.org/~jorendorff/es6-draft.html#sec-iterator-interface
@vkurchatkin same reason throw isn't.
not working:
var arr = [1, 2, 3, 4, 5];
var it = arr[Symbol.iterator]();
it.return = function() {
  console.log('return');
}
for (var v of it) {
  if (v > 3) break;
  console.log(v)
}
aha, here it is: https://people.mozilla.org/~jorendorff/es6-draft.html#sec-iteratorclose
Still don't understand why it is not a part of the iterator interface.
Interesting. Good to know we have an es master here!
plain node stream style looks good too:
var stream = fs.readdirStream(__dirname);
stream.on('data', function(dir) { ... })
is it possible to create iterable version on top of it?
+1 to an object stream for this, that could probably be shoehorned into an iterator (?)
So backing up to the object mode stream for fs.readDir (e.g. fs.createReadDirStream?): would it make sense for there to be a deep/recursive mode for it (similar to what I made with spider-stream)? At the moment doing this is tricky, as you either have to stat all the things in the directory, or do what rimraf doesn't do: assume everything is a directory and learn on error that it's a file (what rimraf does do doesn't really apply here).
Sounds like something for userland still. Perhaps made easier / safer by a streaming readdir
Perhaps made easier / safer by a streaming readdir
that is what I was going for
Looks like there might be some more movement on this: https://github.com/libuv/libuv/pull/416
To my understanding, it will require a major bump of libuv, so this might take a while to actually come to io.js / node.
With libuv/libuv#416 we could simply expose an API like this:
let dir = await fs.opendir("foo"), entries;
do {
  entries = await dir.read(100)
  // deal with entries
} while (entries.length === 100);
// when dir is finalized it calls closedir
Or with simple callbacks, if promises are too crazy.
FTR, Go's readdir requires you to limit the number of entries read. By default it's 100.
Also, it hasn't been discussed much here, but an unlimited readdir is a security risk: it allows virtually unlimited memory allocation initiated by user input. I.e. a file server (or a web server, because everyone seems to like web servers) that allows uploading/adding files and listing files could be DoS'd by creating many files and then listing the entries.
when dir is finalized it calls closedir
Not this. JS doesn't have finalization. It has garbage collection, which is never guaranteed to run. Scarce system resources like directory handles should never depend on garbage collection to collect them.
The stream design seems sane. Or yours, but with dir.close(). Alternately, a lower-level API like const d = await fs.opendir(dir); const entry = await fs.readdir(d); fs.closedir(d) would be reasonable for core IMO.
Also, it hasn't been discussed much here, but an unlimited readdir is a security risk: it allows virtually unlimited memory allocation initiated by user input. I.e. a file server (or a web server, because everyone seems to like web servers) that allows uploading/adding files and listing files could be DoS'd by creating many files and then listing the entries.
This shouldn't be an issue with the streaming proposal, but should be with the current implementations I think.
@domenic I'm well aware how v8 works ;)
(BTW, object streams seems to be favoured by individuals who tend to seek out opportunities to create new abstractions — it doesn't add clarity and suffers in performance IMHO.)
TL;DR: async bridge toll costs would make reading single dirents unfeasible.
libuv/libuv#416 exposes the following interface:
int uv_fs_opendir(uv_loop_t*, uv_fs_t*, const char* path, uv_fs_cb);
int uv_fs_readdir(uv_loop_t*, uv_fs_t*, const uv_dir_t*, uv_dirent_t[], size_t limit, uv_fs_cb);
int uv_fs_closedir(uv_loop_t*, uv_fs_t*, const uv_dir_t*, uv_fs_cb);
Where the readdir call reads up to limit entries. There's a bridge toll for any uv async call (which can be quite dramatic if you perform many calls, as e.g. a compiler would). It makes sense to read a sensible number of dirents per uv request, to avoid paying high bridge tolls.
let fs = require("fs");

fs.openDir("foo", function (err, dir) {
  if (err) { throw err; }
  let readNext = function() {
    dir.read(100, function (err, entries) {
      if (err) { throw err; }
      // deal with entries
      if (entries.length === 100) {
        readNext();
      } else {
        dir.close(function(err) {
          if (err) { throw err; }
          // done reading directory
        });
      }
    });
  }
  readNext();
});
Addition to lib/fs.js:
let fsb = process.binding("fs")

export class Dir {
  read(limit :number, cb :(err:Error,entries:DirEnt[])=>void) :void {}
  close(cb :(err:Error)=>void) :void {}
}

interface DirEnt {
  name :string;
  type :string; // "dir"|"file"|"link"|"fifo"|"socket"|"char"|"block"|"?"
}

export function openDir(path :string, cb :(err:Error,dir:Dir)=>void) {
  fsb.opendir(path, (err, handle) => {
    let dir;
    if (!err) {
      dir = new Dir;
      dir._handle = handle;
    }
    cb(err, dir);
  })
}
Dir.prototype.read = function(limit, cb) {
  if (!limit || limit < 1) { limit = 100 }
  fsb.readdir(this._handle, limit, (err, entries) => {
    if (err) {
      fsb.closedir(this._handle)
      this._handle = null
    }
    cb(err, entries);
  })
}

Dir.prototype.close = function(cb) {
  if (!this._handle) { throw new Error('already closed') }
  fsb.closedir(this._handle, cb)
  this._handle = null;
}
let fs = require("fs");

let dir = await fs.openDir("foo"), entries;
do {
  entries = await dir.read(100)
  // deal with entries
} while (entries.length === 100);
await dir.close();
// done reading directory
Addition to lib/fs.js:
let fsb = process.binding("fs")

export class Dir {
  async read(limit :number) :DirEnt[] {}
  async close() :void {}
}

interface DirEnt {
  name :string;
  type :string; // "dir"|"file"|"link"|"fifo"|"socket"|"char"|"block"|"?"
}

export function openDir(path :string) :Promise<Dir> {
  return new Promise((resolve, reject) => {
    fsb.opendir(path, (err, handle) => {
      if (err) {
        reject(err);
      } else {
        let dir = new Dir;
        dir._handle = handle;
        resolve(dir);
      }
    })
  })
}

Dir.prototype.read = function(limit) {
  if (!limit || limit < 1) { limit = 100 }
  return new Promise((resolve, reject) => {
    fsb.readdir(this._handle, limit, (err, entries) => {
      if (err) {
        fsb.closedir(this._handle)
        this._handle = null
        reject(err);
      } else {
        resolve(entries);
      }
    })
  })
}

Dir.prototype.close = function() {
  return new Promise((resolve, reject) => {
    if (!this._handle) { throw new Error('already closed') }
    fsb.closedir(this._handle, err => {
      if (err) { reject(err); } else { resolve(); }
    })
    this._handle = null;
  })
}
How are things going with this?
@hollowdoor This is dependent on this libuv PR: https://github.com/libuv/libuv/pull/416
This has been a while and will probably be a while longer. From https://github.com/libuv/libuv/pull/416#issuecomment-218483712:
There is no defined roadmap as of yet. v2 won't happen overnight, that's for sure. Time is a scarce resource, I'm afraid. Personally I haven't had the time to work on any of the things I wanted for v2 yet.
So for this to happen, if I understand everything correctly:
I'm not sure if this is a good use of the stalled label or not, but I'll just leave this comment for anyone curious if there's been progress since October...
Any progress on this?
@DomVinyard It would appear no. If you want to see this happen, it requires https://github.com/libuv/libuv/pull/416 so maybe head over there to see if there's a way you can help. /cc @whitlockjc
@Trott Appreciate the fwd. For anybody else awaiting this to solve the same subset of problems as me - a combination of native readdir and stream iterator gets you some of the way.
Almost three months later, still waiting eagerly on this. native-readdir is a nice shim but it's not compatible with windows. Does anybody have any other solutions?
There's really nothing we can do on this until the appropriate support lands in libuv and that is updated here. Best would be to try to go shake the tree on that side.
I'll be picking this up tonight, I apologize for the delay. Job changes and such have pulled me away from Node.js for a few months. I'm back and will get libuv support in ASAP.
since we're in ES6 territory now, i'm thinking the sync version should be an iterable, and the async version should be an object stream:
See: https://github.com/joyent/node/issues/388