Closed henryqdineen closed 9 years ago
Promise.map
does not guarantee execution order - use .each
or .reduce
if you need that
I understand that .map
should not guarantee execution order and I specifically want parallel execution but it would be nice if the mapped array was FIFO
@benjamingr Is it difficult to guarantee the execution order?
@DenisGorbachev if you need a guarantee of execution order use .each
that's practically what it does.
@benjamingr .each
doesn't support concurrency, does it?
@benjamingr I'm OK with unordered Promise resolution "end", but is it possible to get ordered Promise resolution "start"?
Why would you want that? That sounds like an very specific request to make.
@benjamingr I'm sending requests to 3DCart API with concurrency: 10
. Right now, the requests go like this:
{ params: { limit: 100, offset: 1, datestart: '05/23/2015' } }
{ params: { limit: 100, offset: 101, datestart: '05/23/2015' } }
{ params: { limit: 100, offset: 201, datestart: '05/23/2015' } }
{ params: { limit: 100, offset: 301, datestart: '05/23/2015' } }
{ params: { limit: 100, offset: 401, datestart: '05/23/2015' } }
{ params: { limit: 100, offset: 501, datestart: '05/23/2015' } }
{ params: { limit: 100, offset: 601, datestart: '05/23/2015' } }
{ params: { limit: 100, offset: 701, datestart: '05/23/2015' } }
{ params: { limit: 100, offset: 801, datestart: '05/23/2015' } }
{ params: { limit: 100, offset: 901, datestart: '05/23/2015' } }
{ params: { limit: 100, offset: 12501, datestart: '05/23/2015' } }
{ params: { limit: 100, offset: 12401, datestart: '05/23/2015' } }
{ params: { limit: 100, offset: 12301, datestart: '05/23/2015' } }
...
It doesn't lead to an error per se, but leads to wondering "Is there a bug in my code?" I guess @henryqdineen wondered the same, too :)
It doesn't lead to an error per se, but leads to wondering "Is there a bug in my code?"
@DenisGorbachev I had the same initial feeling when I saw some logs from my code running bluebird map with concurrency.
@benjamingr Could we improve the docs for .map()
to explicitly state that order is not guaranteed (if it's there, then I missed it) or is it an FP principle of map's in general, that they will be executed in any order?
@naartjie
I added an explicit note in the docs for .map
.
In general, something with a map
is called a functor, the guarantees a map function must hold are basically only that Promise.resolve([1,2,3]).map(x => x)
resolves to the same values, and that Promise.resolve([1,2,3]).map(f).map(g)
is the same as Promise.resolve([1,2,3]).map(x => g(f(x)))
.
Although technically, then
is more of a map than map
is in this case (well, and also a flatMap). Both are irrelevant since FP principles are less important than not being surprising to users. Usability is more important than purity. Although if you care about FP principles - then no, map
does not have an inherent order guarantee.
Thanks for the quick update @benjamingr. Just for reference I'm including a code snippet of the gotcha:
Promise.all(['a', 'b', 'c'])
.map(function(x) {
return Promise.delay(0).then(function() {
console.log(x)
})
}, {concurrency: 1})
Prints:
a
c
b
While {concurrency: Infinity}
intuitively prints:
a
b
c
In general, it is best not to cause side-effects inside a map
:)
@benjamingr @naartjie Correct me if I'm wrong, but it looks exactly like a usability issue :)
:+1:
In a language where unconstrained side effects are allowed, guarantees about execution order are a good idea. Infact the guarantee about (async) execution order is one of the better things that promises bring to the table.
I propose blueflock, a streams library designed to work well with bluebird, have support for proper concurrency limits and includes node streams interop ;)
I'd really like a parMap
and an ordered map
; or map
and an orderedMap
. The latter is effectively a reduce-to-new-array, although a smarter algorithm could still allow concurrency.
each
is ill-suited because it only introduces side-effectsreduce
is ill-suited because it adds an explicit/manual new-array-build step and does not consider cases of additional concurrencyThen again, maybe I should just go back to using one of the "stream" libraries..
@pnstickne uhh, map
is already a "parMap` and mapSeries is an ordered map...
+1
This is unintuitive and inconsistent with the behavior of ES6 map
. I understand that mapSeries
iterates in order, however that should be the default behavior of map
.
Also I think that the parMap
function should return a Set to make the un-ordered nature of the result clear, rather than waiting for the developer to (hopefully) discover it after testing. (Although the risk with an uncommonly named function is less than with a standard name like map
.)
Is there any chance ordered and concurrent(with optional limit) map will be supported by bluebird?
Ordered and concurrent makes no sense.
What if you have a map of promises like this: [A, B, C, D, E]
and you want to execute it in order with concurrency 3:
1) Executes first 3 promises concurrently A, B, C
2) When one of those promises finishes, it executes D
3) When another one finishes, it executes E
etc.
Right now Promise.map
does not guarantee any order, so it can actually starts with e.g. A, C, E
. But what if I want the execution order and also some concurrency? How should I do it with Bluebird? Thanks.
Give me a clear use case, why do you need to guarantee order but "not really"? Where is this required?
@benjamingr You have a list of image thumbnail URLs sorted by date, in order. You want to fetch and add them to the page in the exact same order, 4 at a time. Its beneficial for the first ones to load and appear first, as they will render at the top of the screen. You still want concurrency, to avoid loading them one by one.
This is a use case we have: we have a master project and many slave projects, which needs to be synchronized with master. Each synchronization master <-> slave takes 1 hour to finish and these synchronization does lots of REST calls and I/O read, write. And in slave projects, we have some that needs to be finish as soon as possible, so we put them in the first places in the array of projects.
Now we use promises to execute this synchronization. With Promise.map()
, it will pick projects in random order and executes them. But we want to have e.g. first 5 projects to be processed first and then the rest.
I know it's possible to do it with 2 different arrays of projects, one with prioritized projects and other with the rest. But I found it quite redundant to have twice the same code and also the performance would suffer as it would need first to finish first array before it can start with another.
Felt I had to chime in on this too and this seems to be the most recent related issue I can find. I've gone through a few of these issues now where @petkaantonov and @benjamingr have opposed any changes in favour of the alternative which loses concurrency completely.
For reference this is referring to the use of .map with concurrency (higher than 1) and an expected logical execution order (not resolution order!).
While the use cases are small they do exist. You don't always have to rely on the results of code for it to make sense to operate in a certain pattern. An example (as in my case) would be the real time display of results to the end user to show a form of progress. It looks confusing to show things completing (roughly) in order only to suddenly start going backwards; especially with a long sequence. The other example as those above have pointed out is essentially where those earlier in the array have a higher priority than those later. The penalty of switching to single concurrency is too high of a trade-off.
The only valid but strong argument I can surmise against this is whether changing .pop to .shift (#709,#454) is an acceptable performance penalty for such relatively few use cases (is it really that bad with V8 optimisations nowadays?). So I am going to ask the question of whether it can be put behind another options flag such as sequentialExecution?
The alternative as has been pointed out before is for someone to create another package that supplements this behaviour, it just feels so unwieldy for such a minuscule change. Not to mention maintaining it and promoting it enough to see adoption from those scratching their heads over this issue.
is it really that bad with V8 optimisations nowadays?
Optimizing shifting to pop at the v8 engine level is rocket science level optimization and not anywhere like what v8 does.
If you have a 10K items array, popping 10000 times takes about 10000 actions, shifting has to move the entire array which takes 10000+9999+9998....+1 = 10001 * 10000 / 2 - quadratic complexity.
@benjamingr How about preparing the internal array?
Array.prototype.concat(input.slice(0,options.concurrency),input.reverse(input.slice(options.concurrency)))
Wow, I cannot believe I spent an hour thinking why my array doesn't work properly in the order that I provided.
.map
with concurrency almost fits this perfect role of channels combined with async/await
keywords, except the order is strange and confusing.
example:
await Bluebird.map(rangeOfNumbers, processBlock, { concurrency: 3 });
Gives me a concurrency of 3 for I/O, DB, etc, which is perfect for my needs, except i need them to run in linear sequence to know a particular type of event that happened first. Afaik, there is no way to do that.
Thanks @spion throat
solves the problem. The order it returns is not perfect, but processing order is ideal.
If i get it right, cwait
does the opposite, no guarantee on processing but returns ordered. Made a little gist here
If you had a PR for it, is it a fn that would make sense in Bluebird?
Here's a quick and dirty solution:
const items = [1, 2, 3, 4, 5];
await Promise.map(new Array(items.length), async () => {
const item = items.shift();
//...code...
}, {
concurrency:2
});
I can't believe this never got fixed. It's super annoying. When the map has side-effects, you expect the side-effects to occur in a predictable order. All of this nonsense about "map does not guarantee order" is fine in a pure-functional language like Haskell, but not in javascript where we have side-effects. I was previously using "mapSeries" ( which does go in the right order -- no problem there ), and I just wanted to add some concurrency. There is no reason I should have to give up proper order of execution. In fact, I think "map with concurrency" is actually closer to "mapSeries()" than it is to normal "map()".
I can't believe this never got fixed. It's super annoying.
It never got fixed because we do not consider it a bug and do not want to encourage people to map
with side effects.
There are multiple suggested packages/things you can use that are listed in this issue in order to do side effects in an async function.
I still don't get the benefit of "not guaranteeing execution order" on .map
, and IMHO it's ridiculous to shuffle the array before execution.
Practical example: I currently process thousands of XHR requests with .map
to fill a data grid. I want to concurrently make the requests, but at the same time I want to make them in an order of importance so I see the important stuff first.
There is no guarantee that you will see the most important data first.
Lets say you use a concurrency of 20, but your first 15 items take forever to load. The mapping might continue until the end before those 15 items load using the 5 remaining slots.
Now in your case, this might not be a big deal - you want "best-effort" in-order delivery, not a guaranteed one. The problem is, some use cases require a guaranteed in-order delivery - if Bluebird processed things in order by default, it "looks" like the ordering is guaranteed.
Since the ordering seems to work, a user could easily write code that relies on the ordering... until one day it doesn't and the code that relied on it fails! Now they're stuck with a difficult to reproduce bug (because things usually run in the right order).
Because of the above, the implementation explicitly runs things out of order. Its our way of informing you early on that the guarantee is not there, before you encounter that 1 in 1000 times hard to reproduce bug.
We've thought about adding a guaranteed execution order implementation, but its probably far nicer to do that with streams, and streams are a big project which we've not had the time to tackle
you want "best-effort" in-order delivery, not a guaranteed one
Exactly
The problem is, some use cases require a guaranteed in-order delivery
I doubt anybody expects it here.
@butenkor do you mean to say that you doubt that any user of bluebird would ever expect guaranteed in-order delivery (if best-effort was implemented), or you doubt that anyone in this thread expects it? If its the second one, I can probably agree, but with the first one, definitely not.
expect guaranteed in-order delivery
This is definitely you could expect but probably not with concurrency. For that purpose one has mapSeries
or each
.
Looking at this thread it seems that it would be enough to provide analog to mapSeries
another map
function which applies "best-effort" on processing order while allowing concurrency.
@butenkor we'd prefer if possible to go in the opposite direction - we'd rather remove all these array utility functions instead of adding more and more, and implement a stream class instead...
@spion
Since the ordering seems to work, a user could easily write code that relies on the ordering... until one day it doesn't and the code that relied on it fails! Now they're stuck with a difficult to reproduce bug (because things usually run in the right order).
This is completely the user's fault for not reading the docs. There's a method called .each
or whatever for guaranteed delivery-order, and in this user's case it's the method to use. I just want guaranteed processing-order.
Because of the above, the implementation explicitly runs things out of order. Its our way of informing you early on that the guarantee is not there, before you encounter that 1 in 1000 times hard to reproduce bug.
It's really not "1 in 1000 times hard to reproduce". Anyone who knows the smallest bit about concurrency will spot that potential issue immediately.
It's just extremely weird that you shuffle the input in a .map
operation and not provide a non-shuffled version. Anyway, I'm using Promise.all
with throat
now.
const throat = require('throat')(require('bluebird'));
const stuff = ['a', 'b', 'c'];
const concurrency = 10;
const promise = Promise.all(stuff.map(throat(concurrency, x => new Promise(...))));
It's just extremely weird that you shuffle the input in a .map operation and not provide a non-shuffled version.
I know it looks weird - I'm just explaining that there is a good reason for it.
It's really not "1 in 1000 times hard to reproduce". Anyone who knows the smallest bit about concurrency will spot that potential issue immediately.
It really depends on a lot of factors. I've had that issue crop up several times. Never the first time writing the code, but sometimes when existing code is modified.
For example, you start with:
someArray.map(async (x) => {
enqueueSomething(x);
// sync and async code, maybe await
return someOperation(x)
})
Lets say enqueueSomething(x)
relies on it running in the right order for the queue to be correct. It looks like its not a big deal, you read the docs - you know you can take advantage of the in-order execution of the sync part.
The code goes through several modifications fine, getting some stuff above and below enqueueSomething(x).
At some point, another developer comes to modify that code to add some permission checks before enqueuing x. At that point, to them it may not be entirely obvious that you rely on this ordering. It would depend on their familiarity with the code, with the underlying libraries and with how many things they're able to juggle in their head at the same time. Lets say they make a change:
someArray.map(async (x) => {
// some sync code
let permission = await checkPermissionFor(x);
if (!permission) return Promise.resolve(SomeEnum.NotEnqueued);
// some more sync code
enqueueSomething(x);
// some sync or async code
return someOperation(x)
})
Now the order is wrong, and the program potentially breaks.
Bluebid has some extra issues with this. You might think - no problem, I know - I will just filter the things I have permission to enqueue! So you try to chain multiple array utility calls:
So if you have
Promise.filter(arr, x => {
return checkPermissionFor(x)
}).map(x => {
enqueueSomething(x);
})
The enqueueSomething
calls may run out of order, since bluebird immediately passes through the results that completed from the previous filter operation on to the map operation, effectively working as if you wrote:
arr.map(x => {
console.log(x)
return checkPermissionFor(x).then(y => { if (y) { enqueueSomething(x) } })
})
The technique is known as (or at least similar to) map fusion and is common in purely functional languages. For obvious reasons, it doesn't quite work well when you expect side effects to be ordered.
Now, its true that as a result we have nothing to address the issue where you:
Thats definitely annoying and in addition is a common use case for UIs. throat
is an awesome little library and I definitely recommend it over bluebird's utility functions for this (and many other) cases.
You cannot take such a "functional" viewpoint on this. This is not Haskell, it's Javascript, and there are side-effects, whether you like it or not. People use side-effects, whether you like it or not. It's a perfectly valid ( even idiomatic ) way to program in javascript.
You should also not be concerned with whether we might make silly mistakes like assuming the order of async results ( your "check permission" example ) -- again, not your problem!. That's just async javascript -- we get it. You don't need to protect me from that mistake.
All you have to to is BEGIN each item in-order. There is absolutely no reason why you cannot do this. You could even guarantee to begin each item in-order. ( because javascript is single-threaded )
The point is: there is absolutely no reason why Promise.map cannot BEGIN each item in-order. In fact, IMHO, it makes no sense to do it otherwise. This whole argument is completely ridiculous!
I guess I will also have to reach for "throat" to solve this problem. ( if they can do it, why can't you? )
I'm confused, you say:
You don't need to protect me from that mistake.
And then:
All you have to to is BEGIN each item in-order.
Which contradicts. When you .map
something items are implied to be independent and execution order shouldn't matter.
While technically when you .map
an array in JavaScript you may mutate the original array:
[1,2].map((x, i, arr) => { // 1,1
if (i == 0) {
arr[i+1]-=1;
}
return x;
});
I think that is bad code and we shouldn't support it. You are of course welcome to use throat, it is not a competition :]
@nickperkinslondon yes, do use throat, please.
The bluebird implementations work very differently internally to offer maximum parallelism when you chain multiple map calls. If you don't care about that it's better not to use it at all.
I got something like this to work for me:
async function* map<T, R>(inputs: T[], func: (input: T) => Promise<R>, concurrency: number): AsyncGenerator<R> {
const promises: Array<Promise<R>> = [];
const resolves: Array<[T, (result: R) => void]> = [];
for (const input of inputs) {
promises.push(new Promise((resolve) => resolves.push([input, resolve])));
}
resolves.reverse();
$Promise.map(
new Array(inputs.length),
async () => {
const [input, resolve] = resolves.pop();
const result = await func(input);
resolve(result);
},
{ concurrency },
);
for (const promise of promises) {
yield await promise;
}
}
This is an async generator that can be used like this:
for await (const result of map(arr, func, concurrency)) {
// do something with result
}
Also just to add to the discussion, once use case for this would be something like this:
I'm writing an API that fetches some paginated resource, and streams the results as they come in. I want to be able to write something like this:
const pages = [1, 2, 3, ..., N];
const results = Promise.map(pages, fetchPage);
for await (const result of results) {
res.write(result);
}
Obvoiusly Promise.map
can't be modified to support this because it's not an async generator. It would be nice, however, if bluebird could support this with some new function, though I'm not sure what it should be called.
@enricozb there is a built in map
in Node streams including async generators.
Readable.from(listOfThings).map(x => /* this can be an async function returning a promise */, { concurrency: 4 });
Which is also async iterable
@benjamingr The function you mentioned looks exactly like what I was looking for, thanks! It's a shame it's still unstable though.
@enricozb happy it's useful - if we change it it's only renaming to align with https://github.com/tc39/proposal-iterator-helpers it will not be removed :]
If you say you're leaving this feature out (or bug in, depending on your perspective), you should at least communicate that in the JSdoc. I know, we all want the library to operate exactly how we think it should and that isn't always the best way... but this really is an unexpected thing to be missing from the function. And it should be mentioned to keep folks from wasting their time on this. Thank you
For reference.... I'm not writing bad code. I just want my logs to be useful.
await Bluebird.map(itemsToDelete, async (item, i) => {
if (i % 100 === 0) {
console.log(`deleted ${i} so far`);
}
await delete(item);
},
{concurrency: 1}
);
// Output: deleted 900 so far deleted 800 so far deleted 700 so far ...hmm wrong direction
I am having a problem with the concurrency option for
Promise.map
. I'm my code I want the order of execution of promises to match the order of the input (in my case optimizing file reads from a CDROM)Consider this example:
The console output of this code will be
1 2 5 4 3