petkaantonov / bluebird

:bird: :zap: Bluebird is a full featured promise library with unmatched performance.
http://bluebirdjs.com
MIT License
20.45k stars 2.33k forks source link

Promise.map concurrency execution order #708

Closed henryqdineen closed 9 years ago

henryqdineen commented 9 years ago

I am having a problem with the concurrency option for Promise.map. I'm my code I want the order of execution of promises to match the order of the input (in my case optimizing file reads from a CDROM)

Consider this example:

Promise.map([1, 2, 3, 4, 5], function(val) {
    return new Promise.delay(100).then(function() {
        console.log(val);
    });
}, {concurrency: 2});

The console output of this code will be 1 2 5 4 3

benjamingr commented 9 years ago

Promise.map does not guarantee execution order - use .each or .reduce if you need that

henryqdineen commented 9 years ago

I understand that .map should not guarantee execution order and I specifically want parallel execution but it would be nice if the mapped array was FIFO

DenisGorbachev commented 9 years ago

@benjamingr Is it difficult to guarantee the execution order?

benjamingr commented 9 years ago

@DenisGorbachev if you need a guarantee of execution order use .each that's practically what it does.

DenisGorbachev commented 9 years ago

@benjamingr .each doesn't support concurrency, does it?

DenisGorbachev commented 9 years ago

@benjamingr I'm OK with unordered Promise resolution "end", but is it possible to get ordered Promise resolution "start"?

benjamingr commented 9 years ago

Why would you want that? That sounds like an very specific request to make.

DenisGorbachev commented 9 years ago

@benjamingr I'm sending requests to 3DCart API with concurrency: 10. Right now, the requests go like this:

{ params: { limit: 100, offset: 1, datestart: '05/23/2015' } }
{ params: { limit: 100, offset: 101, datestart: '05/23/2015' } }
{ params: { limit: 100, offset: 201, datestart: '05/23/2015' } }
{ params: { limit: 100, offset: 301, datestart: '05/23/2015' } }
{ params: { limit: 100, offset: 401, datestart: '05/23/2015' } }
{ params: { limit: 100, offset: 501, datestart: '05/23/2015' } }
{ params: { limit: 100, offset: 601, datestart: '05/23/2015' } }
{ params: { limit: 100, offset: 701, datestart: '05/23/2015' } }
{ params: { limit: 100, offset: 801, datestart: '05/23/2015' } }
{ params: { limit: 100, offset: 901, datestart: '05/23/2015' } }
{ params: { limit: 100, offset: 12501, datestart: '05/23/2015' } }
{ params: { limit: 100, offset: 12401, datestart: '05/23/2015' } }
{ params: { limit: 100, offset: 12301, datestart: '05/23/2015' } }
...

It doesn't lead to an error per se, but leads to wondering "Is there a bug in my code?" I guess @henryqdineen wondered the same, too :)

naartjie commented 9 years ago

It doesn't lead to an error per se, but leads to wondering "Is there a bug in my code?"

@DenisGorbachev I had the same initial feeling when I saw some logs from my code running bluebird map with concurrency.

@benjamingr Could we improve the docs for .map() to explicitly state that order is not guaranteed (if it's there, then I missed it) or is it an FP principle of map's in general, that they will be executed in any order?

benjamingr commented 9 years ago

@naartjie

I added an explicit note in the docs for .map.

In general, something with a map is called a functor, the guarantees a map function must hold are basically only that Promise.resolve([1,2,3]).map(x => x) resolves to the same values, and that Promise.resolve([1,2,3]).map(f).map(g) is the same as Promise.resolve([1,2,3]).map(x => g(f(x))).

Although technically, then is more of a map than map is in this case (well, and also a flatMap). Both are irrelevant since FP principles are less important than not being surprising to users. Usability is more important than purity. Although if you care about FP principles - then no, map does not have an inherent order guarantee.

naartjie commented 9 years ago

Thanks for the quick update @benjamingr. Just for reference I'm including a code snippet of the gotcha:

Promise.all(['a', 'b', 'c'])
.map(function(x) {
  return Promise.delay(0).then(function() {
    console.log(x)
  })
}, {concurrency: 1})

Prints:

a
c
b

While {concurrency: Infinity} intuitively prints:

a
b
c
benjamingr commented 9 years ago

In general, it is best not to cause side-effects inside a map :)

DenisGorbachev commented 9 years ago

@benjamingr @naartjie Correct me if I'm wrong, but it looks exactly like a usability issue :)

spion commented 9 years ago

:+1:

In a language where unconstrained side effects are allowed, guarantees about execution order are a good idea. Infact the guarantee about (async) execution order is one of the better things that promises bring to the table.

I propose blueflock, a streams library designed to work well with bluebird, have support for proper concurrency limits and includes node streams interop ;)

pnstickne commented 8 years ago

I'd really like a parMap and an ordered map; or map and an orderedMap. The latter is effectively a reduce-to-new-array, although a smarter algorithm could still allow concurrency.

Then again, maybe I should just go back to using one of the "stream" libraries..

benjamingr commented 8 years ago

@pnstickne uhh, map is already a "parMap` and mapSeries is an ordered map...

madmod commented 8 years ago

+1

This is unintuitive and inconsistent with the behavior of ES6 map. I understand that mapSeries iterates in order, however that should be the default behavior of map.

Also I think that the parMap function should return a Set to make the un-ordered nature of the result clear, rather than waiting for the developer to (hopefully) discover it after testing. (Although the risk with an uncommonly named function is less than with a standard name like map.)

butenkor commented 7 years ago

Is there any chance ordered and concurrent(with optional limit) map will be supported by bluebird?

benjamingr commented 7 years ago

Ordered and concurrent makes no sense.

lojzatran commented 7 years ago

What if you have a map of promises like this: [A, B, C, D, E] and you want to execute it in order with concurrency 3:

1) Executes first 3 promises concurrently A, B, C 2) When one of those promises finishes, it executes D 3) When another one finishes, it executes E etc.

Right now Promise.map does not guarantee any order, so it can actually starts with e.g. A, C, E. But what if I want the execution order and also some concurrency? How should I do it with Bluebird? Thanks.

benjamingr commented 7 years ago

Give me a clear use case, why do you need to guarantee order but "not really"? Where is this required?

spion commented 7 years ago

@benjamingr You have a list of image thumbnail URLs sorted by date, in order. You want to fetch and add them to the page in the exact same order, 4 at a time. Its beneficial for the first ones to load and appear first, as they will render at the top of the screen. You still want concurrency, to avoid loading them one by one.

lojzatran commented 7 years ago

This is a use case we have: we have a master project and many slave projects, which needs to be synchronized with master. Each synchronization master <-> slave takes 1 hour to finish and these synchronization does lots of REST calls and I/O read, write. And in slave projects, we have some that needs to be finish as soon as possible, so we put them in the first places in the array of projects.

Now we use promises to execute this synchronization. With Promise.map(), it will pick projects in random order and executes them. But we want to have e.g. first 5 projects to be processed first and then the rest.

I know it's possible to do it with 2 different arrays of projects, one with prioritized projects and other with the rest. But I found it quite redundant to have twice the same code and also the performance would suffer as it would need first to finish first array before it can start with another.

jstans commented 7 years ago

Felt I had to chime in on this too and this seems to be the most recent related issue I can find. I've gone through a few of these issues now where @petkaantonov and @benjamingr have opposed any changes in favour of the alternative which loses concurrency completely.

For reference this is referring to the use of .map with concurrency (higher than 1) and an expected logical execution order (not resolution order!).

While the use cases are small they do exist. You don't always have to rely on the results of code for it to make sense to operate in a certain pattern. An example (as in my case) would be the real time display of results to the end user to show a form of progress. It looks confusing to show things completing (roughly) in order only to suddenly start going backwards; especially with a long sequence. The other example as those above have pointed out is essentially where those earlier in the array have a higher priority than those later. The penalty of switching to single concurrency is too high of a trade-off.

The only valid but strong argument I can surmise against this is whether changing .pop to .shift (#709,#454) is an acceptable performance penalty for such relatively few use cases (is it really that bad with V8 optimisations nowadays?). So I am going to ask the question of whether it can be put behind another options flag such as sequentialExecution?

The alternative as has been pointed out before is for someone to create another package that supplements this behaviour, it just feels so unwieldy for such a minuscule change. Not to mention maintaining it and promoting it enough to see adoption from those scratching their heads over this issue.

benjamingr commented 7 years ago

is it really that bad with V8 optimisations nowadays?

Optimizing shifting to pop at the v8 engine level is rocket science level optimization and not anywhere like what v8 does.

If you have a 10K items array, popping 10000 times takes about 10000 actions, shifting has to move the entire array which takes 10000+9999+9998....+1 = 10001 * 10000 / 2 - quadratic complexity.

jstans commented 7 years ago

@benjamingr How about preparing the internal array?

Array.prototype.concat(input.slice(0,options.concurrency),input.reverse(input.slice(options.concurrency)))
Aristona commented 7 years ago

Wow, I cannot believe I spent an hour thinking why my array doesn't work properly in the order that I provided.

flockonus commented 6 years ago

.map with concurrency almost fits this perfect role of channels combined with async/await keywords, except the order is strange and confusing.

example:

await Bluebird.map(rangeOfNumbers, processBlock, { concurrency: 3 });

Gives me a concurrency of 3 for I/O, DB, etc, which is perfect for my needs, except i need them to run in linear sequence to know a particular type of event that happened first. Afaik, there is no way to do that.

spion commented 6 years ago

Use https://github.com/ForbesLindesay/throat

flockonus commented 6 years ago

Thanks @spion throat solves the problem. The order it returns is not perfect, but processing order is ideal.

If i get it right, cwait does the opposite, no guarantee on processing but returns ordered. Made a little gist here

If you had a PR for it, is it a fn that would make sense in Bluebird?

rickyk586 commented 6 years ago

Here's a quick and dirty solution:

const items = [1, 2, 3, 4, 5];
await Promise.map(new Array(items.length), async () => {
    const item = items.shift();
    //...code...
}, {
    concurrency:2
});
nickperkinslondon commented 6 years ago

I can't believe this never got fixed. It's super annoying. When the map has side-effects, you expect the side-effects to occur in a predictable order. All of this nonsense about "map does not guarantee order" is fine in a pure-functional language like Haskell, but not in javascript where we have side-effects. I was previously using "mapSeries" ( which does go in the right order -- no problem there ), and I just wanted to add some concurrency. There is no reason I should have to give up proper order of execution. In fact, I think "map with concurrency" is actually closer to "mapSeries()" than it is to normal "map()".

benjamingr commented 6 years ago

I can't believe this never got fixed. It's super annoying.

It never got fixed because we do not consider it a bug and do not want to encourage people to map with side effects.

There are multiple suggested packages/things you can use that are listed in this issue in order to do side effects in an async function.

smddzcy commented 6 years ago

I still don't get the benefit of "not guaranteeing execution order" on .map, and IMHO it's ridiculous to shuffle the array before execution.

Practical example: I currently process thousands of XHR requests with .map to fill a data grid. I want to concurrently make the requests, but at the same time I want to make them in an order of importance so I see the important stuff first.

spion commented 6 years ago

There is no guarantee that you will see the most important data first.

Lets say you use a concurrency of 20, but your first 15 items take forever to load. The mapping might continue until the end before those 15 items load using the 5 remaining slots.

Now in your case, this might not be a big deal - you want "best-effort" in-order delivery, not a guaranteed one. The problem is, some use cases require a guaranteed in-order delivery - if Bluebird processed things in order by default, it "looks" like the ordering is guaranteed.

Since the ordering seems to work, a user could easily write code that relies on the ordering... until one day it doesn't and the code that relied on it fails! Now they're stuck with a difficult to reproduce bug (because things usually run in the right order).

Because of the above, the implementation explicitly runs things out of order. Its our way of informing you early on that the guarantee is not there, before you encounter that 1 in 1000 times hard to reproduce bug.

We've thought about adding a guaranteed execution order implementation, but its probably far nicer to do that with streams, and streams are a big project which we've not had the time to tackle

butenkor commented 6 years ago

you want "best-effort" in-order delivery, not a guaranteed one

Exactly

The problem is, some use cases require a guaranteed in-order delivery

I doubt anybody expects it here.

spion commented 6 years ago

@butenkor do you mean to say that you doubt that any user of bluebird would ever expect guaranteed in-order delivery (if best-effort was implemented), or you doubt that anyone in this thread expects it? If its the second one, I can probably agree, but with the first one, definitely not.

butenkor commented 6 years ago

expect guaranteed in-order delivery

This is definitely you could expect but probably not with concurrency. For that purpose one has mapSeries or each.

Looking at this thread it seems that it would be enough to provide analog to mapSeries another map function which applies "best-effort" on processing order while allowing concurrency.

spion commented 6 years ago

@butenkor we'd prefer if possible to go in the opposite direction - we'd rather remove all these array utility functions instead of adding more and more, and implement a stream class instead...

smddzcy commented 5 years ago

@spion

Since the ordering seems to work, a user could easily write code that relies on the ordering... until one day it doesn't and the code that relied on it fails! Now they're stuck with a difficult to reproduce bug (because things usually run in the right order).

This is completely the user's fault for not reading the docs. There's a method called .each or whatever for guaranteed delivery-order, and in this user's case it's the method to use. I just want guaranteed processing-order.

Because of the above, the implementation explicitly runs things out of order. Its our way of informing you early on that the guarantee is not there, before you encounter that 1 in 1000 times hard to reproduce bug.

It's really not "1 in 1000 times hard to reproduce". Anyone who knows the smallest bit about concurrency will spot that potential issue immediately.

It's just extremely weird that you shuffle the input in a .map operation and not provide a non-shuffled version. Anyway, I'm using Promise.all with throat now.

const throat = require('throat')(require('bluebird'));

const stuff = ['a', 'b', 'c'];
const concurrency = 10;
const promise = Promise.all(stuff.map(throat(concurrency, x => new Promise(...))));
spion commented 5 years ago

It's just extremely weird that you shuffle the input in a .map operation and not provide a non-shuffled version.

I know it looks weird - I'm just explaining that there is a good reason for it.

It's really not "1 in 1000 times hard to reproduce". Anyone who knows the smallest bit about concurrency will spot that potential issue immediately.

It really depends on a lot of factors. I've had that issue crop up several times. Never the first time writing the code, but sometimes when existing code is modified.

For example, you start with:

someArray.map(async (x) => { 
  enqueueSomething(x); 
  // sync and async code, maybe await
  return someOperation(x) 
})

Lets say enqueueSomething(x) relies on it running in the right order for the queue to be correct. It looks like its not a big deal, you read the docs - you know you can take advantage of the in-order execution of the sync part.

The code goes through several modifications fine, getting some stuff above and below enqueueSomething(x).

At some point, another developer comes to modify that code to add some permission checks before enqueuing x. At that point, to them it may not be entirely obvious that you rely on this ordering. It would depend on their familiarity with the code, with the underlying libraries and with how many things they're able to juggle in their head at the same time. Lets say they make a change:

someArray.map(async (x) => { 
  // some sync code
  let permission = await checkPermissionFor(x); 
  if (!permission) return Promise.resolve(SomeEnum.NotEnqueued);
  // some more sync code
  enqueueSomething(x); 
  // some sync or async code
  return someOperation(x)
})

Now the order is wrong, and the program potentially breaks.

Bluebid has some extra issues with this. You might think - no problem, I know - I will just filter the things I have permission to enqueue! So you try to chain multiple array utility calls:

So if you have

Promise.filter(arr, x => {
  return checkPermissionFor(x)
}).map(x => {
  enqueueSomething(x);
})

The enqueueSomething calls may run out of order, since bluebird immediately passes through the results that completed from the previous filter operation on to the map operation, effectively working as if you wrote:

arr.map(x => {
  console.log(x)
  return checkPermissionFor(x).then(y => { if (y) { enqueueSomething(x) } })
})

The technique is known as (or at least similar to) map fusion and is common in purely functional languages. For obvious reasons, it doesn't quite work well when you expect side effects to be ordered.

Now, its true that as a result we have nothing to address the issue where you:

Thats definitely annoying and in addition is a common use case for UIs. throat is an awesome little library and I definitely recommend it over bluebird's utility functions for this (and many other) cases.

nickperkinslondon commented 4 years ago

You cannot take such a "functional" viewpoint on this. This is not Haskell, it's Javascript, and there are side-effects, whether you like it or not. People use side-effects, whether you like it or not. It's a perfectly valid ( even idiomatic ) way to program in javascript.

You should also not be concerned with whether we might make silly mistakes like assuming the order of async results ( your "check permission" example ) -- again, not your problem!. That's just async javascript -- we get it. You don't need to protect me from that mistake.

All you have to to is BEGIN each item in-order. There is absolutely no reason why you cannot do this. You could even guarantee to begin each item in-order. ( because javascript is single-threaded )

The point is: there is absolutely no reason why Promise.map cannot BEGIN each item in-order. In fact, IMHO, it makes no sense to do it otherwise. This whole argument is completely ridiculous!

I guess I will also have to reach for "throat" to solve this problem. ( if they can do it, why can't you? )

benjamingr commented 4 years ago

I'm confused, you say:

You don't need to protect me from that mistake.

And then:

All you have to to is BEGIN each item in-order.

Which contradicts. When you .map something items are implied to be independent and execution order shouldn't matter.

While technically when you .map an array in JavaScript you may mutate the original array:

[1,2].map((x, i, arr) => { // 1,1
  if (i == 0) {
    arr[i+1]-=1;
  }
  return x;
});

I think that is bad code and we shouldn't support it. You are of course welcome to use throat, it is not a competition :]

spion commented 4 years ago

@nickperkinslondon yes, do use throat, please.

The bluebird implementations work very differently internally to offer maximum parallelism when you chain multiple map calls. If you don't care about that it's better not to use it at all.

enricozb commented 2 years ago

I got something like this to work for me:

async function* map<T, R>(inputs: T[], func: (input: T) => Promise<R>, concurrency: number): AsyncGenerator<R> {
  const promises: Array<Promise<R>> = [];
  const resolves: Array<[T, (result: R) => void]> = [];

  for (const input of inputs) {
    promises.push(new Promise((resolve) => resolves.push([input, resolve])));
  }

  resolves.reverse();

  $Promise.map(
    new Array(inputs.length),
    async () => {
      const [input, resolve] = resolves.pop();
      const result = await func(input);

      resolve(result);
    },
    { concurrency },
  );

  for (const promise of promises) {
    yield await promise;
  }
}

This is an async generator that can be used like this:

for await (const result of map(arr, func, concurrency)) {
  // do something with result
}
enricozb commented 2 years ago

Also just to add to the discussion, once use case for this would be something like this:

I'm writing an API that fetches some paginated resource, and streams the results as they come in. I want to be able to write something like this:

const pages = [1, 2, 3, ..., N];
const results = Promise.map(pages, fetchPage);

for await (const result of results) {
  res.write(result);
}

Obvoiusly Promise.map can't be modified to support this because it's not an async generator. It would be nice, however, if bluebird could support this with some new function, though I'm not sure what it should be called.

benjamingr commented 2 years ago

@enricozb there is a built in map in Node streams including async generators.

Readable.from(listOfThings).map(x => /* this can be an async function returning a promise */, { concurrency: 4 });

Which is also async iterable

enricozb commented 2 years ago

@benjamingr The function you mentioned looks exactly like what I was looking for, thanks! It's a shame it's still unstable though.

benjamingr commented 2 years ago

@enricozb happy it's useful - if we change it it's only renaming to align with https://github.com/tc39/proposal-iterator-helpers it will not be removed :]

brian-murphy commented 1 month ago

If you say you're leaving this feature out (or bug in, depending on your perspective), you should at least communicate that in the JSdoc. I know, we all want the library to operate exactly how we think it should and that isn't always the best way... but this really is an unexpected thing to be missing from the function. And it should be mentioned to keep folks from wasting their time on this. Thank you

For reference.... I'm not writing bad code. I just want my logs to be useful.

await Bluebird.map(itemsToDelete, async (item, i) => {
  if (i % 100 === 0) {
    console.log(`deleted ${i} so far`);
  }

  await delete(item);
}, 
{concurrency: 1}
);

// Output: deleted 900 so far deleted 800 so far deleted 700 so far ...hmm wrong direction