mollie / mollie-api-node

Official Mollie API client for Node
http://www.mollie.com
BSD 3-Clause "New" or "Revised" License

Discourage developers from requesting large sets of data #271

Closed Pimm closed 2 years ago

Pimm commented 2 years ago

Background

Since version 2.0.0 of this library, calling a paginated endpoint returns an array with added nextPageCursor and previousPageCursor properties. These properties are great for implementing a paginated view.

The array also has added nextPage and previousPage methods (similar to next and previous in the PHP client). The only purpose of these methods is to request large sets of data ‒ sets which span multiple pages.

const firstPage = await mollieClient.payments.page();
const secondPage = await firstPage.nextPage();
const thirdPage = await secondPage.nextPage();

Why would a developer request large sets of data? They might need filtered data (all Italian customers); or they might be looking for a specific value but don't know the ID (getting a payment by its foreign ID in the metadata).

Ideally, the Mollie API would cover these cases. The clients would hand off the filtering/searching work to the Mollie API, which would send back the result. However, the Mollie API currently doesn't do any filtering or searching (not even trivial examples), and besides that it might never cover all use cases.

In version 3.6.0 of this library, we introduced the iteration API (iterate method), which is essentially the JavaScript-native alternative to nextPage (and previousPage).

Usage:

// Find the 10 latest euro payments over €100.00.
const iterator = mollieClient.payments.iterate()
    .filter(({ amount }) => amount.currency == 'EUR' && parseFloat(amount.value) > 100)
    .take(10);
for await (let payment of iterator) {
    // Apply logic.
}
// List all Italian customers.
const iterator = mollieClient.customers.iterate()
    .filter(({ locale }) => locale == 'it_IT');
for await (let customer of iterator) {
    // Apply logic.
}
// Find out whether a certain customer has spent more than €1000 in a single payment.
const bigSpender = await mollieClient.customerPayments.iterate({ customerId: 'cst_test' })
    .some(({ amount }) => amount.currency == 'EUR' && parseFloat(amount.value) > 1000);

The problem

Mollie has experienced incidents where developers were making so many requests in rapid succession that the API was having performance and reliability issues.

Since 2018, developers have been able to write this horrible code:

// ❌ Bad! Never do this.
var payments = await mollieClient.payments.page();
do {
    console.log(payments);
    payments = await payments.nextPage?.();
} while (payments);

With the iteration API, they can write:

// ❌ Bad! Never do this.
for await (let payment of mollieClient.payments.iterate()) {
    console.log(payment);
}

(Note that there is no break keyword in the for-loop.)

The version with the iteration API is two lines shorter, and is less explicit about how many requests it makes (there is no explicit nextPage call).
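For contrast, a loop that terminates early is comparatively cheap, because the iterator only fetches pages as the consumer asks for values. The sketch below models this with a hypothetical fetchPage function standing in for real requests to the Mollie API; it is an illustration of the lazy behaviour, not the client's actual internals:

```javascript
// Hypothetical model of the iteration API: an async generator that
// fetches one page at a time, only when the consumer asks for more.
let requestCount = 0;
async function fetchPage(cursor = 0) {
  // Stand-in for a real request to the Mollie API (250 values per page).
  requestCount++;
  const values = Array.from({ length: 250 }, (_, index) => ({ id: `tr_${cursor + index}` }));
  return { values, nextPageCursor: cursor + 250 };
}

async function* iterate() {
  let cursor = 0;
  while (true) {
    const page = await fetchPage(cursor);
    yield* page.values;
    cursor = page.nextPageCursor;
  }
}

async function main() {
  for await (const payment of iterate()) {
    if (payment.id === 'tr_9') {
      break; // Terminating early: no further pages are requested.
    }
  }
  return requestCount;
}
```

Because the generator is lazy, breaking out of the loop after the tenth value leaves requestCount at 1: only the first page was ever fetched.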

People within Mollie have expressed concern that the problem they're already having ‒ developers making many requests in rapid succession ‒ may worsen now that we have the iteration API.

Caveats

Developers are using this client voluntarily. They can opt out of using it at any time, and just use curl to communicate with the Mollie API instead. Nothing we can do in the client prevents a developer from performing a DDoS attack on the Mollie API.

What we can do is make developers aware of the stress they put on the Mollie API. And perhaps we can make it easier to get the job done within a reasonable number of requests.

Pimm commented 2 years ago

Let's talk potential solutions.

Remove the iteration API and nextPage

If the official Mollie stance is that developers should never request large sets of data (i.e. more data than fits on one page), then we should get rid of both APIs which were designed to help developers do that.

The signal we're sending developers is that they have to wait for the Mollie API to support filtering/searching natively.

If a developer really really really wants to request a large set of data anyway, they could:

const firstPage = await mollieClient.payments.page();
const secondPage = await mollieClient.payments.page({ from: firstPage.nextPageCursor });
const thirdPage = await mollieClient.payments.page({ from: secondPage.nextPageCursor });

(If we decide that this is the way to go, we should seriously consider removing next and previous from the PHP client as well.)

The tricky part is that we've seen developers "in the wild" requesting large sets of data. Furthermore, next was probably introduced into the PHP client for a reason. Apparently, the use cases exist.

Remove the iteration API but keep nextPage

While I agree that the pattern using nextPage makes the number of requests made under the hood more evident, I don't believe removing the iteration API while keeping nextPage is a fruitful direction.

Either we say developers should never request large sets of data, in which case we should remove both APIs which were designed to help them do that, or we accept that sometimes they might have to, in which case the iteration API is (almost?) always preferable over using nextPage.

The iteration API is JavaScript-native (and therefore familiar to developers) and more convenient. On top of that, in the right hands, the iteration API is actually more efficient than using nextPage, both in the number of requests made and in memory usage.
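To illustrate that efficiency claim, here is a hypothetical sketch of filter and take as lazy async generators over such an iterator. Values flow through the chain one at a time, so no intermediate arrays are built, and page requests stop as soon as enough values have been taken. fetchPage again stands in for real requests to the Mollie API, with a toy page size of five:

```javascript
// Hypothetical sketch: a lazy filter/take chain over a page iterator.
let requestCount = 0;
async function fetchPage(cursor = 0) {
  // Stand-in for a real API request; five values per page in this toy example.
  requestCount++;
  const values = Array.from({ length: 5 }, (_, index) => ({ value: cursor + index }));
  return { values, nextPageCursor: cursor + 5 };
}

async function* iterate() {
  let cursor = 0;
  while (true) {
    const page = await fetchPage(cursor);
    yield* page.values;
    cursor = page.nextPageCursor;
  }
}

async function* filter(source, predicate) {
  for await (const value of source) {
    if (predicate(value)) {
      yield value;
    }
  }
}

async function* take(source, count) {
  if (count <= 0) {
    return;
  }
  for await (const value of source) {
    yield value;
    if (--count === 0) {
      return; // Done: upstream generators are closed, no more pages are fetched.
    }
  }
}

async function main() {
  const results = [];
  // The three even values 0, 2, and 4 all sit on the very first page.
  for await (const { value } of take(filter(iterate(), ({ value }) => value % 2 === 0), 3)) {
    results.push(value);
  }
  return { results, requestCount };
}
```

A nextPage-based equivalent would hold each full page in memory as an array; the chain above never materialises more than one value at a time, and a single page request satisfies the whole query.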

Introduce a mandatory limit to prevent runaway loops

An obvious solution would be to force developers to specify an upper limit upfront. This is roughly the approach Stripe landed on, although their upper limit is optional.

We could request this upper limit in the iterate method.

const iterator = mollieClient.payments.iterate({ limit: 500 })
    .filter(({ amount }) => amount.currency == 'EUR' && parseFloat(amount.value) > 100)
    .take(10);
const iterator = mollieClient.customers.iterate({ limit: 1000 })
    .filter(({ locale }) => locale == 'it_IT');

We could also require take to be the first call in the chain.

const iterator = mollieClient.payments.iterate().take(500)
    .filter(({ amount }) => amount.currency == 'EUR' && parseFloat(amount.value) > 100)
    .take(10);
const iterator = mollieClient.customers.iterate().take(1000)
    .filter(({ locale }) => locale == 'it_IT');

If take becomes the first call in the chain by definition, we could decide to take out the .iterate() part in some cases:

const iterator = mollieClient.payments.take(500)
    .filter(({ amount }) => amount.currency == 'EUR' && parseFloat(amount.value) > 100)
    .take(10);
const iterator = mollieClient.customers.take(1000)
    .filter(({ locale }) => locale == 'it_IT');

The behaviour of the examples above would be:

  1. Find the 10 latest euro payments over €100.00… unless there are fewer than 10 such payments within the 500 most recent payments, in which case only the ones within those 500 are found.
  2. Find all Italian customers… unless there are more than 1000 customers, in which case only the Italian customers within the 1000 newest are found.
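One way the client could enforce such a limit internally is to have the generator hard-stop once the limit has been reached. This is a hypothetical sketch (not the actual implementation), with fetchPage standing in for real requests to the Mollie API:

```javascript
// Hypothetical sketch of enforcing an upfront limit inside the client.
let requestCount = 0;
async function fetchPage(cursor = 0) {
  // Stand-in for a real API request (250 values per page).
  requestCount++;
  const values = Array.from({ length: 250 }, (_, index) => ({ id: cursor + index }));
  return { values, nextPageCursor: cursor + 250 };
}

async function* iterate({ limit }) {
  let cursor = 0;
  let remaining = limit;
  while (remaining > 0) {
    const page = await fetchPage(cursor);
    for (const value of page.values) {
      yield value;
      if (--remaining === 0) {
        return; // The limit has been reached; no further pages are requested.
      }
    }
    cursor = page.nextPageCursor;
  }
}

async function main() {
  let count = 0;
  for await (const payment of iterate({ limit: 500 })) {
    count++;
  }
  return { count, requestCount };
}
```

With a limit of 500 and 250 values per page, the loop above ends after exactly two page requests, no matter how much data exists beyond that.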

If a developer really really really wants to iterate over all payments, they still can:

for await (let payment of mollieClient.payments.iterate({ limit: Number.POSITIVE_INFINITY })) {
    console.log(payment);
}

As mentioned in the caveats above, we can never effectively stop a developer who is determined to do this. (We can limit the limit, but there would be various ways around that.) However, the developer knowingly chooses to bombard the Mollie API at this point. This does not happen by accident.

Introduce throttling

Another solution would be to introduce throttling. This reduces stress on the Mollie API, but also signals to the developer "hey, this thing you're doing, it's costly, is there no better way?"

The first few pages are requested in rapid succession (provided that the developer doesn't break or take or find or otherwise terminate the loop), but after that, an artificial timeout is introduced before more requests are made.

The advantage is that this approach does not affect the "find the 10 latest euro payments over €100.00" use case, since that probably only requires the first page, or maybe the first two or three.

We could choose to put the developer in control of the throttling.

mollieClient.payments.iterate({ valuesPerMinute: 1000 })

A sensible default would be chosen for valuesPerMinute. Perhaps 2000 (8 pages).
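A hypothetical sketch of what such throttling could look like internally: the first few pages are fetched back-to-back, after which a delay derived from valuesPerMinute is inserted before each further page request. The page size, the number of free pages, and fetchPage are all assumptions for the sake of illustration:

```javascript
// Hypothetical throttling sketch: rapid at first, then rate-limited.
const pageSize = 250;
const freePages = 3; // Pages requested without delay before throttling kicks in.

function sleep(milliseconds) {
  return new Promise(resolve => setTimeout(resolve, milliseconds));
}

async function* iterate(fetchPage, { valuesPerMinute = 2000 } = {}) {
  const delayPerPage = pageSize / valuesPerMinute * 60_000;
  let cursor = 0;
  let pagesFetched = 0;
  while (true) {
    if (pagesFetched >= freePages) {
      await sleep(delayPerPage); // Throttle before requesting further pages.
    }
    const page = await fetchPage(cursor);
    pagesFetched++;
    yield* page.values;
    cursor = page.nextPageCursor;
  }
}

async function main() {
  let requestCount = 0;
  // Stand-in for a real request to the Mollie API.
  async function fetchPage(cursor = 0) {
    requestCount++;
    const values = Array.from({ length: pageSize }, (_, index) => ({ id: cursor + index }));
    return { values, nextPageCursor: cursor + pageSize };
  }
  let seen = 0;
  for await (const value of iterate(fetchPage, { valuesPerMinute: Number.POSITIVE_INFINITY })) {
    if (++seen === 1000) {
      break;
    }
  }
  return { seen, requestCount };
}
```

Passing Number.POSITIVE_INFINITY collapses the delay to zero, which is exactly the "stubborn developer" escape hatch described below: possible, but unambiguously intentional.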

As with the upper limit, a stubborn developer could do this:

for await (let payment of mollieClient.payments.iterate({ valuesPerMinute: Number.POSITIVE_INFINITY })) {
    console.log(payment);
}

But then the DDoS would be intentional, and an angry phone call from Mollie would not come as a surprise.

An added bonus is that developers could actually choose to turn the throttling up. Imagine I build a project in which I want to iterate over a bunch of payments in the middle of the night to check the status, making sure I haven't missed any status updates. Because it's nighttime and I am in no rush, I could set valuesPerMinute to 500, and be in perfect harmony with Mollie infrastructure.

Ultimately

The real solution is of course for the Mollie API to natively support the filtering and searching that developers need.

Once the Mollie API supports filtering payments by currency, for instance, we can expand the iteration API:

// Find the 10 latest euro payments.
const iterator = mollieClient.payments.iterate()
    .filter({ currency: 'EUR' })
    .take(10);
for await (let payment of iterator) {
    // Apply logic.
}

This would guarantee one single call to the Mollie API (v2/payments?currency=EUR&limit=10).

aurelienbobenrieth commented 6 months ago

@Pimm I know this is a closed thread, however I am genuinely interested in how we (developers) are supposed NOT to fetch large sets of data, as we have no other choice but to do it this way when we have nothing like search, filters, query parameters or anything else to work with.

Did I miss something?