theRealPadster / diffbot-api-node

Diffbot-API-Node is a Promise-based library to use the Diffbot REST APIs
MIT License

paging option not exposed in analyze endpoint #16

Closed goleary closed 3 years ago

goleary commented 3 years ago

Hi! 👋

Firstly, thanks for your work on this project! 🙂

Today I used patch-package to patch diffbot-api-node@0.3.6 for the project I'm working on.

The paging option can be passed as a query param to both the analyze and article endpoints, but this client doesn't expose it as an option for analyze.

Here is the diff that solved my problem:

diff --git a/node_modules/diffbot-api-node/src/diffbot.js b/node_modules/diffbot-api-node/src/diffbot.js
index ca19e37..b1bbd80 100644
--- a/node_modules/diffbot-api-node/src/diffbot.js
+++ b/node_modules/diffbot-api-node/src/diffbot.js
@@ -19,6 +19,7 @@ class Diffbot {
    * @param {string} [options.mode] By default the Analyze API will fully extract all pages that match an existing Automatic API -- articles, products or image pages. Set mode to a specific page-type (e.g., mode=article) to extract content only from that specific page-type. All other pages will simply return the default Analyze fields.
    * @param {string} [options.fallback] Force any non-extracted pages (those with a type of "other") through a specific API. For example, to route all "other" pages through the Article API, pass &fallback=article. Pages that utilize this functionality will return a fallbackType field at the top-level of the response and a originalType field within each extracted object, both of which will indicate the fallback API used.
    * @param {string[]} [options.fields] Specify optional fields to be returned from any fully-extracted pages, e.g.: &fields=querystring,links. See available fields within each API's individual documentation pages.
+   * @param {boolean} [options.paging] Pass paging=false to disable automatic concatenation of multiple-page articles. (By default, Diffbot will concatenate up to 20 pages of a single article.)
    * @param {boolean} [options.discussion] Pass discussion=false to disable automatic extraction of comments or reviews from pages identified as articles or products. This will not affect pages identified as discussions.
    * @param {number} [options.timeout] Sets a value in milliseconds to wait for the retrieval/fetch of content from the requested URL. The default timeout for the third-party response is 30 seconds (30000).
    * @param {string} [options.callback] Use for jsonp requests. Needed for cross-domain ajax.
@@ -43,6 +44,9 @@ class Diffbot {
     if (options.fields)
       diffbot_url += `&fields=${options.fields.join(',')}`;

+    if (options.paging != undefined)
+      diffbot_url += `&paging=${options.paging}`;
+
     if (options.discussion != undefined)
       diffbot_url += `&discussion=${options.discussion}`;

This issue body was partially generated by patch-package.
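
For anyone skimming the diff, here is a minimal standalone sketch of what the patched query-building does. `buildAnalyzeUrl` is a hypothetical helper written for illustration, not a function in diffbot-api-node, and the base URL is an assumption about the Analyze endpoint. The point is the loose `!= undefined` check: it skips both `undefined` and `null`, but still sends an explicit `paging=false`, which a plain truthiness check would drop.

```javascript
// Hypothetical helper (not part of diffbot-api-node): mirrors the patched logic above.
// The base URL is assumed for illustration.
function buildAnalyzeUrl(token, options) {
  let diffbot_url = `https://api.diffbot.com/v3/analyze?token=${token}`
    + `&url=${encodeURIComponent(options.url)}`;

  if (options.fields)
    diffbot_url += `&fields=${options.fields.join(',')}`;

  // Loose comparison: true for both undefined and null, false for `false`,
  // so an explicit paging=false is still appended to the query string.
  if (options.paging != undefined)
    diffbot_url += `&paging=${options.paging}`;

  if (options.discussion != undefined)
    diffbot_url += `&discussion=${options.discussion}`;

  return diffbot_url;
}

// paging omitted: no paging param is sent, so the API default applies.
console.log(buildAnalyzeUrl('TOKEN', { url: 'https://example.com/a' }));

// paging explicitly disabled: the param is appended.
console.log(buildAnalyzeUrl('TOKEN', { url: 'https://example.com/a', paging: false }));
```

With `if (options.paging)` instead, the second call would silently fall back to the default, which is exactly the case the option exists for.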

theRealPadster commented 3 years ago

Hey, thanks for the interest! I double checked the Diffbot documentation, and can't find mention of paging being used for the Analyze API in either of their documentation sites (https://www.diffbot.com/dev/docs/analyze/ or https://docs.diffbot.com/docs/en/api-analyze). I can add the paging param to the analyze API as you've outlined, but I just want to first confirm that it does exist for that endpoint and works identically (for the JSDocs). Do you have a link to where that param is defined for the analyze API, or is it the exact same behaviour from what you can tell?

goleary commented 3 years ago

Great call out. I was hoping to be able to point you at docs that detailed this usage when I opened this issue but was unable to find any 😬

From my testing it exhibits the same behaviour when analyze routes to the article product.

I have an ongoing email thread with Diffbot support, who recommended I use it with the analyze endpoint, which is what led me here.

I've put this issue on their radar, so perhaps someone from diffbot will chime in.

Thanks for the quick response!

goleary commented 3 years ago

@theRealPadster I have been unable to drum up any support from diffbot on this, but I can confirm that it behaves as I mention above when &paging=true is passed to the analyze endpoint.

I'm happy to draft the PR if you're open to accepting.

theRealPadster commented 3 years ago

Sorry, yeah I was waiting to see if the Diffbot team would say anything. I have merged it in, and I'll publish it on npm now :) Thanks!

goleary commented 3 years ago

Awesome, thanks a ton!