googleapis / nodejs-datacatalog

This repository is deprecated. All of its content and history has been moved to googleapis/google-cloud-node.
Apache License 2.0

Respect pageSize parameter #296

Closed zettadam closed 1 year ago

zettadam commented 3 years ago

I wrote a request handler (fastify) that uses the @google-cloud/datacatalog library. According to the documentation, I should pass pageSize to limit the number of results returned. However, searchCatalog and searchCatalogAsync return hundreds or thousands of results even when I pass the pageSize parameter in the request body.

Seems like a bug? Or am I using this API incorrectly?

My request body is as follows:

{
    "query": "country",
    "pageSize": 50,
    "orderBy": "relevance"
}

My handler looks like this:

import { DataCatalogClient } from '@google-cloud/datacatalog'

const dc = new DataCatalogClient()
const projectId = 'some-project-id'

const search = async (req, res) => {
  const { body } = req
  const requestPayload = {
    scope: {
      includeProjectIds: [projectId],
    },
    ...body
  }

  const result = await dc.searchCatalogAsync(requestPayload)

  let results = []

  for await(const r of result) {
    results.push(r)
  }

  res.code(200).send(results)
}

meredithslota commented 3 years ago

I ... wonder if the scope is the issue here? Data Catalog can also search the public datasets in BigQuery if includeOrgIds, includeProjectIds are empty AND includeGcpPublicDatasets is not set to false. When I search the public datasets, I get many results for "country". Can you try defining the scope more narrowly?

zettadam commented 3 years ago

@meredithslota I've added those parameters to my scope (though I had specified projectId) and results are the same:

  const search = async (req, res) => {
    const { body } = req
    const requestPayload = {
      scope: {
        includeProjectIds: [projectId],
        includeGcpPublicDatasets: false,
        includePublicTagTemplates: false
      },
      ...body
    }

    const result = await dc.searchCatalogAsync(requestPayload)
    ...

My req.body.pageSize is set to 50 but I get over 1000 results in a response.
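As a stopgap for the handler above, the results can be capped on the client side while iterating. This is only a sketch: `takeFirst` is a hypothetical helper (not part of @google-cloud/datacatalog), and it assumes the value returned by searchCatalogAsync can be consumed with `for await`, as in the handler. It does not make the server honor pageSize; it only stops consuming once enough items have been collected.

```javascript
// Hypothetical helper: collect at most `limit` items from any async iterable.
async function takeFirst(iterable, limit) {
  const items = []
  if (limit <= 0) return items

  for await (const item of iterable) {
    items.push(item)
    // Stop iterating as soon as the requested number of items is reached.
    if (items.length >= limit) break
  }

  return items
}

// Possible use inside the handler (assumes `dc` and `requestPayload` from above):
//   const results = await takeFirst(dc.searchCatalogAsync(requestPayload), body.pageSize)
```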

zettadam commented 3 years ago

By the way, when I perform the same search with identical parameters using the REST API in the GCP console (found on the REST API reference page for catalog.search), I get 50 results with a token for the next page.

meredithslota commented 3 years ago

Thanks so much for the follow-up. I've reclassified the issue as a bug accordingly.

meredithslota commented 2 years ago

@sofisl Can you take a look at this or help route accordingly?

sofisl commented 1 year ago

Hi @zettadam, to answer your question in the meantime: you need to pass autoPaginate: false as a second parameter to your request so that gax knows to respect your pagination options, e.g.:

const result = await dc.searchCatalogAsync(requestPayload, {autoPaginate: false})

This is not at all intuitive so I think the first order of business is to add some documentation: https://github.com/googleapis/gax-nodejs/pull/1374.

The second problem is that, when I first attempted to reproduce your issue, the request hung indefinitely. I think this is related to errors thrown by the backend not being respected when autopagination is in play. I will need to investigate this further in https://github.com/googleapis/gax-nodejs/issues/1373
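For readers who want one page plus a token, as the REST API returns, here is a minimal sketch using the promise-based searchCatalog with the same option. It assumes the usual gax behavior with autoPaginate: false, where the call resolves to an array of `[results, nextRequest, rawResponse]` and `nextRequest` is null when there are no more pages. `searchOnePage` is a hypothetical helper name, and `client` is expected to be a DataCatalogClient instance.

```javascript
// Hypothetical helper: fetch a single page of search results.
// `client` is assumed to be a DataCatalogClient; `request` is the same
// payload used above (scope, query, pageSize, and optionally pageToken).
async function searchOnePage(client, request) {
  // With autoPaginate disabled, gax is assumed to resolve to
  // [results, nextRequest, rawResponse] instead of merging all pages.
  const [results, nextRequest] = await client.searchCatalog(request, {
    autoPaginate: false,
  })

  return {
    results,
    // nextRequest (when present) repeats the request with pageToken set.
    nextPageToken: nextRequest ? nextRequest.pageToken : null,
  }
}

// Possible use inside the handler:
//   const page = await searchOnePage(dc, requestPayload)
//   res.code(200).send(page)
```

To fetch the following page, the returned nextPageToken can be sent back as pageToken in the next request body.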