Open sonika-serpapi opened 2 months ago
Does it make sense to set this up under its own "Yelp Brands" API, similar to how we do Yelp Place or Yelp Reviews?
That way, searches are passed against yelp.com/brands/.
@btaunt I think that is a good idea. I do think there is more value in setting it up under its own "Yelp Brands" API, similar to Yelp Reviews, since I believe there is no direct search for brands. Instead, there is a list maintained at https://www.yelp.com/brands, and no location is needed to get the brand information page.
A second point I wanted to bring up is, would we need to maintain this brand list on our end?
I'll let others chime in on this as well.
> A second point I wanted to bring up is, would we need to maintain this brand list on our end?
I have a snippet that serves exactly this issue's purpose: getting the Yelp brands JSON. Insanely enough, they don't have a paginated API, but rather a single, enormous JSON, as displayed on one page at the /brands route.
It's blocked by CORS, so the request has to originate from the site's own page. Here's a quick way to save the JSON from the /gql/batch endpoint that matches the brands JSON format (the page makes multiple /batch calls, for different types of data):
const puppeteer = require('puppeteer');
const fs = require('fs');

const BATCH_URL = 'https://www.yelp.com/gql/batch';

(async () => {
  // Launch a headless browser
  const browser = await puppeteer.launch({ headless: true });
  const page = await browser.newPage();

  // Listen to the responses to capture the payload.
  // (No request interception is needed: every request is simply allowed through.)
  page.on('response', async (response) => {
    if (response.url() !== BATCH_URL) return;

    let jsonResponse;
    try {
      jsonResponse = await response.json();
    } catch (err) {
      return; // Body is not JSON (e.g. a preflight or an empty response)
    }

    // Only save the batch response that carries the brands index
    const brands = jsonResponse?.[0]?.data?.brandEntityIndex?.brands;
    if (Array.isArray(brands) && brands.length > 0 && brands[0].name && brands[0].urlAlias) {
      fs.writeFileSync('output-yelp.json', JSON.stringify(jsonResponse, null, 2));
      console.log('Captured JSON saved to output-yelp.json');
    }
  });

  // Go to the Yelp page that triggers the request
  await page.goto('https://www.yelp.com/brands', { waitUntil: 'networkidle2' });
  // Wait for some time to ensure all requests are made
  // (page.waitForTimeout is not available in recent Puppeteer releases)
  await new Promise((resolve) => setTimeout(resolve, 5000));
  // Close the browser
  await browser.close();
})();
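As a possible follow-up, here is a minimal sketch of turning the saved `output-yelp.json` into a list of brand page URLs. It assumes that each brand page lives at `/brands/<urlAlias>` (inferred from https://www.yelp.com/brands/unifirst, not confirmed by Yelp), and `brandPageUrls` is a hypothetical helper name, not part of any existing API.

```javascript
const fs = require('fs');

// Assumption: brand pages live at /brands/<urlAlias>, inferred from
// https://www.yelp.com/brands/unifirst mentioned in this thread.
const BRAND_BASE_URL = 'https://www.yelp.com/brands/';

// Hypothetical helper: map the captured /gql/batch payload to name/url pairs.
function brandPageUrls(batchResponse) {
  const brands = batchResponse?.[0]?.data?.brandEntityIndex?.brands ?? [];
  return brands
    .filter((brand) => brand && brand.name && brand.urlAlias)
    .map((brand) => ({
      name: brand.name,
      url: BRAND_BASE_URL + encodeURIComponent(brand.urlAlias),
    }));
}

// Usage: only runs if the capture script has already produced the file.
if (fs.existsSync('output-yelp.json')) {
  const captured = JSON.parse(fs.readFileSync('output-yelp.json', 'utf8'));
  console.log(brandPageUrls(captured).slice(0, 5));
}
```

Entries without a `name` or `urlAlias` are skipped, so a partially populated payload still yields a usable list.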
Thank you @kingmeers ! We will take this under consideration when developing a solution for this.
A high-volume customer reached out asking to scrape reviews pertaining to a brand: https://www.yelp.com/brands/unifirst
Currently, they would have to aggregate across all stores to get this information about a brand, but since Yelp exposes this publicly for each brand, perhaps we can add support for scraping this page for each brand.
Specific brand page:
All brands: