serpapi / public-roadmap

Public Roadmap for SerpApi, LLC (https://serpapi.com)

[Google Images] Images not Parsed in a Search #1624

Closed schaferyan closed 1 month ago

schaferyan commented 1 month ago

A high-volume customer shared an example of an image search where the images are present in the HTML but not parsed in the JSON.

I wasn't able to replicate it in the Playground, but they repeated the search multiple times with no_cache=true with the same result.

It may be worth noting that they are using tbm=isch rather than engine=google_images.

I've also noticed that when performing tbm=isch searches in the Playground, they seem to be redirected to engine=google_images. However, this customer's searches are still registering as tbm=isch in the inspector.

Customer's Search 1 Customer's Search 2 Attempt to Replicate Playground Intercom

ishiharaf commented 1 month ago

I took a look at the user's searches, and it seems the ones where he wasn't being redirected by the tbm parameter were made around 3 hours after I pushed changes to fix another issue related to the tbm parameter. However, all of his failing requests have no engine parameter set, and after those changes the tbm parameter only redirects if engine is set to google.

The Playground assumes engine is google if nothing is set, sees the tbm parameter, and changes the engine to google_images. It seems our API was doing the same, but our documentation lists engine as a required parameter. The customer was relying on undocumented behavior to get redirected to google_images instead of setting the engine parameter to google as our documentation says (or setting the engine to google_images instead of relying on tbm).

He can still use tbm if he sets the engine to google, or he can use google_images as the engine. I'm not sure we want to allow searches with no engine parameter set to work with tbm as that was unintended behavior.
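To illustrate the two documented options above, here is a minimal sketch that builds both request URLs. The helper name `build_search_url` and the `api_key` placeholder are assumptions for illustration only; the endpoint and the `engine`, `tbm`, and `q` parameters are the ones that appear in this thread.

```python
from urllib.parse import urlencode

# Endpoint as it appears in the example URLs in this thread.
SEARCH_ENDPOINT = "https://serpapi.com/search.json"


def build_search_url(query, use_tbm=False, api_key="YOUR_API_KEY"):
    """Build an image-search URL that always sets the engine explicitly.

    Hypothetical helper: it only assembles a URL and sends no request.
    """
    if use_tbm:
        # Option 1: keep tbm=isch, but set engine=google as the docs require.
        params = {"engine": "google", "tbm": "isch", "q": query}
    else:
        # Option 2: drop tbm and ask for the google_images engine directly.
        params = {"engine": "google_images", "q": query}
    params["api_key"] = api_key  # placeholder value, not a real key
    return f"{SEARCH_ENDPOINT}?{urlencode(params)}"


if __name__ == "__main__":
    print(build_search_url("Pioneer Press logo"))
    print(build_search_url("Pioneer Press logo", use_tbm=True))
```

Either form avoids relying on the behavior where a missing engine is filled in implicitly.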

martin-serpapi commented 1 month ago

Another customer reported this:

Intercom

ilyazub commented 1 month ago

My mistake, I didn't check the incoming requests for this case and didn't check the documentation (https://serpapi.com/news-results) during the code review.

We will update the SerpApi code to support tbm=nws without an engine, while still making sure tbm won't change a non-Google engine (https://github.com/serpapi/public-roadmap/issues/1620), and update our documentation to require engine=google_news instead of tbm=nws.

schaferyan commented 1 month ago

Another customer reported this:

Intercom

marm123 commented 1 month ago

> and update our documentation to require engine=google_news instead of tbm=nws.

I don't think we want to do that for News Results, as google_news is a separate engine (https://serpapi.com/google-news-api)

ishiharaf commented 1 month ago

> I don't think we want to do that for News Results, as google_news is a separate engine (https://serpapi.com/google-news-api)

For engine=google&tbm=nws, we're not redirecting it to google_news engine. We're redirecting for Google Shopping, Videos, Local, Images, and Patents. So Google News was not affected by this last change and in fact searches still work without the engine being set: https://serpapi.com/search.json?q=Biden&tbm=nws&location=Austin,+TX,+Texas,+United+States

It's an example in our documentation showing that searches with the tbm parameter work without engine. I couldn't find any other example of tbm being used this way.

marm123 commented 1 month ago

Yeah, makes sense. I was just referring to the change in the documentation. Maybe I misunderstood it, but I thought the line I quoted suggested that we should update the documentation to include engine=google_news as a required parameter, and I wanted to make sure we don't update it this way.

Anyway, it's already off-topic, so let's keep this about the Google Images issue.

ishiharaf commented 1 month ago

It's not off-topic. You're correct, I was adding to what you said. Sorry for not being very clear. We shouldn't change the News documentation to google_news, both because it's a separate engine and for the reasons I added.

ilyazub commented 1 month ago

This issue was fixed by https://github.com/serpapi/SerpApi/pull/4874.

engine=bing is preserved (ref: https://serpapi.com/search?q=Pioneer+Press+logo&tbm=isch&engine=bing)

engine=google_images is added when no engine is set (ref: https://serpapi.com/search?q=Pioneer+Press+logo&tbm=isch)

engine=google is changed to google_images (ref: https://serpapi.com/search?q=Pioneer+Press+logo&tbm=isch&engine=google)
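The three cases above can be sketched as a small resolution function. This is not the code from the pull request; `resolve_engine` and `TBM_TO_ENGINE` are hypothetical names, the mapping only contains the Images entry discussed in this thread, and treating a missing engine as google is an assumption based on the Playground behavior described earlier.

```python
# Hypothetical mapping from tbm values to dedicated engines.
# Only the Images entry comes from this thread; other redirected
# verticals are left out on purpose. tbm=nws is absent because
# News is not redirected.
TBM_TO_ENGINE = {"isch": "google_images"}


def resolve_engine(params):
    """Return the engine a request would run on after the fix (sketch)."""
    # Assumption: a missing or empty engine is treated as google.
    engine = params.get("engine") or "google"
    if engine != "google":
        # tbm must never change a non-Google engine (e.g. bing).
        return engine
    # For Google, redirect only when tbm maps to a dedicated engine.
    return TBM_TO_ENGINE.get(params.get("tbm"), engine)


if __name__ == "__main__":
    for request in (
        {"q": "Pioneer Press logo", "tbm": "isch", "engine": "bing"},
        {"q": "Pioneer Press logo", "tbm": "isch"},
        {"q": "Pioneer Press logo", "tbm": "isch", "engine": "google"},
        {"q": "Biden", "tbm": "nws"},
    ):
        print(request, "->", resolve_engine(request))
```

Checking the engine before looking at tbm is what keeps the bing case untouched.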

> I wasn't able to replicate it in the Playground, but they repeated the search multiple times with no_cache=true with the same result.

@schaferyan It was possible to reproduce this error using our Search API directly. Usually, our Playground fixes incorrect parameters.