serpapi / public-roadmap

Public Roadmap for SerpApi, LLC (https://serpapi.com)

[Google Images API] Scrape full `title` #1864

Open marm123 opened 2 months ago

marm123 commented 2 months ago

One of our customers reported that they get truncated titles for some searches. I was able to reproduce a search that returns the full title and one that returns a truncated title. I'm not sure whether it's possible to get the full title consistently, but if it is, we should do that.

Truncated title returned:

(screenshot)

Full title returned:

(screenshot)

This is the same issue as https://github.com/serpapi/public-roadmap/issues/941

Playground (truncated) | Inspect (truncated) | Playground (full) | Inspect (full) | Intercom

Freaky commented 2 months ago

Thanks @marm123.

An expanded title for each image is available on the page: it's displayed when you click a result to open the image information sidebar, and we can extract it from the page's JavaScript. However, it isn't always simply a longer version of the title shown in the link text. See the first and last items here, for example, where the expanded title drops the leading "Gubi":

@ ["images_results",0,"title"]
- "Gubi Beetle Dining Chair - Conic Base ..."
+ "Beetle Dining Chair - Conic Base - Fully Upholstered"
@ ["images_results",2,"title"]
- "GUBI Beetle Dining Chair, 3D Veneer ..."
+ "GUBI Beetle Dining Chair, 3D Veneer - Front Upholstered"
@ ["images_results",3,"title"]
- "GUBI BEETLE DINING CHAIR FULLY ..."
+ "GUBI BEETLE DINING CHAIR FULLY UPHOLSTERED / CONIC BASE | GARDE – Garde"
@ ["images_results",4,"title"]
- "Gubi - Beetle Dining Chair (padded ..."
+ "Gubi - Beetle Dining Chair (padded) | Connox"
@ ["images_results",5,"title"]
- "Gubi Beetle Dining Chair - Conic Base ..."
+ "Beetle Dining Chair - Conic Base - Fully Upholstered"
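To make that distinction concrete, here is a minimal sketch that checks whether an expanded title actually extends the visible one. It assumes a truncated title always ends with an ellipsis (three ASCII dots, as in the examples above; the Unicode "…" is included as an assumption). The helper names `truncated_prefix` and `is_expansion` are hypothetical, not part of SerpApi's code.

```python
from typing import Optional

# Ellipsis markers that may end a truncated title. The examples in this issue
# use three ASCII dots; the Unicode ellipsis is an assumption.
ELLIPSES = ("...", "\u2026")


def truncated_prefix(title: str) -> Optional[str]:
    """Return the visible text before a trailing ellipsis, or None if not truncated."""
    for marker in ELLIPSES:
        if title.endswith(marker):
            return title[: -len(marker)].rstrip()
    return None


def is_expansion(shown: str, expanded: str) -> bool:
    """True if `expanded` is a longer form of `shown` rather than a different title."""
    prefix = truncated_prefix(shown)
    if prefix is None:
        # Not truncated: only an identical title counts as the same title.
        return shown == expanded
    return expanded.startswith(prefix)


if __name__ == "__main__":
    # (index, shown title, expanded title) taken from the diff above.
    examples = [
        (0, "Gubi Beetle Dining Chair - Conic Base ...",
         "Beetle Dining Chair - Conic Base - Fully Upholstered"),
        (2, "GUBI Beetle Dining Chair, 3D Veneer ...",
         "GUBI Beetle Dining Chair, 3D Veneer - Front Upholstered"),
        (4, "Gubi - Beetle Dining Chair (padded ...",
         "Gubi - Beetle Dining Chair (padded) | Connox"),
    ]
    for index, shown, expanded in examples:
        print(index, is_expansion(shown, expanded))
```

Applied to the diff, items 2, 3 and 4 are true expansions, while items 0 and 5 are not, because the expanded text drops the "Gubi" prefix. A plain prefix check like this is one simple way to decide whether the expanded text could safely replace `title`.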

(screenshot: 1864-title-extract)

Perhaps it warrants extraction into a new field. We could also extract the snippet and similar details from this section.
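A minimal sketch of the new-field approach, assuming the expanded title has already been pulled from the page's JavaScript by some separate step. `expanded_title` is a hypothetical field name, not an existing SerpApi field, and `with_expanded_title` is a hypothetical helper.

```python
from typing import Any, Dict, Optional


def with_expanded_title(result: Dict[str, Any], expanded: Optional[str]) -> Dict[str, Any]:
    """Return a copy of an images_results entry with a hypothetical `expanded_title` field.

    The original `title` is kept as-is; the new field is added only when the
    expanded text exists and differs from the visible title.
    """
    out = dict(result)  # shallow copy so the caller's dict is not mutated
    if expanded and expanded != result.get("title"):
        out["expanded_title"] = expanded  # hypothetical field name
    return out
```

For `images_results[0]`, `title` would stay "Gubi Beetle Dining Chair - Conic Base ..." and `expanded_title` would hold "Beetle Dining Chair - Conic Base - Fully Upholstered". Leaving `title` untouched avoids breaking existing clients and avoids silently swapping in text that isn't a strict extension of the visible title; a snippet from the same sidebar could be added the same way under another field.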