ltenzil / scrape

Scrape builds reports on user-entered keywords and their Google search results.

Search API vs. Web Scraping #12

Open longnd opened 2 years ago

longnd commented 2 years ago

PR #5 implements the Google Custom Search API, but the stored data does not include

There are several ways to get this data, but one approach that does not rely on third-party services is to scrape the Google search result page. Scraping comes with its own difficulties, which you will need to work around creatively.

ltenzil commented 2 years ago

I think the Custom Search API doesn't return AdWords data or advertisement tags. The response data has the links and HTML snippets; I used them on the keywords show page, and they are stored in the keywords table. I haven't worked extensively with web scraping, but I have tried Nokogiri a little before. When I tried it on Google search results, I didn't get the HTML response that we see in the browser. I thought Google was using some kind of JS framework that doesn't populate data for a JSON request.

longnd commented 2 years ago

I think the Custom Search API doesn't return AdWords data or advertisement tags.

Not only that: the current implementation, which uses the search API, doesn't support a file upload with 100 keywords. That's why using the search API doesn't fulfill the requirement.
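As a rough illustration of the file-upload requirement mentioned above, here is a minimal Ruby sketch that turns the contents of an uploaded file into a keyword list. The file format (keywords separated by newlines or commas), the reading of 100 as an upper bound, and the `parse_keywords` name are all assumptions for illustration, not the project's actual code.

```ruby
# Hypothetical helper; the name, file format and limit are assumptions.
# Splits uploaded file contents into a clean, de-duplicated keyword list.
def parse_keywords(contents, limit: 100)
  keywords = contents
    .split(/[\r\n,]+/) # keywords separated by newlines and/or commas
    .map(&:strip)      # drop surrounding whitespace
    .reject(&:empty?)  # ignore blank entries
    .uniq              # search each keyword only once

  if keywords.size > limit
    raise ArgumentError, "expected at most #{limit} keywords, got #{keywords.size}"
  end

  keywords
end
```

Each returned keyword would then be searched and stored separately.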

The response data has the links and HTML snippets; I used them on the keywords show page, and they are stored in the keywords table.

It contains the HTML snippet for each search result, but unfortunately not the HTML of the first search page, as required.

For each search result/keyword result page on Google, store the following information on the first page of results: Total number of AdWords advertisers on the page. HTML code of the page/cache of the page. ...
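The two stored fields listed above could be assembled along these lines. This is only a sketch: `build_result` is a hypothetical name, and the `data-text-ad` marker is a placeholder assumption for how ad blocks appear in the page. The real markup has to be checked against an actual result page and may change at any time.

```ruby
# Hypothetical helper; the ad marker is a placeholder assumption, not a
# documented part of Google's markup.
def build_result(keyword, html, ad_marker: /data-text-ad=/)
  {
    keyword: keyword,
    ad_count: html.scan(ad_marker).size, # crude count of ad blocks on the page
    html: html                           # full page HTML, kept as the cache
  }
end
```

Counting with a regular expression keeps the sketch free of dependencies. An HTML parser such as Nokogiri with a CSS selector would likely be more robust.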

ltenzil commented 2 years ago

When I tried it on Google search results, I didn't get the HTML response that we see in the browser. I thought Google was using some kind of JS framework.

How do you overcome this? I tried setting a User-Agent header as well and still got the same response. I also tried with Postman and Nokogiri, with the same result.
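A minimal sketch of the kind of request described here, using Ruby's standard `Net::HTTP`. The User-Agent string, the `hl` and `Accept-Language` values, and the function names are illustrative assumptions. Google may still serve different markup, or a consent or block page, to automated clients, so this does not guarantee the HTML a browser shows.

```ruby
require "net/http"
require "uri"

# Assumed browser-like User-Agent; any current browser string could be used.
def default_user_agent
  "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 " \
    "(KHTML, like Gecko) Chrome/120.0 Safari/537.36"
end

# Build (but do not send) a GET request for a Google results page.
def build_search_request(keyword, user_agent: default_user_agent)
  uri = URI("https://www.google.com/search")
  uri.query = URI.encode_www_form(q: keyword, hl: "en")

  request = Net::HTTP::Get.new(uri)
  request["User-Agent"] = user_agent
  request["Accept-Language"] = "en-US,en;q=0.9"
  request
end

# Send the request and return the raw HTML body, or nil on a non-2xx reply.
def fetch_search_page(keyword)
  request = build_search_request(keyword)
  uri = request.uri
  response = Net::HTTP.start(uri.host, uri.port, use_ssl: true) do |http|
    http.request(request)
  end
  response.is_a?(Net::HTTPSuccess) ? response.body : nil
end
```

Keeping request construction separate from sending makes it easy to inspect the exact headers and query string when comparing against what a browser sends.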

It contains the HTML snippet for each search result, but unfortunately not the HTML of the first search page, as required.

I am not sure how to proceed. @longnd, can you guide me here? I am out of ideas for fetching data from Google search results.

longnd commented 2 years ago

Hi @ltenzil, I'm sorry for the late response. What I meant is that the Google Search API doesn't return the full HTML of the first search result page, only the HTML snippet of each result, so it doesn't meet the requirement. You can get what you need from the search result page by implementing the scraping yourself, e.g.