opsdisk / yagooglesearch

Yet another googlesearch - A Python library for executing intelligent, realistic-looking, and tunable Google searches.
BSD 3-Clause "New" or "Revised" License
243 stars, 42 forks

How do I only get the main links #32

Closed: batman-do closed this issue 10 months ago

batman-do commented 11 months ago

@opsdisk How do I get only the main links, and not the additional links attached to a main link (the ones ending in #...)?


opsdisk commented 11 months ago

Hi @batman-do - can you clarify your question for me? Are the additional links the ones on a page, like "about", "contact", etc.?

Passing verbose_output=True might give you what you want. You may have to experiment to see what the data looks like; the relevant code is here: https://github.com/opsdisk/yagooglesearch/blob/master/yagooglesearch/__init__.py#L514

There may be an additional links attribute you can extract.
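While experimenting, it can help to normalize whatever the search returns into plain URLs before doing anything else. The README describes verbose results as carrying the rank, title, description, and URL of each hit. The helper below is a sketch that assumes the URL sits under a `"url"` key in each verbose result; print one result first to confirm the actual key names. `extract_urls` is a hypothetical helper, not part of yagooglesearch.

```python
from typing import Any, Iterable, List


def extract_urls(results: Iterable[Any]) -> List[str]:
    """Normalize search results to a plain list of URL strings.

    With verbose_output=False, each result is treated as a URL string.
    With verbose_output=True, each result is treated as a dict; the "url"
    key is an assumption, so check one result with print() first.
    """
    urls: List[str] = []
    for result in results:
        if isinstance(result, dict):
            urls.append(result["url"])  # assumed key name for verbose results
        else:
            urls.append(str(result))
    return urls
```

Per the project README, the input would come from something like `yagooglesearch.SearchClient(query, verbose_output=True).search()`.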

batman-do commented 11 months ago

@opsdisk What I mean is that when I fetch, for example, the top 10 results, I want only the main links, not the secondary ones.

Main link: https://bepos.io/blogs/email-ban-hang-chuyen-nghiep/
Additional link: https://bepos.io/blogs/email-ban-hang-chuyen-nghiep/#ftoc-heading-8
...

opsdisk commented 10 months ago

So you get both of these results back from yagooglesearch, and you only want the "main" one, not the "additional" link?

Main link: https://bepos.io/blogs/email-ban-hang-chuyen-nghiep/
Additional link: https://bepos.io/blogs/email-ban-hang-chuyen-nghiep/#ftoc-heading-8

I'd recommend filtering them out with a regex after they are all collected. Using https://github.com/opsdisk/yagooglesearch#usage as a starting point, add some logic (or a regex) to the for loop that removes the URLs with anchors you don't want.
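A minimal sketch of that post-collection filter, using the two bepos.io links above as the motivating case. It differs slightly from "remove the ones with URL anchors": instead of dropping anchored URLs outright, it strips everything from `#` onward and then de-duplicates, so a page that only shows up with an anchor still keeps its main link. `main_links_only` is a hypothetical name, not part of yagooglesearch.

```python
import re
from typing import Iterable, List

# A URL fragment ("anchor") is everything from the first "#" to the end.
FRAGMENT_RE = re.compile(r"#.*$")


def main_links_only(urls: Iterable[str]) -> List[str]:
    """Strip "#..." anchors from URLs and drop the resulting duplicates.

    ".../email-ban-hang-chuyen-nghiep/#ftoc-heading-8" collapses onto
    ".../email-ban-hang-chuyen-nghiep/", and each main link is kept once,
    at the position where it (or an anchored variant) first appeared.
    """
    seen = set()
    main_links: List[str] = []
    for url in urls:
        main = FRAGMENT_RE.sub("", url)
        if main not in seen:
            seen.add(main)
            main_links.append(main)
    return main_links
```

Call it on the plain URL list that the README's usage example loops over. Because the filtering happens after collection, you may end up with fewer than ten links; one option is to request more results up front (the README's `max_search_result_urls_to_return` parameter) and then slice the filtered list with `[:10]`.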

batman-do commented 10 months ago

@opsdisk Thank you for the reply, I understand now :))