Closed batman-do closed 10 months ago
Hi @batman-do - can you clarify your question for me? Are the additional links the ones on a page, like "about", "contact", etc.?
Passing verbose_output=True might give you what you want. You may have to experiment and see what the data looks like here: https://github.com/opsdisk/yagooglesearch/blob/master/yagooglesearch/__init__.py#L514
There may be an additional links attribute you can extract.
@opsdisk What I mean is that I only want to get the main links when getting the top-10, for example, not the secondary links anymore.
Main link: https://bepos.io/blogs/email-ban-hang-chuyen-nghiep/
Additional link: https://bepos.io/blogs/email-ban-hang-chuyen-nghiep/#ftoc-heading-8
,...
So you get both of these results back from yagooglesearch, and you only want the "main" one, not the "additional" link?
Main link: https://bepos.io/blogs/email-ban-hang-chuyen-nghiep/
Additional link: https://bepos.io/blogs/email-ban-hang-chuyen-nghiep/#ftoc-heading-8
I'd recommend filtering them out with regex after they are all collected. So using https://github.com/opsdisk/yagooglesearch#usage as an example, add some logic/regex in the for loop to remove the ones with URL anchors that you don't want.
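A minimal sketch of that post-filtering step, assuming the search results are already collected into a plain list of URL strings. Instead of a regex, it uses the standard library's urllib.parse.urldefrag to strip the #anchor part, then drops duplicates so each page appears once. The function name keep_main_links and the sample list are hypothetical, for illustration only.

```python
from urllib.parse import urldefrag


def keep_main_links(urls):
    """Strip #fragments from URLs and drop duplicates, preserving order."""
    seen = set()
    main_links = []
    for url in urls:
        # urldefrag splits "page/#anchor" into ("page/", "anchor")
        base, _fragment = urldefrag(url)
        if base not in seen:
            seen.add(base)
            main_links.append(base)
    return main_links


# Hypothetical sample standing in for the list returned by a search
results = [
    "https://bepos.io/blogs/email-ban-hang-chuyen-nghiep/",
    "https://bepos.io/blogs/email-ban-hang-chuyen-nghiep/#ftoc-heading-8",
]
print(keep_main_links(results))
```

Stripping the anchor rather than discarding the whole URL means a page is still kept when only its anchored variant shows up in the results. Note that the filtered list can end up shorter than the number of results requested.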
@opsdisk thank you for the reply, I understand this now :))
How do I only get the main links and not the additional links attached to the main link (which are additional links #...), @opsdisk