jarun / googler

:mag: Google from the terminal
GNU General Public License v3.0
6.11k stars 529 forks

Deduplicate results #356

Closed zmwangx closed 4 years ago

zmwangx commented 4 years ago

Previously, results could be duplicated. For example, for the response https://git.io/JJn05, the top result (from a featured snippet) is shown twice in googler output.

The reason this happened is that a featured snippet can contain a div.g nested inside another div.g, so when we selected results based on div.g we picked up the same result twice: the second container is a child of the first.
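To illustrate why a nested container yields the same result twice, here is a small sketch using Python's standard html.parser. It naively treats every div with class "g" as a result, nested or not, and records the first link inside each. The markup is a hypothetical, heavily simplified stand-in for Google's response, and DivGCollector is a made-up name; this is not googler's own parser.

```python
from html.parser import HTMLParser

# Hypothetical, simplified markup: a featured snippet whose outer div.g
# wraps an inner div.g holding the same result, followed by a normal result.
SAMPLE = """
<div class="g">
  <div class="g">
    <a href="https://stackoverflow.com/questions/40655195">Is there official guide...</a>
  </div>
</div>
<div class="g">
  <a href="https://devguide.python.org/devcycle/">17. Development Cycle</a>
</div>
"""


class DivGCollector(HTMLParser):
    """Naively select every div.g (including nested ones) and record
    the first link found inside each match."""

    def __init__(self):
        super().__init__()
        self.depth = 0          # current <div> nesting depth
        self.open_matches = []  # [div depth, first href or None] per open div.g
        self.links = []         # one href per div.g, in the order the divs close

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "div":
            self.depth += 1
            classes = (attrs.get("class") or "").split()
            if "g" in classes:
                self.open_matches.append([self.depth, None])
        elif tag == "a":
            # A link inside nested div.g containers belongs to all of them.
            for match in self.open_matches:
                if match[1] is None:
                    match[1] = attrs.get("href")

    def handle_endtag(self, tag):
        if tag == "div":
            # Close the innermost open div.g if this </div> ends it.
            if self.open_matches and self.open_matches[-1][0] == self.depth:
                _, href = self.open_matches.pop()
                self.links.append(href)
            self.depth -= 1


collector = DivGCollector()
collector.feed(SAMPLE)
print(collector.links)
```

Both the inner and the outer div.g match, so the Stack Overflow link is recorded twice, mirroring the "Before" output below.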

Instead of tracking node ancestry, which is rather annoying, we introduce __eq__ on Result and make sure no duplicate is recorded that way. We also introduced __hash__; it is not actually in use, but why not.
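A minimal sketch of the equality-based approach, assuming a trimmed-down Result with only title, url and abstract fields. googler's real Result class has more fields and may compare them differently, and the collect() helper is hypothetical. One detail worth knowing: in Python, a class that defines __eq__ without __hash__ gets __hash__ implicitly set to None, so its instances become unhashable; defining a __hash__ consistent with __eq__ keeps them usable in sets and as dict keys.

```python
class Result:
    """Hypothetical stand-in for googler's Result; only three fields are modelled."""

    def __init__(self, title, url, abstract=""):
        self.title = title
        self.url = url
        self.abstract = abstract

    def _key(self):
        # Fields assumed to identify a result (an assumption for this sketch).
        return (self.title, self.url, self.abstract)

    def __eq__(self, other):
        if not isinstance(other, Result):
            return NotImplemented
        return self._key() == other._key()

    # Without this, defining __eq__ would set __hash__ to None.
    def __hash__(self):
        return hash(self._key())


def collect(results):
    """Record each parsed result unless an equal one is already recorded,
    keeping the first occurrence and the original order (hypothetical helper)."""
    unique = []
    for result in results:
        if result not in unique:  # list membership uses __eq__
            unique.append(result)
    return unique


SO_TITLE = "Is there official guide for Python 3.x release lifecycle? - Stack ..."
SO_URL = "https://stackoverflow.com/questions/40655195/is-there-official-guide-for-python-3-x-release-lifecycle"

# Two distinct objects for the same result, as produced by nested div.g matches.
parsed = [
    Result(SO_TITLE, SO_URL),
    Result(SO_TITLE, SO_URL),
    Result("17. Development Cycle", "https://devguide.python.org/devcycle/"),
]
deduped = collect(parsed)
print([r.url for r in deduped])
```

The list-membership check is quadratic, which is harmless for a page of around ten results. Because __hash__ agrees with __eq__, an order-preserving list(dict.fromkeys(parsed)) would give the same result.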


Before:

$ ./googler --debug --parse /tmp/googler-response-44zgwc5a.html
[DEBUG] googler version 4.1
[DEBUG] Python version 3.8.2

 1.  Is there official guide for Python 3.x release lifecycle? - Stack ...
     https://stackoverflow.com/questions/40655195/is-there-official-guide-for-python-3-x-release-lifecycle

 2.  Is there official guide for Python 3.x release lifecycle? - Stack ...
     https://stackoverflow.com/questions/40655195/is-there-official-guide-for-python-3-x-release-lifecycle

 3.  17. Development Cycle — Python Developer's Guide
     https://devguide.python.org/devcycle/
     A branch less than 5 years old but no longer in maintenance mode is a ... For reference, here are the Python versions that most recently reached their
     end-of-life: ...

...

After:

$ ./googler --debug --parse /tmp/googler-response-44zgwc5a.html
[DEBUG] googler version 4.1
[DEBUG] Python version 3.8.2

 1.  Is there official guide for Python 3.x release lifecycle? - Stack ...
     https://stackoverflow.com/questions/40655195/is-there-official-guide-for-python-3-x-release-lifecycle

 2.  17. Development Cycle — Python Developer's Guide
     https://devguide.python.org/devcycle/
     A branch less than 5 years old but no longer in maintenance mode is a ... For reference, here are the Python versions that most recently reached their
     end-of-life: ...

...

jarun commented 4 years ago

Please resolve the conflict.

zmwangx commented 4 years ago

Done.

jarun commented 4 years ago

Thank you!