Previously, results could be duplicated: e.g. for the response https://git.io/JJn05, the top result (from a featured snippet) was shown twice in googler output.
This happened because a featured snippet could contain a div.g nested inside another div.g, so when we selected results based on div.g we picked up the same result twice -- the second container is a child of the first.
Instead of tracking node ancestry, which is rather annoying, we introduce __eq__ on Result and use it to make sure no duplicate is recorded. Also introduced __hash__, which is not actually in use yet, but why not.
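A rough sketch of the idea, not the actual patch: the Result class below is a simplified stand-in, and the choice of title and URL as the comparison key, the _key helper, and the collect_unique function are all assumptions made for illustration.

```python
class Result:
    """Hypothetical, simplified stand-in for googler's Result class."""

    def __init__(self, title, url, abstract=""):
        self.title = title
        self.url = url
        self.abstract = abstract

    def _key(self):
        # Assumption: title and URL are enough to identify a result.
        return (self.title, self.url)

    def __eq__(self, other):
        if not isinstance(other, Result):
            return NotImplemented
        return self._key() == other._key()

    def __hash__(self):
        # Kept consistent with __eq__: equal results hash equally.
        return hash(self._key())


def collect_unique(candidates):
    """Keep the first occurrence of each result, preserving order."""
    results = []
    for candidate in candidates:
        # List membership falls back to __eq__, so a nested div.g that
        # parses to the same result is skipped here.
        if candidate not in results:
            results.append(candidate)
    return results


so_url = ("https://stackoverflow.com/questions/40655195/"
          "is-there-official-guide-for-python-3-x-release-lifecycle")
so_title = "Is there official guide for Python 3.x release lifecycle? - Stack ..."
parsed = [
    Result(so_title, so_url),  # outer div.g of the featured snippet
    Result(so_title, so_url),  # inner div.g, same result again
    Result("17. Development Cycle", "https://devguide.python.org/devcycle/"),
]
unique = collect_unique(parsed)
```

With __hash__ defined alongside __eq__, the same objects could also go into a set or dict if the linear membership check ever becomes a bottleneck.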
Before:
$ ./googler --debug --parse /tmp/googler-response-44zgwc5a.html
[DEBUG] googler version 4.1
[DEBUG] Python version 3.8.2
1. Is there official guide for Python 3.x release lifecycle? - Stack ...
https://stackoverflow.com/questions/40655195/is-there-official-guide-for-python-3-x-release-lifecycle
2. Is there official guide for Python 3.x release lifecycle? - Stack ...
https://stackoverflow.com/questions/40655195/is-there-official-guide-for-python-3-x-release-lifecycle
3. 17. Development Cycle — Python Developer's Guide
https://devguide.python.org/devcycle/
A branch less than 5 years old but no longer in maintenance mode is a ... For reference, here are the Python versions that most recently reached their
end-of-life: ...
...
After:
$ ./googler --debug --parse /tmp/googler-response-44zgwc5a.html
[DEBUG] googler version 4.1
[DEBUG] Python version 3.8.2
1. Is there official guide for Python 3.x release lifecycle? - Stack ...
https://stackoverflow.com/questions/40655195/is-there-official-guide-for-python-3-x-release-lifecycle
2. 17. Development Cycle — Python Developer's Guide
https://devguide.python.org/devcycle/
A branch less than 5 years old but no longer in maintenance mode is a ... For reference, here are the Python versions that most recently reached their
end-of-life: ...
...