jarun / googler

🔍 Google from the terminal
GNU General Public License v3.0

DOM builder exception on specific first results #366

Closed. gnubeest closed this issue 4 years ago.

gnubeest commented 4 years ago

I wrap googler's JSON output in an IRC bot and was wondering why I got reproducible JSON errors on certain queries. When I ran the same query with googler from the command line, I got this output:

❯ googler -d varsity hot dogs atlanta
[DEBUG] googler version 4.2
[DEBUG] Python version 3.8.5
[DEBUG] Platform: Linux-5.4.64-1-lts-x86_64-with-glibc2.2.5
[DEBUG] Connecting to new host www.google.com
[DEBUG] Opened socket to 64.233.177.147:443
[DEBUG] Fetching URL /search?ie=UTF-8&oe=UTF-8&q=varsity+hot+dogs+atlanta&sei=lt2MPEksSHCqBbBRIwmMog
[DEBUG] Cookie: 1P_JAR=2020-09-15-04
[DEBUG] Response body written to '/tmp/googler-response-_jgskry4.html'.
Traceback (most recent call last):
  File "/usr/bin/googler", line 3652, in <module>
    main()
  File "/usr/bin/googler", line 3641, in main
    repl.cmdloop()
  File "/usr/bin/googler", line 3003, in cmdloop
    self.fetch_and_display()
  File "/usr/bin/googler", line 2597, in enforced_method
    method(self, *args, **kwargs)
  File "/usr/bin/googler", line 2809, in fetch_and_display
    self.fetch()
  File "/usr/bin/googler", line 2597, in enforced_method
    method(self, *args, **kwargs)
  File "/usr/bin/googler", line 2719, in fetch
    parser = GoogleParser(page, news=self._google_url.news, videos=self._google_url.videos)
  File "/usr/bin/googler", line 2250, in __init__
    self.parse(html)
  File "/usr/bin/googler", line 2253, in parse
    tree = parse_html(html)
  File "/usr/bin/googler", line 740, in parse_html
    builder.feed(html)
  File "/usr/lib/python3.8/html/parser.py", line 111, in feed
    self.goahead(0)
  File "/usr/lib/python3.8/html/parser.py", line 173, in goahead
    k = self.parse_endtag(i)
  File "/usr/lib/python3.8/html/parser.py", line 421, in parse_endtag
    self.handle_endtag(elem)
  File "/usr/bin/googler", line 684, in handle_endtag
    raise DOMBuilderException(
__main__.DOMBuilderException: DOM builder aborted at 24:13107: expecting end tag 'path', got 'svg'
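The traceback ends in googler's own `handle_endtag` (called from `parse_html`), which builds a tree on top of Python's `html.parser`. The sketch below reproduces the same class of error under a stated assumption: a strict builder that keeps a stack of open tags and aborts on any end tag that does not match the innermost open element. `StrictBuilder` and `parse_strict` are hypothetical names, not googler's code. An SVG `<path>` written without `/>` stays open, so the following `</svg>` is rejected.

```python
from html.parser import HTMLParser


class DOMBuilderException(Exception):
    """Raised when an end tag does not match the innermost open element."""


class StrictBuilder(HTMLParser):
    """Hypothetical strict builder: tracks open tags on a stack."""

    # HTML void elements never get an end tag, so they are never pushed.
    VOID = {"area", "base", "br", "col", "embed", "hr", "img", "input",
            "link", "meta", "source", "track", "wbr"}

    def __init__(self):
        super().__init__()
        self.stack = []

    def handle_starttag(self, tag, attrs):
        if tag not in self.VOID:
            self.stack.append(tag)

    def handle_startendtag(self, tag, attrs):
        # An explicitly self-closed tag such as <path ... /> opens nothing.
        pass

    def handle_endtag(self, tag):
        expected = self.stack[-1] if self.stack else None
        if tag != expected:
            line, col = self.getpos()
            raise DOMBuilderException(
                f"DOM builder aborted at {line}:{col}: "
                f"expecting end tag {expected!r}, got {tag!r}"
            )
        self.stack.pop()


def parse_strict(html):
    """Feed a whole document to a fresh StrictBuilder and return it."""
    builder = StrictBuilder()
    builder.feed(html)
    builder.close()
    return builder
```

With this sketch, `parse_strict('<svg><path d="M0 0"/></svg>')` succeeds, while `parse_strict('<svg><path d="M0 0"></svg>')` raises an error ending in `expecting end tag 'path', got 'svg'`.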

Attached response file: /tmp/googler-response-_jgskry4.html

Running the same search with -s2 (which starts at the second result, skipping the first) works normally, minus the first result I expected, of course. This has happened on several different queries and is reproducible every time. Google itself returns the expected first result.
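On the bot side, a crash like this presumably surfaces as a JSON error because googler exits with a traceback on stderr instead of printing JSON. A minimal sketch, assuming the bot shells out to googler with its `--json` and `-n` options, checks the exit status before decoding so the real cause is reported. `GooglerError`, `parse_googler_json`, and `search` are hypothetical names for illustration.

```python
import json
import subprocess


class GooglerError(RuntimeError):
    """Raised when googler does not produce usable JSON output."""


def parse_googler_json(returncode, stdout, stderr):
    """Turn a finished googler run into parsed JSON results.

    The exit status is checked first, so a crash is reported with the
    last line of its traceback rather than as a bare JSONDecodeError.
    """
    if returncode != 0:
        lines = stderr.strip().splitlines()
        last = lines[-1] if lines else "no error output"
        raise GooglerError(f"googler exited with status {returncode}: {last}")
    try:
        return json.loads(stdout)
    except json.JSONDecodeError as exc:
        raise GooglerError(f"googler printed invalid JSON: {exc}") from exc


def search(query, count=5):
    """Run googler non-interactively and return its parsed JSON output."""
    proc = subprocess.run(
        # "--" keeps a query that starts with "-" from being read as an option.
        ["googler", "--json", "-n", str(count), "--", query],
        capture_output=True,
        text=True,
    )
    return parse_googler_json(proc.returncode, proc.stdout, proc.stderr)
```

Keeping the decoding in a separate function lets the bot log a line like `__main__.DOMBuilderException: ...` instead of an opaque decode failure, without needing googler installed to test it.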

zmwangx commented 4 years ago

Sorry about the delay. This is fixed in #379. With the fix applied, your attached HTML is parsed correctly:

$ ./googler --debug --parse /tmp/googler-response-_jgskry4.html
[DEBUG] googler version 4.2
[DEBUG] Python version 3.9.0
[DEBUG] Platform: macOS-10.15.7-x86_64-i386-64bit

 1.  The Varsity: What'll ya Have!
     https://www.thevarsity.com/
     EffectiveTuesday June 16th , in accordance with Governor Kemp's latest executive order, The Varsity Atlanta will be reopening our downstairs dining
     rooms.

 2.  The Varsity Atlanta - The Varsity
     https://www.thevarsity.com/locations/detail/1/The_Varsity_Atlanta
     The Varsity in downtown Atlanta is our original, world famous location. This enormous restaurant sits on 2 city blocks and can accommodate 800 diners
     inside.

 3.  Our Food - The Varsity
     https://www.thevarsity.com/food
     ... with Governor Kemp's latest executive order, The Varsity Atlanta will be reopening ... And can you really say you went to The Varsity if you didn't
     get a Frosted ... One burger with chili and cheese, one hot dog with mustard, chili, and cheese.

 4.  The Varsity - Wikipedia
     https://en.wikipedia.org/wiki/The_Varsity
     The Varsity is a restaurant chain, iconic in the modern culture of Atlanta, Georgia. The main ... Mad artist Jack Davis has done advertising for The
     Varsity. The Varsity was featured in the PBS documentary A Hot Dog Program by Rick Sebak.

 5.  The Varsity - Takeout & Delivery - 1454 Photos & 2070 ... - Yelp
     https://www.yelp.com/biz/the-varsity-atlanta-2
     Rating: 3, 2,070 reviews, Price range: Under $10
     $Inexpensive• Burgers, Hot Dogs, Fast Food. Open • 10:30 am - 11:00 ... Chili Cheese Dog, Chili Slaw Dog, Chili Burger, Fried Apple Pie, Chili Cheese
     Slaw Dog, Chili Cheese Burger, Naked Dog ... The Varsity is simply one of Atlanta's icons.

 6.  The Varsity - Takeout & Delivery - 140 Photos & 157 Reviews ...
     https://www.yelp.com/biz/the-varsity-atlanta-5
     Rating: 2.5, 157 reviews, Price range: Under $10
     ... of The Varsity "Tasty and less expensive than I'd expect for airport fast food. The food is similar to another hot dog place in Atlanta, Zesto.
     Friendly employees.

 7.  The Varsity- Atlanta's Favorite Hotdogs and Hamburgers
     https://www.atlanta.net/partner/the-varsity/296/
     Try the chili dogs, onion rings, Frosted Orange milkshake and homemade fried pies. The Varsity has been serving Atlanta's favorite hotdogs and
     hamburgers ...

 8.  91 years of chili dogs: How Atlanta's The Varsity lasts and lasts
     https://thetakeout.com/atlanta-the-varsity-chili-dogs-1835785500
     Jul 7, 2019
     Alongside their famous chili dogs, The Varsity's menu is a model of classic American drive-in fare. It's all hot dogs, hamburgers, fries, and onion ...

 9.  THE VARSITY, Atlanta - 61 North Ave NW, Downtown - Menu ...
     https://www.tripadvisor.com/Restaurant_Review-g60898-d492279-Reviews-The_Varsity-Atlanta_Georgia.html
     Rating: 4, 5,457 reviews, Price range: $
     The Varsity, Atlanta: See 5457 unbiased reviews of The Varsity, rated 4 of 5 on Tripadvisor and ranked ... Yes 100% all beef hot dogs, and yes they do
     have chili!

 10. The Varsity - Home | Facebook
     https://www.facebook.com/thevarsity/
     Rating: 4.3, 26,269 votes
     Come be a part of an Atlanta tradition! The Varsity is a 92-year-old family-owned and operated company. We treat our team members like family. We are
     looking to ...
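The changes in #379 are not shown in this thread. For illustration only, here is one common way to make a tree builder tolerant. This is a swapped-in technique, not necessarily what #379 does: a mismatched end tag implicitly closes every element opened after its matching start tag, and an end tag that matches nothing is ignored. `TolerantBuilder` is a hypothetical name, and real browsers follow the far more detailed HTML tree-construction rules.

```python
from html.parser import HTMLParser


class TolerantBuilder(HTMLParser):
    """Hypothetical lenient builder: recovers from unclosed elements."""

    VOID = {"area", "base", "br", "col", "embed", "hr", "img", "input",
            "link", "meta", "source", "track", "wbr"}

    def __init__(self):
        super().__init__()
        self.stack = []
        self.implicitly_closed = []  # tags closed without their own end tag

    def handle_starttag(self, tag, attrs):
        if tag not in self.VOID:
            self.stack.append(tag)

    def handle_startendtag(self, tag, attrs):
        pass  # <path ... /> opens nothing

    def handle_endtag(self, tag):
        if tag not in self.stack:
            return  # stray end tag: ignore it instead of aborting
        # Implicitly close everything opened after the matching start tag.
        while self.stack[-1] != tag:
            self.implicitly_closed.append(self.stack.pop())
        self.stack.pop()
```

Fed `<div><svg><path d="M0 0"></svg><span>x</span></div>`, this builder finishes with an empty stack and records `path` as implicitly closed, instead of aborting at `</svg>`.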