swannodette / enlive-tutorial

An Easy Introduction to Enlive

It seems scrape2.clj produces unexpected results #19

Open maxcountryman opened 11 years ago

maxcountryman commented 11 years ago

While playing around with the tutorial I noticed some odd printouts:

=> (print-headlines-and-points)
Porting dl.google.com from C++ to Go (153 points)
DoorDash (YC S13) Delivers Food Quickly In South Bay, Hopes To Expand Beyond Food (23 points)
Free email address validation API for web forms (144 points)
Big-O notation explained by a self-taught programmer (69 points)
The Financial Times on Edward Tufte (61 points)
Apple’s Developer Center Is Back After Over a Week Offline (39 points)
Scientist banned from revealing codes used to start luxury cars (46 points)
They Know Much More Than You Think (18 points)
Superbrothers: Sword & Sworcery sales statistics (62 points)
Simple API with Nginx and PostgreSQL (124 points)
Sellbox – Sell your files from Dropbox and Google Drive with Paypal (68 points)
Hacker Barnaby Jack has died (188 points)
Ashton Kutcher annotates Steve Jobs' 1982 Academy of Achievement speech (16 points)
Did Frank Lloyd Wright create America's greatest office? (61 points)
Kindergarten coders can program before they can read (35 points)
ChessBoardJS (119 points)
Spy agencies ban Lenovo PCs on security grounds (86 points)
My RabbitMQ setup for notifications (78 points)
New edX courses (46 points)
Chinese firm Huawei controls net filter praised by PM (238 points)
Lawmakers Who Upheld NSA Phone Spying Received Double the Defense Industry Cash (79 points)
Donate to Replicant and support free software on mobile devices (31 points)
Android Bug Superior to Master Key (87 points)
How to kill an unresponsive SSH session (165 points)
Report Invalid Whois Contact Information to ICANN? (18 points)
How to slaughter a patent troll in 5 steps (6 points)
Court grants Chevron access to 9 years of email data of activists, critics [pdf] (scribd)
111 points (Feds tell web firms to turn over user passwords)
209 points (Hidden “App Ops” Feature in Android 4.3 Lets You Disable Permissions From Apps)
55 points (Distributed Actors in Java and Clojure)
60 points (More)
nil

It seems like the composite selector, #{[:td.title :a] [:td.subtext html/first-child]}, is selecting the wrong bits of the page. Used on its own, the [:td.subtext html/first-child] selector seems fine:

=> (pprint (html/select (fetch-url *base-url*) #{[:td.subtext html/first-child]}))
({:tag :span, :attrs {:id "score_6110398"}, :content ("153 points")}
 {:tag :span, :attrs {:id "score_6111110"}, :content ("23 points")}
 {:tag :span, :attrs {:id "score_6109905"}, :content ("144 points")}
 {:tag :span, :attrs {:id "score_6110671"}, :content ("69 points")}
 {:tag :span, :attrs {:id "score_6110602"}, :content ("61 points")}
 {:tag :span, :attrs {:id "score_6110858"}, :content ("39 points")}
 {:tag :span, :attrs {:id "score_6110575"}, :content ("46 points")}
 {:tag :span, :attrs {:id "score_6110993"}, :content ("18 points")}
 {:tag :span, :attrs {:id "score_6110005"}, :content ("62 points")}
 {:tag :span, :attrs {:id "score_6109069"}, :content ("124 points")}
 {:tag :span, :attrs {:id "score_6109649"}, :content ("68 points")}
 {:tag :span, :attrs {:id "score_6108217"}, :content ("188 points")}
 {:tag :span, :attrs {:id "score_6111050"}, :content ("16 points")}
 {:tag :span, :attrs {:id "score_6109626"}, :content ("61 points")}
 {:tag :span, :attrs {:id "score_6110230"}, :content ("35 points")}
 {:tag :span, :attrs {:id "score_6108628"}, :content ("119 points")}
 {:tag :span, :attrs {:id "score_6108980"}, :content ("86 points")}
 {:tag :span, :attrs {:id "score_6109077"}, :content ("78 points")}
 {:tag :span, :attrs {:id "score_6109775"}, :content ("46 points")}
 {:tag :span, :attrs {:id "score_6107313"}, :content ("238 points")}
 {:tag :span, :attrs {:id "score_6110595"}, :content ("79 points")}
 {:tag :span, :attrs {:id "score_6109916"}, :content ("31 points")}
 {:tag :span, :attrs {:id "score_6108469"}, :content ("87 points")}
 {:tag :span, :attrs {:id "score_6107553"}, :content ("165 points")}
 {:tag :span, :attrs {:id "score_6110307"}, :content ("18 points")}
 {:tag :span, :attrs {:id "score_6111012"}, :content ("6 points")}
 {:tag :span, :attrs {:id "score_6108061"}, :content ("111 points")}
 {:tag :span, :attrs {:id "score_6106940"}, :content ("209 points")}
 {:tag :span, :attrs {:id "score_6109897"}, :content ("55 points")}
 {:tag :span, :attrs {:id "score_6108556"}, :content ("60 points")})
nil

But with the additional selector:

=> (pprint (html/select (fetch-url *base-url*) #{[:td.title :a] [:td.subtext html/first-child]}))
...
 {:tag :a,
  :attrs
  {:href
   "http://m.cnet.com/news/feds-tell-web-firms-to-turn-over-user-account-passwords/57595529"},
  :content ("Feds tell web firms to turn over user passwords")}
 {:tag :span, :attrs {:id "score_6106940"}, :content ("209 points")}
 {:tag :a,
  :attrs
  {:href
   "http://www.droid-life.com/2013/07/26/hidden-app-ops-feature-in-android-4-3-lets-you-selectively-disable-permissions-from-apps/"},
  :content
  ("Hidden “App Ops” Feature in Android 4.3 Lets You Disable Permissions From Apps")}
 {:tag :span, :attrs {:id "score_6109897"}, :content ("55 points")}
 {:tag :a,
  :attrs
  {:href
   "http://blog.paralleluniverse.co/post/56519815799/distributed-actors-in-java-and-clojure"},
  :content ("Distributed Actors in Java and Clojure")}
 {:tag :span, :attrs {:id "score_6108556"}, :content ("60 points")}
 {:tag :a, :attrs {:href "news2"}, :content ("More")})
nil

I'm using Enlive 1.1.1 and Clojure 1.4.0.

pbwolf commented 9 years ago

This is because the selector, #{...}, is a set, and a set is an unordered collection.
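
Independent of how the set is ordered, the printout in the issue first goes wrong at the Chevron item, where the second element is "scribd" instead of a score. That suggests the title cell held a second link, so every later pair is shifted by one. The sketch below reproduces that drift with plain Clojure data and no Enlive calls. The node maps, the `:text` key, `naive-pairs`, and `pair-headlines` are all hypothetical, and it assumes the tutorial pairs results with `(partition 2 ...)`, which this thread does not show.

```clojure
;; Hypothetical stand-ins for selected nodes, in document order.
;; The second story carries an extra link, like the "(scribd)" item above.
(def nodes
  [{:tag :a    :text "Story A"}
   {:tag :span :text "10 points"}
   {:tag :a    :text "Story B [pdf]"}
   {:tag :a    :text "scribd"}
   {:tag :span :text "20 points"}
   {:tag :a    :text "Story C"}
   {:tag :span :text "30 points"}
   {:tag :a    :text "More"}])

;; Assumed pairing strategy: take the flat text sequence two at a time.
(defn naive-pairs [nodes]
  (partition 2 (map :text nodes)))

;; Alternative: pair each score with the first anchor seen since the
;; previous score, so extra anchors and a trailing "More" are ignored.
(defn pair-headlines [nodes]
  (loop [nodes nodes, headline nil, acc []]
    (if-let [{:keys [tag text]} (first nodes)]
      (case tag
        :a    (recur (rest nodes) (or headline text) acc)
        :span (recur (rest nodes) nil (conj acc [headline text]))
        (recur (rest nodes) headline acc))
      acc)))

(naive-pairs nodes)
;; => (("Story A" "10 points") ("Story B [pdf]" "scribd")
;;     ("20 points" "Story C") ("30 points" "More"))

(pair-headlines nodes)
;; => [["Story A" "10 points"] ["Story B [pdf]" "20 points"]
;;     ["Story C" "30 points"]]
```

Pairing on the tag rather than on position keeps one stray anchor from shifting every line that follows it.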