stefl / GoGenie

http://gogenie.org

Scrape Culture24 #79

Open stefl opened 12 years ago

stefl commented 12 years ago

This is part of the scraping job. Essentially, this is "write a scraper", "write an indexer", "spider the site/API", "download every record", "map it onto our DB structure", "create new entries or update existing ones", "find a way to dedupe".

Time consuming - do we get adequate results for the effort?
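
The "map it onto our DB structure", "create new entries or update existing ones" and "find a way to dedupe" steps listed above could be sketched roughly as below. This is only a minimal sketch under stated assumptions: the `venues` table, its columns, the record fields (`name`, `postcode`, `access_info`) and the name-plus-postcode dedupe key are all hypothetical, not GoGenie's actual schema or Culture24's data format.

```python
import re
import sqlite3


def make_db():
    """Create an in-memory database with a hypothetical venues table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE venues ("
        "id INTEGER PRIMARY KEY, "
        "dedupe_key TEXT UNIQUE, "
        "name TEXT, postcode TEXT, access_info TEXT)"
    )
    return conn


def dedupe_key(name, postcode):
    """Normalise name and postcode so near-identical records share one key."""
    norm_name = re.sub(r"[^a-z0-9]+", " ", name.lower()).strip()
    norm_postcode = re.sub(r"\s+", "", postcode.upper())
    return f"{norm_name}|{norm_postcode}"


def upsert_venue(conn, record):
    """Create a new venue or update the existing one with the same key."""
    key = dedupe_key(record["name"], record["postcode"])
    row = conn.execute(
        "SELECT id FROM venues WHERE dedupe_key = ?", (key,)
    ).fetchone()
    if row is None:
        conn.execute(
            "INSERT INTO venues (dedupe_key, name, postcode, access_info) "
            "VALUES (?, ?, ?, ?)",
            (key, record["name"], record["postcode"], record["access_info"]),
        )
        return "created"
    conn.execute(
        "UPDATE venues SET name = ?, postcode = ?, access_info = ? WHERE id = ?",
        (record["name"], record["postcode"], record["access_info"], row[0]),
    )
    return "updated"
```

Calling `upsert_venue` once per scraped record would keep a single row per venue even when the same venue arrives with slightly different spelling or spacing. A real dedupe step would probably need fuzzier matching than this exact-key approach.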

Peskypeople commented 12 years ago

CLOSE ISSUE

You know what, having looked again, it's a useless amount of data, and it's got worse! There are only just under 5,000 listed, and most of it is just the info we can already capture from a general scraper.

Just to show how bad it is, here's what you have to do on their website to get the access info for one venue (listed going from the result back to the search). Six confusing steps just to get to a venue listing:

  1. Venue details with access info (wheelchair symbol): http://www.culture24.org.uk/am18393
  2. Random click on the map, as it shows no venue details: http://www.culture24.org.uk/search%20results?f1=Type&t1=3O&f2=Place&t2=Z03.01.08&f3=Type&t3=3O.5&d=Map&n=20&s=alphabetical&sd=ascending
  3. Browse venues, which brings up 332 items in the West Midlands or 4,794 nationally: http://www.culture24.org.uk/search%20results?f1=Type&t1=3O&d=Map&n=20&s=alphabetical&sd=ascending
  4. Clicked on West Midlands: http://www.culture24.org.uk/places%20to%20go/west%20midlands
  5. Clicked on "places to go": http://www.culture24.org.uk/places%20to%20go
  6. Culture24 home page: http://www.culture24.org.uk/home