jueyang / call-me-maybe

Use the issue queue. Dark secrets welcome. (CUNY-J teaching 2015)

Trouble with Final Assignment #21

Closed: willengel closed this issue 9 years ago

willengel commented 9 years ago

Hi!

Right now I'm working on a final assignment for my scraping class. It's due tomorrow at noon, and I'm ALMOST done with it, but there's one part that keeps giving me errors. My aim is to scrape a database of liquor licenses for the zip code 10036 and print the name and license type for each result. I'm using Python 2.7.9 on OS X. Here's the link:

https://www.tran.sla.ny.gov/JSP/query/PublicQueryPremisesSearchPage.jsp

And here's the function I've written to scrape each result:

```python
import requests
from bs4 import BeautifulSoup


def getRecords(url):
    r = requests.get(url)
    content = r.content
    soup = BeautifulSoup(content, "html.parser")

    # every <table> whose summary attribute matches
    table = soup.findAll("table", attrs={"summary": "For format purposes only."})
    # the traceback below points at this line
    data = table.findAll("td", attrs={"class": "displayvalue"})
    Premises_Name = data[8].text.replace(" ", "")
    License_Type = data[1].text.replace(" ", "")

    return [Premises_Name, License_Type]
```

The problem is that I keep getting this error message...

```
  File "finalproject.py", line 35, in getRecords
    data = table.findAll("td", attrs = {"class": "displayvalue"})
AttributeError: 'ResultSet' object has no attribute 'findAll'
```

...which tells me that there is no item in the page's element with the "td" tag and the attributes "class": "displayvalue".

Here's the problem: there is. If you look at each result on the page when you search the zip code 10036, you'll see something like this for each row:

`<td class="displayvalue">` `<td class="displaylabel">` `<td class="displayvalue">`

I have no idea what to do. I found a specific tag with specific attributes, and Python is flat-out telling me that it doesn't exist. Can you please help me?
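For reference, the AttributeError is about the type of `table`, not about the page contents: `findAll()` (spelled `find_all()` in current BeautifulSoup 4, which keeps `findAll` as an older alias) always returns a `ResultSet`, which is a list of tags, and a list has no `findAll` method even when the tags you want are present. The sketch below runs against a tiny made-up snippet that reuses the `displayvalue`/`displaylabel` class names from the issue; the markup and the `EXAMPLE BAR` value are placeholders, not copied from the real site.

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in for one result table, modeled on the class names
# quoted in the issue (not the real page markup).
html = """
<table summary="For format purposes only.">
  <tr>
    <td class="displaylabel">Premises Name:</td>
    <td class="displayvalue">EXAMPLE BAR</td>
  </tr>
</table>
"""

soup = BeautifulSoup(html, "html.parser")

# find_all (alias: findAll) returns a ResultSet, i.e. a list of Tags,
# even when only one element matches.
tables = soup.find_all("table", attrs={"summary": "For format purposes only."})
print(type(tables).__name__)  # ResultSet

# Fix 1: find() returns a single Tag (or None), and a Tag has find_all().
table = soup.find("table", attrs={"summary": "For format purposes only."})
cells = table.find_all("td", attrs={"class": "displayvalue"})
print(cells[0].get_text())  # EXAMPLE BAR

# Fix 2: index into (or loop over) the ResultSet before searching again.
cells_again = tables[0].find_all("td", attrs={"class": "displayvalue"})
print(cells_again[0].get_text())  # EXAMPLE BAR
```

`find()` fits when you expect exactly one table; looping over the `ResultSet` fits when each search result has its own table.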

I'm going to a CUNY friend gathering this afternoon, so I won't be available during the day, but I'll be down to chat tomorrow morning, or tonight around 10.

Thanks, Will Engel

jueyang commented 9 years ago

Hey Will,

When you search for 10036 in the database, the site returns the results without reflecting the query in the URL. This means requests.get(url) just fetches the default search page rather than the results of your query. (It's an old JSP website and probably comes with all the quirks that entails, including URLs that aren't RESTful, which is what causes the problem here.)
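To see why the URL can stay the same, here is a sketch of the difference between a GET and a POST form submission, built with `urllib.request.Request` from the Python 3 standard library so that nothing is actually sent. It assumes the search form submits via POST; the field name `zipCode` is hypothetical, and the real field names and the form's `action` URL (which may differ from the search page URL) would have to be read from the page's HTML source.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Search page URL from the issue; the form's real action URL may differ.
SEARCH_URL = "https://www.tran.sla.ny.gov/JSP/query/PublicQueryPremisesSearchPage.jsp"

# Hypothetical form field name, for illustration only.
form = {"zipCode": "10036"}

# A GET request carries the query in the URL itself...
get_req = Request(SEARCH_URL + "?" + urlencode(form))

# ...while a POST request carries it in the body, so the URL never changes.
post_req = Request(SEARCH_URL, data=urlencode(form).encode("ascii"))

# Only the Request objects are built here; no network call is made.
print(get_req.get_method(), get_req.full_url)
print(post_req.get_method(), post_req.full_url)
print(post_req.data)
```

If the form does turn out to use POST, `requests.post(url, data=...)` is the `requests` equivalent, although an old JSP app may also expect session cookies from an earlier page load (a `requests.Session` keeps those between calls).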

I suggest saving the results page as an HTML file locally and just using open() instead.
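A minimal sketch of that suggestion, reading a saved results page with open() instead of requests.get(). It assumes, as the original getRecords() does, that each result sits in a table with `summary="For format purposes only."` and that the premises name and license type are the 9th and 2nd `displayvalue` cells; those are guesses about the page layout that should be checked against the saved file. The function name `get_records_from_file` is hypothetical. It also loops over the `ResultSet` instead of calling `findAll` on it, which avoids the AttributeError from the issue.

```python
from bs4 import BeautifulSoup


def get_records_from_file(path):
    """Return [premises_name, license_type] pairs from a saved results page.

    Layout assumptions (carried over from the original getRecords()):
    each result is a <table summary="For format purposes only."> and the
    name/license type are displayvalue cells 8 and 1 (0-based).
    """
    # Open in binary mode and let BeautifulSoup detect the encoding,
    # since a browser may save the page in a non-UTF-8 encoding.
    with open(path, "rb") as f:
        soup = BeautifulSoup(f, "html.parser")

    records = []
    # find_all returns a ResultSet (a list), so loop over it.
    for table in soup.find_all("table", attrs={"summary": "For format purposes only."}):
        data = table.find_all("td", attrs={"class": "displayvalue"})
        if len(data) < 9:
            continue  # too few cells to be a result table (layout assumption)
        premises_name = data[8].get_text().strip()
        license_type = data[1].get_text().strip()
        records.append([premises_name, license_type])
    return records
```

Usage, with a hypothetical file name: `for name, license_type in get_records_from_file("results_10036.html"): print(name, license_type)`. `.strip()` only trims whitespace at the ends of each cell, whereas the original `.replace(" ", "")` also deletes the spaces between words in a name.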

Hope this helps! Sorry it's a bit late. Memorial weekend...