juliomalegria / python-craigslist

Simple Craigslist wrapper
MIT No Attribution

adding in features to gather square feet and number of bedrooms #25

Closed tdowd closed 7 years ago

tdowd commented 7 years ago

I added a few lines to parse the square footage and number of bedrooms, which are commonly appended to the end of listings (separated by a '-').

juliomalegria commented 7 years ago

Thanks Ted. This change is great! I just have a couple of comments. I hope they're not too nitpicky.

Going through some of the results, I see that sometimes only the bedrooms are shown, and sometimes only the area is shown. For example:

Oct 17 2nd Floor 1BR/1BA - (3070 San Bruno Ave #9) $2100 { 1br - } (visitacion valley) pic map
...
Oct 17 Spacious Studio x 1 Bath (BH) $1895 { 453ft2 - } (sunnyvale) pic map

* Where { ... } represents the <span id="housing">.

Also, for postings outside the US, the area is given in m2 instead of ft2:

Appartment 2 bedrooms Brussel €600 { 2br - 69m2 - } (Brussel) pic map
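The three cases above (bedrooms only, area only, area in m2) could be covered by something like the following. This is only a minimal sketch: `parse_housing` and its return keys are hypothetical names, not part of python-craigslist, and it assumes the housing text has already been extracted from the span as a plain string.

```python
import re

# Hypothetical helper, not part of python-craigslist.
_BEDROOMS_RE = re.compile(r"(\d+)\s*br\b", re.IGNORECASE)
_AREA_RE = re.compile(r"(\d+)\s*(ft2|m2)\b", re.IGNORECASE)


def parse_housing(text):
    """Parse housing text such as '2br - 69m2 - ' into its parts.

    Either part may be missing, in which case its value is None.
    """
    bedrooms = _BEDROOMS_RE.search(text)
    area = _AREA_RE.search(text)
    return {
        "bedrooms": int(bedrooms.group(1)) if bedrooms else None,
        "area": int(area.group(1)) if area else None,
        "area_unit": area.group(2).lower() if area else None,
    }
```

Keeping the unit as a separate field avoids silently mixing ft2 and m2 values.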

So a couple of suggestions:

juliomalegria commented 7 years ago

Also, it didn't feel right that this code was part of CraigslistBase since it was specific to housing. So in 212588b I've added a new method (customize_result()), for a subclass to add/delete/alter each result. In this case, adding 'bedrooms' + 'area'.
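For readers unfamiliar with the pattern, here is a rough sketch of how such a hook can work. The classes below are stand-ins written for illustration, not the real `CraigslistBase` or `CraigslistHousing`; the `parse_row` method, the dict-shaped `row`, and the hook's signature are all assumptions, so check the commit for the actual code.

```python
class ListingBase:
    """Stand-in for a generic base class; NOT the real CraigslistBase."""

    def parse_row(self, row):
        # Fields common to every category.
        result = {"name": row["name"], "price": row["price"]}
        # Hook: subclasses may add, delete, or alter fields here.
        self.customize_result(result, row)
        return result

    def customize_result(self, result, row):
        """Default hook: leave the result unchanged."""


class HousingListing(ListingBase):
    """Stand-in for a housing-specific subclass."""

    def customize_result(self, result, row):
        # Assumed input: housing text like '2br - 69m2 - ' under row['housing'].
        parts = [p.strip() for p in row.get("housing", "").split("-") if p.strip()]
        result["bedrooms"] = next((p[:-2] for p in parts if p.endswith("br")), None)
        result["area"] = next((p for p in parts if p.endswith(("ft2", "m2"))), None)
```

The base class stays free of housing-specific logic, and categories that need no extra fields simply inherit the no-op hook.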

tdowd commented 7 years ago

Thanks for the feedback - didn't even think of the international aspect. I'll go back to the lab and get back to you in a bit.

AlJohri commented 7 years ago

this works for me:

def get_only_first_or_none(lst):
    """Return the only item in lst, or None if lst is empty."""
    if len(lst) > 1:
        raise ValueError("too many values")
    return lst[0] if lst else None

# `post` is assumed to be the lxml element for a single result row.
housing_el = get_only_first_or_none(post.cssselect("span > span > span.housing"))
housing = [x.strip() for x in housing_el.text.split("-\n") if x.strip()] if housing_el is not None else []
bedrooms_raw = get_only_first_or_none([x for x in housing if "br" in x])
area_raw = get_only_first_or_none([x for x in housing if "ft" in x])

num_bedrooms = int(bedrooms_raw.replace("br", "")) if bedrooms_raw else None
# Strip the whole "ft2" suffix; removing only "ft" would turn "453ft2" into 4532.
area = int(area_raw.replace("ft2", "")) if area_raw else None

juliomalegria commented 7 years ago

I've added this in 7247eb9. Here's an example of it working:

In [1]: from craigslist import CraigslistHousing

In [2]: cl_h = CraigslistHousing(site='sfbay', area='sfc', category='apa', filters={'min_price': 2000, 'min_bedrooms': 2})

In [3]: list(cl_h.get_results(limit=1))
Out[3]: 
[{'area': u'3100ft2',
  'bedrooms': u'5',
  'datetime': u'2017-06-13 00:12',
  'geotag': None,
  'has_image': True,
  'has_map': True,
  'id': u'6166644593',
  'name': u'Gorgeous 3100+ sq ft  5BR, 4.5BA Forest Hill home',
  'price': u'$8500',
  'url': u'http://sfbay.craigslist.org/sfc/apa/6166644593.html',
  'where': u'west portal / forest hill'}]
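Since 'area' and 'bedrooms' come back as strings, callers who want to compare listings may need to convert them. A minimal sketch, assuming the value formats shown above ('3100ft2', '5') plus the 'm2' suffix mentioned earlier in this thread; `area_in_ft2` and `with_numeric_fields` are hypothetical helpers, not part of the library:

```python
SQFT_PER_M2 = 10.7639  # approximate square feet per square metre


def area_in_ft2(area):
    """Convert an area string such as '3100ft2' or '69m2' to square feet."""
    if not area:
        return None
    if area.endswith("ft2"):
        return float(area[:-3])
    if area.endswith("m2"):
        return float(area[:-2]) * SQFT_PER_M2
    raise ValueError("unrecognised area unit: %r" % area)


def with_numeric_fields(result):
    """Return a copy of a result dict with numeric 'bedrooms' and 'area_ft2'."""
    out = dict(result)
    out["bedrooms"] = int(result["bedrooms"]) if result.get("bedrooms") else None
    out["area_ft2"] = area_in_ft2(result.get("area"))
    return out
```

It could be applied to each dict yielded by `get_results()`, for example `[with_numeric_fields(r) for r in cl_h.get_results(limit=10)]`.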