tdowd closed this 7 years ago.
Thanks Ted. This change is great! I just have a couple of comments. I hope they're not too nitpicky.
Going through some of the results, I see that sometimes only the bedrooms are shown and sometimes only the area is shown. For example:
Oct 17 2nd Floor 1BR/1BA - (3070 San Bruno Ave #9) $2100 { 1br - } (visitacion valley) pic map
...
Oct 17 Spacious Studio x 1 Bath (BH) $1895 { 453ft2 - } (sunnyvale) pic map
* Where { ... } represents the contents of the <span class="housing">.
Also, for postings outside the US, the area is given in m2 instead of ft2:
Appartment 2 bedrooms Brussel €600 { 2br - 69m2 - } (Brussel) pic map
So a couple of suggestions:
Also, it didn't feel right for this code to live in CraigslistBase, since it's specific to housing. So in 212588b I've added a new method, customize_result(), that lets a subclass add, delete, or alter fields on each result. In this case, the housing subclass adds 'bedrooms' and 'area'.
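The hook pattern described above can be sketched roughly like this. It is a minimal, hypothetical illustration, not the code from 212588b: the class names SketchBase and SketchHousing, the parse_rows method, the two-argument customize_result(result, row) signature, and the use of plain dicts in place of parsed HTML rows are all assumptions made for the example.

```python
class SketchBase:
    """Hypothetical stand-in for CraigslistBase (not the library's actual code)."""

    def parse_rows(self, rows):
        # Build the generic fields every category shares, then give the
        # subclass hook a chance to add, delete, or alter fields.
        for row in rows:
            result = {"name": row.get("name"), "price": row.get("price")}
            self.customize_result(result, row)
            yield result

    def customize_result(self, result, row):
        # Default hook: leave the result untouched, so the base class
        # stays category-agnostic.
        pass


class SketchHousing(SketchBase):
    """Hypothetical housing subclass: housing-only fields live here."""

    def customize_result(self, result, row):
        result["bedrooms"] = row.get("bedrooms")
        result["area"] = row.get("area")
```

With this split, a non-housing category simply inherits the no-op hook and never sees 'bedrooms' or 'area' keys.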
Thanks for the feedback; I didn't even think about the international aspect. I'll go back to the lab and get back to you in a bit.
This works for me:
def get_only_first_or_none(lst):
    """Return the only element of lst, or None if it is empty."""
    if len(lst) > 1:
        raise ValueError("too many values")
    return lst[0] if len(lst) else None

# `post` is one result row (an lxml element) from the search page.
housing_el = get_only_first_or_none(post.cssselect("span > span > span.housing"))
# Split the housing text on its "-" separators and drop empty pieces.
housing = [x.strip() for x in housing_el.text.split("-\n") if x.strip()] if housing_el is not None else []
bedrooms_raw = get_only_first_or_none([x for x in housing if "br" in x])
# Note: this only matches square feet; areas given in m2 are not picked up.
area_raw = get_only_first_or_none([x for x in housing if "ft" in x])
num_bedrooms = int(bedrooms_raw.replace("br", "")) if bedrooms_raw else None
area = int(area_raw.replace("ft", "")) if area_raw else None
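Since the snippet above only recognises ft, here is a rough sketch of one way to also handle the m2 areas seen in non-US postings. It is a hypothetical helper (parse_housing and its return keys are made up for illustration) that works on the housing text as a plain string, and it assumes the pieces are separated by '-' with units written as br, ft2, or m2, as in the example listings quoted earlier.

```python
import re


def parse_housing(text):
    """Hypothetical helper: extract bedrooms and area from e.g. '2br - 69m2 -'."""
    bedrooms, area, unit = None, None, None
    for piece in text.split("-"):
        piece = piece.strip()
        # Bedroom count, e.g. "2br".
        m = re.fullmatch(r"(\d+)br", piece)
        if m:
            bedrooms = int(m.group(1))
            continue
        # Area with its unit, e.g. "453ft2" or "69m2".
        m = re.fullmatch(r"(\d+)(ft2|m2)", piece)
        if m:
            area, unit = int(m.group(1)), m.group(2)
    return {"bedrooms": bedrooms, "area": area, "area_unit": unit}
```

For instance, parse_housing("2br - 69m2 -") gives 2 bedrooms and an area of 69 with unit "m2". Returning the unit alongside the number keeps square metres and square feet from being silently mixed when results from different regions are compared.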
I've added this in 7247eb9. Here's an example of it working:
In [1]: from craigslist import CraigslistHousing
In [2]: cl_h = CraigslistHousing(site='sfbay', area='sfc', category='apa', filters={'min_price': 2000, 'min_bedrooms': 2})
In [3]: list(cl_h.get_results(limit=1))
Out[3]:
[{'area': u'3100ft2',
'bedrooms': u'5',
'datetime': u'2017-06-13 00:12',
'geotag': None,
'has_image': True,
'has_map': True,
'id': u'6166644593',
'name': u'Gorgeous 3100+ sq ft 5BR, 4.5BA Forest Hill home',
'price': u'$8500',
'url': u'http://sfbay.craigslist.org/sfc/apa/6166644593.html',
'where': u'west portal / forest hill'}]
I added a few lines to parse the square footage and number of bedrooms, which are commonly appended to the end of listings (separated by a '-').