toby-p / rightmove_webscraper.py

Python class to scrape data from rightmove.co.uk and return listings in a pandas DataFrame object
MIT License
253 stars, 113 forks

2 Questions from Python / Panda Noob #45

Closed haar-making closed 1 year ago

haar-making commented 2 years ago

Hi,

Apologies if the formatting is wrong - I'm new to GitHub but have tried to follow the guidelines

2 potentially stupid questions from a Python noob who's also trying to get to grips with pandas.

Question 1: Getting the floorplan from the tree uses the following:

xp_floorplan_url = """//*[@id="floorplanTabs"]/div[2]/div[2]/img/@src"""
floorplan_url = tree.xpath(xp_floorplan_url)

However, I ran a sample search (4 results, to keep it small) and looked at the page source of both the results page and the individual property page for one of the results, and neither contains "floorplanTabs".

When I inspect the page in Chrome I can't find "floorplanTabs" either.

Can you explain how this works?

Question 2: What does this mean /div[2]/div[2] in the line below?

xp_floorplan_url = """//*[@id="floorplanTabs"]/div[2]/div[2]/img/@src"""

Many thanks for your help.

sm17977 commented 2 years ago

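I think Rightmove may have changed the structure of their website since the script was written, which would explain why "floorplanTabs" no longer appears in the page. You can find the current XPath of the floorplan image element using your web browser's developer tools. As for '/div[2]/div[2]': each step selects a child by position. The first 'div[2]' selects the second div child of the element whose id is "floorplanTabs", and the second 'div[2]' selects the second div child of that div. The trailing '/img/@src' then takes the src attribute of the img inside it. Look up XPath expressions if you're still confused; they're easy to learn.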
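
A rough sketch of how the two div[2] steps are evaluated, in case it helps. The markup below is invented for illustration and is not Rightmove's real page. It also uses the standard library's xml.etree.ElementTree rather than lxml, purely so it runs without installing anything. ElementTree only supports a subset of XPath, so the sketch selects the img element and reads src with .get() instead of ending the path in /@src.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup, made up for this example (not Rightmove's actual HTML).
SAMPLE = """
<html><body>
  <div id="floorplanTabs">
    <div>tab headers</div>
    <div>
      <div>first inner div</div>
      <div><img src="https://example.com/floorplan.png"/></div>
    </div>
  </div>
</body></html>
"""

root = ET.fromstring(SAMPLE)

# Step by step:
#   .//*[@id='floorplanTabs']  any element with that id
#   /div[2]                    its second <div> child
#   /div[2]                    the second <div> child of that <div>
#   /img                       the <img> inside it
img = root.find(".//*[@id='floorplanTabs']/div[2]/div[2]/img")
floorplan_url = img.get("src") if img is not None else None
print(floorplan_url)

# If the id is not in the page at all (e.g. the site layout changed),
# nothing matches and find() returns None.
missing = root.find(".//*[@id='noSuchId']/div[2]/div[2]/img")
print(missing)
```

The second lookup shows what a layout change looks like from the script's side: the expression is still valid, it just matches nothing.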

toby-p commented 1 year ago

Duplicate