Open endolith opened 1 year ago
New URLs probably need something like this:
relativeURL = '/area/106316122/hawaii'
start_urls = [domain + relativeURL]
allowed_domains = ['mountainproject.com']
rules = [
    Rule(
        LinkExtractor(allow='area/(.+)'),
        callback='parse',
        follow=True
    )
]
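As a quick sanity check on that `allow` pattern, here is a minimal sketch of which URLs it would let through. It assumes LinkExtractor treats `allow` as a regular expression searched for anywhere in the absolute URL. The three candidate URLs are the ones that appear elsewhere in this issue.

```python
import re

# Assumption: `allow` is used as a regex that is searched for anywhere in the URL.
ALLOW = re.compile(r'area/(.+)')

candidates = [
    'https://www.mountainproject.com/area/106316122/hawaii',
    'https://www.mountainproject.com/map/106316122/hawaii',
    'https://www.mountainproject.com/v/alabama/105905173',
]

# Only URLs containing "area/" followed by at least one character survive.
followed = [url for url in candidates if ALLOW.search(url)]

for url in followed:
    print(url)
```

Only the `/area/` URL passes, so `/map/` pages would have to be requested explicitly. Separately, if this is a `CrawlSpider`, the Scrapy documentation advises against using `parse` as a rule callback, because `CrawlSpider` uses that method itself.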
New state pages have
<div class="col-md-3 left-nav float-md-left mb-2">
<div class="mp-sidebar">
So probably links = response.css('.left-nav a::attr(href)').extract()?
And on the main page it has
<div class="col-xs-12">
<div class="title-with-border-bottom mb-2">
<h2 class="inline-block mr-half">Rock Climbing Guide</h2>
</div>
<div class="row" id="route-guide">
So probably links = response.css('div#route-guide a::attr(href)').extract()?
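To check what a selector like `div#route-guide a::attr(href)` should return without running a crawl, here is a rough stand-in. It uses the standard library's `html.parser`, not Scrapy's selectors. The sample markup is invented apart from the tags quoted above; the inner link and the `/outside` link are hypothetical.

```python
from html.parser import HTMLParser


class RouteGuideLinks(HTMLParser):
    """Collect href values of <a> tags nested inside <div id="route-guide">."""

    def __init__(self):
        super().__init__()
        self.div_depth = 0  # > 0 while we are inside the target div
        self.links = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == 'div':
            # Start counting at the target div, then track nested divs.
            if self.div_depth or attrs.get('id') == 'route-guide':
                self.div_depth += 1
        elif tag == 'a' and self.div_depth and 'href' in attrs:
            self.links.append(attrs['href'])

    def handle_endtag(self, tag):
        if tag == 'div' and self.div_depth:
            self.div_depth -= 1


# Hypothetical page fragment built around the tags quoted in this issue.
SAMPLE = '''
<div class="col-xs-12">
  <div class="title-with-border-bottom mb-2">
    <h2 class="inline-block mr-half">Rock Climbing Guide</h2>
  </div>
  <div class="row" id="route-guide">
    <div><a href="/area/106316122/hawaii">Hawaii</a></div>
  </div>
  <a href="/outside">not in the guide</a>
</div>
'''

parser = RouteGuideLinks()
parser.feed(SAMPLE)
links = parser.links
print(links)
```

Note that the collected hrefs may be relative or absolute depending on the page, which matters for how they are turned into requests.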
Still doesn't work, though.
DEBUG: Filtered offsite request to 'www.mountainproject.comhttps': <GET https://www.mountainproject.comhttps//www.mountainproject.com/map/106316122/hawaii>
yield scrapy.Request(url, callback=self.parse_coordinates)
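That log line looks like the domain being prepended to an href that is already absolute. A minimal sketch of the difference, using the standard library's `urljoin` (Scrapy's `response.urljoin(href)` does the same thing with the response URL as the base). The value of `domain` here is an assumption, since the original code doesn't show it.

```python
from urllib.parse import urljoin

domain = 'https://www.mountainproject.com'  # assumed value of `domain`
page_url = domain + '/area/106316122/hawaii'

# An href that is already absolute, like the one in the log above.
href = 'https://www.mountainproject.com/map/106316122/hawaii'

# Plain concatenation glues two absolute URLs together.
broken = domain + href

# urljoin leaves absolute hrefs alone and resolves relative ones.
fixed = urljoin(page_url, href)
relative_fixed = urljoin(page_url, '/map/106316122/hawaii')

print(broken)
print(fixed)
print(relative_fixed)
```

With this, the same code path handles both the old relative links and the new absolute ones, and the offsite filter sees a normal `www.mountainproject.com` host.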
I'm not sure why the original code says this:
if 'Location' not in response.css('#rspCol800 div.rspCol table tr:nth-child(2) td ::text').extract()[0]:
    return response.css('#rspCol800 div.rspCol table tr:nth-child(3) td ::text').extract()[1].strip()
else:
    return response.css('#rspCol800 div.rspCol table tr:nth-child(2) td ::text').extract()[1].strip()
In the case where the second table row doesn't list "Location:", what is it getting instead?
(Now in the new layout it's "GPS:", though.)
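One way to avoid guessing between row 2 and row 3 would be to pick the row by its label instead of its position. A minimal sketch, assuming the table has already been reduced to (label, value) pairs. The `pick_value` helper, the "Some other row:" label and the coordinate values are all made up for illustration.

```python
def pick_value(rows, labels=('Location', 'GPS')):
    """Return the stripped value of the first row whose label starts with one of `labels`."""
    for label, value in rows:
        # str.startswith accepts a tuple of prefixes.
        if label.strip().startswith(labels):
            return value.strip()
    return None


# Placeholder rows, not real page data.
old_rows = [('Some other row: ', ' n/a '), ('Location: ', ' 12.345, -67.890 ')]
new_rows = [('GPS:', '12.345, -67.890'), ('Some other row:', 'n/a')]

print(pick_value(old_rows))
print(pick_value(new_rows))
```

This returns `None` when neither label is present, so a missing row becomes an explicit case that the caller can handle.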
(I've got it working, but I made a bunch of clunky changes with the help of ChatGPT that I don't fully understand)
The /v/ URLs redirect to a new scheme:
<div id="viewerLeftNavColContent" class="rspCollapsedContent">
was present in old pages: https://web.archive.org/web/20161122233413/http://www.mountainproject.com/v/alabama/105905173
but no longer.
<span class="destArea">
was present on old homepage: https://web.archive.org/web/20171016232313/https://www.mountainproject.com/
but no longer.