I'm trying to scrape a website that requires login. Unfortunately, it doesn't seem to use cookies for login, so opening multiple sessions won't work.
Anyway, it works as a kind of online file system, in that there are multiple layers to go through. I currently have 5 nested for loops (each one gets the href from every node matched by an xpath) to go through the files. Inside each loop I do some processing and access more URLs from the same session. The problem is that after returning to, let's say, layer 3 from the last layer, the second iteration of the layer 3 loop fails: calling course.get_attr("href") raises an error saying the node is no longer in the DOM.
The for statement is:
for course in session.xpath("//div[@id='_26_1termCourses_noterm']/ul/li/a"):
So I imagine it may be some sort of timeout bug, since a loop like that, with no nested fors and no processing, works normally and extracts all links matching the xpath from the page.
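Another possible explanation, besides a timeout, is that the node objects returned by session.xpath() belong to the page that was loaded when the loop started, so they go stale as soon as the same session navigates somewhere else. Under that assumption, a minimal sketch of a workaround is to copy every href into a plain string before visiting anything. The FakeSession and FakeNode classes below are hypothetical stand-ins so the sketch runs without the real site, and visit() is an assumed method name for loading a URL; only xpath() and get_attr() come from my actual code.

```python
# Sketch: snapshot hrefs as strings BEFORE navigating, so no node object
# is touched after the page it came from has been replaced.

class FakeNode:
    """Hypothetical stand-in for a DOM node; stale once the session navigates."""
    def __init__(self, session, href):
        self._session = session
        self._href = href
        self._page_id = session.page_id  # page this node belongs to

    def get_attr(self, name):
        if self._page_id != self._session.page_id:
            raise RuntimeError("node is no longer in DOM")
        return self._href if name == "href" else None


class FakeSession:
    """Hypothetical stand-in for the real session (assumed visit/xpath interface)."""
    def __init__(self, site, start):
        self.site = site      # maps a URL to the list of hrefs on that page
        self.url = start
        self.page_id = 0

    def visit(self, url):
        self.url = url
        self.page_id += 1     # every navigation invalidates earlier nodes

    def xpath(self, expr):
        # The expression is ignored here; the fake returns all links on the page.
        return [FakeNode(self, href) for href in self.site.get(self.url, [])]


def collect_hrefs(session, expr):
    """Return the hrefs as plain strings while the nodes are still attached."""
    return [node.get_attr("href") for node in session.xpath(expr)]


def crawl(session, url, expr, depth, visited=None):
    """Visit url, snapshot its links, then recurse. Returns the visit order."""
    if visited is None:
        visited = []
    session.visit(url)
    visited.append(url)
    if depth > 0:
        for href in collect_hrefs(session, expr):  # strings cannot go stale
            crawl(session, href, expr, depth - 1, visited)
    return visited


SITE = {"/": ["/a", "/b"], "/a": ["/a/1", "/a/2"], "/b": ["/b/1"]}
order = crawl(FakeSession(SITE, "/"), "/", "//a", depth=2)
print(order)
```

Recursion replaces the 5 hand-written nested loops here, but the same idea applies to nested for loops: build the list of href strings first, then loop over that list instead of over the nodes.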
Any ideas?
Thanks!