niklasb / dryscrape

[not actively maintained] A lightweight Python library that uses WebKit to enable easy scraping of dynamic, JavaScript-heavy web pages
http://dryscrape.readthedocs.io/
MIT License
533 stars 67 forks

{"class":"NodeNotAttachedError","message":"Element at 2 no longer present in the DOM"} #72

Open ruhman opened 7 years ago

ruhman commented 7 years ago

I'm trying to scrape a website that requires login. Unfortunately, it doesn't seem to use cookies for login, so opening multiple sessions won't work.

The site works as a kind of online file system, with multiple layers to go through. I currently have 5 nested for loops to walk the files; each one gets the href from every match of an XPath with multiple matches. Inside each loop I do some processing and access more URLs from the same session.

The problem: after returning to layer 3 from the last layer, on the second iteration of that loop, calling `course.get_attr("href")` fails with an error saying the element is no longer in the DOM. The for statement is `for course in session.xpath("//div[@id='_26_1termCourses_noterm']/ul/li/a"):`.

I imagine it may be some sort of timeout bug, because a loop like that works normally and extracts all links matching the XPath when no for loops are nested inside it and no processing is done. Any ideas? Thanks!
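One possible explanation, offered as an assumption rather than confirmed behavior: the nodes returned by `session.xpath` belong to the page that was loaded when the query ran, so they may stop being valid as soon as the loop body calls `session.visit` on another URL. A minimal sketch of a workaround under that assumption is to copy every href into a plain list of strings before navigating anywhere. The helper names `collect_hrefs` and `walk_courses` and the `handle_course` callback are hypothetical; only `xpath`, `get_attr`, and `visit` are taken from the session and node objects described above.

```python
def collect_hrefs(session, xpath):
    """Return the href strings of all nodes matching xpath on the current page.

    The strings are plain Python values, so they stay usable after the
    session navigates away and the node objects themselves go stale.
    """
    return [node.get_attr("href") for node in session.xpath(xpath)]


def walk_courses(session, course_xpath, handle_course):
    """Visit every matching link, calling handle_course(session) on each page.

    Hypothetical helper: the hrefs are snapshotted *before* the first
    visit, so no node is touched after the page has changed.
    Assumes each href is something session.visit() can resolve
    (an absolute URL, or a relative one with a base URL configured).
    """
    hrefs = collect_hrefs(session, course_xpath)
    for href in hrefs:
        session.visit(href)
        # Nested layers can repeat the same pattern from inside the callback.
        handle_course(session)
    return hrefs
```

With a real session this would be called as `walk_courses(session, "//div[@id='_26_1termCourses_noterm']/ul/li/a", handle_course)`, and each nested layer could use the same snapshot-then-visit pattern. If the error persists even when no node is used after a `visit`, the stale-node assumption is probably wrong and something else is going on.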