Is it possible to get the raw HTML from a Session or Node?

niklasb / dryscrape

[not actively maintained] A lightweight Python library that uses Webkit to enable easy scraping of dynamic, Javascript-heavy web pages

http://dryscrape.readthedocs.io/

MIT License


Is it possible to get the raw HTML from a Session or Node? #17

Closed: arne-cl closed this issue 11 years ago

arne-cl commented 11 years ago

Dear Niklas,

I am trying to parse parts of a weirdly formatted website, where .at_xpath() and .at_css() don't help much. Is it possible to retrieve the raw HTML that a Node or Session instance represents?

Kind regards, Arne

niklasb commented 11 years ago

Hello Arne,

If I am not mistaken, you should be able to use session.body() to get the raw HTML of the current page and session.document() to get a parsed version of the document (using the lxml library).

Greetings, Niklas
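A minimal sketch of the workflow suggested above, assuming dryscrape and its webkit_server backend are installed. `fetch_raw_html` loads a page and returns `session.body()`. The helper `texts_of` is hypothetical (not part of dryscrape) and walks that raw string with Python's standard-library `html.parser` instead of lxml, which was swapped in so the parsing part runs without extra dependencies. If you prefer XPath on a parsed tree, `session.document()` gives you the lxml version instead.

```python
from html.parser import HTMLParser


def fetch_raw_html(url):
    """Load `url` in a dryscrape session and return the page's raw HTML.

    Assumes dryscrape (and webkit_server) is installed; the import is
    deferred so the parsing helper below can be used on its own.
    """
    import dryscrape  # third-party, only needed when actually scraping

    session = dryscrape.Session()
    session.visit(url)
    return session.body()  # raw HTML of the current page, as a string


class _TextCollector(HTMLParser):
    """Hypothetical helper: collect the text inside every element named `tag`."""

    def __init__(self, tag):
        super().__init__()
        self.tag = tag
        self.depth = 0    # > 0 while inside a matching element
        self.chunks = []  # one text chunk per outermost matching element

    def handle_starttag(self, tag, attrs):
        if tag == self.tag:
            if self.depth == 0:
                self.chunks.append("")  # start a new chunk
            self.depth += 1

    def handle_endtag(self, tag):
        if tag == self.tag and self.depth > 0:
            self.depth -= 1

    def handle_data(self, data):
        if self.depth > 0:
            self.chunks[-1] += data  # includes text of nested child tags


def texts_of(html, tag):
    """Hypothetical helper: return the stripped text of each `tag` element."""
    parser = _TextCollector(tag)
    parser.feed(html)
    parser.close()
    return [chunk.strip() for chunk in parser.chunks]
```

Usage, e.g. `texts_of(fetch_raw_html("http://example.com"), "p")`. Working on the raw string this way lets you handle markup that XPath or CSS selectors cope with poorly, at the cost of writing the traversal yourself. The helper does not handle void elements such as `<br>` as the target tag.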