Open mulysatest opened 7 years ago
@kba Any idea?
Hi, thanks for the interest. @zuphilip made some very helpful posts in the wiki that could help you get started.
@kba Thank you for your quick response. Is it possible to include the image in the content?
Try changing the xpathDescription to e.g. .//img. Though I think they lazy-load the images with JavaScript, so the image path is in data-src, not src, so browsers won't display it.
I already tried that with .//img[@class="swap-image lazy-loaded"] and got this error: Could not scrape description, check your xpath
I think that class is added by the browser. rssscrpr will not execute any JavaScript. Look at the source code of the HTML page as it is delivered to your browser (ctrl-u instead of "inspect element", or curl <url> on the command line). The src attribute doesn't contain a reference to the real image but to some placeholder. I'm afraid this is not possible with pure XPath; you'd need a postprocessing step that sets the data-src attribute value as the src.
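To illustrate what such a postprocessing step could look like, here is a minimal sketch in Python. It is not part of rssscrpr; the function name `promote_data_src` and the sample URLs are made up for illustration. It assumes the scraped description is available as an HTML string and that attribute values are quoted.

```python
import re

# Matches a whole <img ...> tag.
IMG_TAG = re.compile(r"<img\b[^>]*>", re.IGNORECASE)
# Matches data-src="..." (or single-quoted) and captures the value.
DATA_SRC = re.compile(r"""\bdata-src\s*=\s*(["'])(.*?)\1""", re.IGNORECASE)
# Matches a plain src="..." attribute, but not the "src" inside "data-src".
SRC = re.compile(r"""(?<![\w-])src\s*=\s*(["'])(.*?)\1""", re.IGNORECASE)


def promote_data_src(html: str) -> str:
    """Hypothetical helper: copy each <img>'s data-src value into src."""

    def fix_tag(match: re.Match) -> str:
        tag = match.group(0)
        data = DATA_SRC.search(tag)
        if data is None:
            return tag  # nothing lazy-loaded here, leave the tag alone
        new_attr = 'src="' + data.group(2) + '"'
        if SRC.search(tag):
            # Replace the placeholder src with the real image URL.
            return SRC.sub(lambda m: new_attr, tag, count=1)
        # No src attribute at all: insert one right after "<img".
        return tag[:4] + " " + new_attr + tag[4:]

    return IMG_TAG.sub(fix_tag, html)


sample = '<p><img src="placeholder.gif" data-src="https://example.com/drone.jpg"></p>'
print(promote_data_src(sample))
```

A regular expression is enough for a quick experiment, but a real HTML parser would be more robust against unusual markup such as unquoted attribute values.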
First of all, thank you for your great script. I tried the demo and learned the syntax from the wiki to extract content from HTML into RSS. However, I think the syntax is really difficult to understand, and I couldn't manage to extract the proper content. Can you give me a demo of how to extract an RSS feed from this page: http://www.popularmechanics.com/search/drone?
Thank you in advance.