Open AndyMik90 opened 2 weeks ago
Hey @AndyMik90, I just checked, and it seems like valma.ai uses a WordPress plugin that renders every image as an SVG file before displaying the actual image. This probably affects SEO/performance metrics.
What I suggest is using the parameter { pageOptions: { waitFor: 1000 } } in your request, so the scraper will wait for the page to fully render before extracting the data.
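For reference, here is a rough sketch of how a request body carrying that option could be put together. Only `pageOptions.waitFor` comes from the suggestion above; the `url` field name, the `https://` scheme, and the helper `buildWaitRequest` are assumptions made for illustration, so check them against the scraper's actual request format.

```typescript
// Sketch only: `url` is an assumed field name for the target page.
// `pageOptions.waitFor` is the option suggested in this thread.
interface WaitPageOptions {
  waitFor: number; // milliseconds to wait before extracting the page
}

interface WaitScrapeRequest {
  url: string; // assumption, not confirmed by the thread
  pageOptions: WaitPageOptions;
}

// Hypothetical helper: builds the request body with a wait time.
function buildWaitRequest(url: string, waitForMs: number = 1000): WaitScrapeRequest {
  if (!isFinite(waitForMs) || waitForMs < 0) {
    throw new RangeError("waitForMs must be a non-negative number");
  }
  return { url, pageOptions: { waitFor: waitForMs } };
}

const waitBody: string = JSON.stringify(buildWaitRequest("https://valma.ai"));
console.log(waitBody);
// prints {"url":"https://valma.ai","pageOptions":{"waitFor":1000}}
```

If 1000 ms turns out to be too short for the images to swap in, the second argument can be raised without touching the rest of the body.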
Thanks, I will try it. I'm afraid, though, that the cause may be the more advanced speed optimization techniques we use, like delayed JavaScript execution. Basically, a user interaction (click, scroll, etc.) is needed to trigger the JS to load.
This is great for speed but not for scraping.
@AndyMik90 Awesome! If you need to scroll a specific component in the HTML, you can use { pageOptions: { scrollXPaths: string[] } } with the component's XPath. We haven't implemented a way to click yet, but we can consider adding it if it makes sense.
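As a minimal sketch of that option, the snippet below builds a `pageOptions` object with a list of XPaths to scroll. The option name `scrollXPaths` and its `string[]` type come from the comment above; the helper `buildScrollOptions` and the example XPath `//*[@id='gallery']` are hypothetical and would need to be replaced with the XPath of the real component.

```typescript
// `scrollXPaths: string[]` is the shape given in this thread.
interface ScrollPageOptions {
  scrollXPaths: string[];
}

// Hypothetical helper: trims the XPaths, drops empty entries,
// and wraps the result in a `pageOptions` object.
function buildScrollOptions(xpaths: string[]): { pageOptions: ScrollPageOptions } {
  const cleaned = xpaths
    .map((xpath) => xpath.trim())
    .filter((xpath) => xpath.length > 0);
  if (cleaned.length === 0) {
    throw new Error("at least one non-empty XPath is required");
  }
  return { pageOptions: { scrollXPaths: cleaned } };
}

// The XPath below is a placeholder, not taken from valma.ai.
const scrollOptions = buildScrollOptions(["  //*[@id='gallery']  ", ""]);
console.log(JSON.stringify(scrollOptions));
// prints {"pageOptions":{"scrollXPaths":["//*[@id='gallery']"]}}
```

Whether scrolling alone counts as the "user interaction" that triggers delayed JavaScript on a given site is something that has to be tested per site.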
Describe the Bug
When scraping sites like valma.ai, the images we get back contain only a placeholder:
(data:image/svg+xml,%3Csvg%20xmlns='http://www.w3.org/2000/svg'%20viewBox='0%200%20768%20768'%3E%3C/svg%3E)
To Reproduce
Steps to reproduce the issue: Scrape valma.ai
Additional information: The only image that comes back with a correct link is the logo, which is in .webp format. The others (.png, etc.) only show up as data:image/svg+xml.
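To check a scrape result for this symptom, something like the following could be used. It is only a sketch: `isSvgPlaceholder` and `splitImageSources` are hypothetical helpers, and the logo URL is an invented example. The placeholder string is the one quoted in this report.

```typescript
const SVG_DATA_URI_PREFIX = "data:image/svg+xml";

// True when an image source is an inline SVG data URI rather than a link.
function isSvgPlaceholder(src: string): boolean {
  const normalized = src.trim().toLowerCase();
  return normalized.slice(0, SVG_DATA_URI_PREFIX.length) === SVG_DATA_URI_PREFIX;
}

// Splits image sources into real links and SVG placeholders.
function splitImageSources(sources: string[]): { real: string[]; placeholders: string[] } {
  const real: string[] = [];
  const placeholders: string[] = [];
  for (const src of sources) {
    (isSvgPlaceholder(src) ? placeholders : real).push(src);
  }
  return { real, placeholders };
}

const sampleSources: string[] = [
  "https://example.com/logo.webp", // invented URL standing in for the logo
  "data:image/svg+xml,%3Csvg%20xmlns='http://www.w3.org/2000/svg'%20viewBox='0%200%20768%20768'%3E%3C/svg%3E",
];
const splitResult = splitImageSources(sampleSources);
console.log(splitResult.real.length, splitResult.placeholders.length);
// prints 1 1
```

A non-zero placeholder count after a scrape suggests the page was captured before the real images were swapped in.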