jonaylor89 / FAQ-Scrapper

Web Scrapper for FAQs

Not working for some pages #2

Open HalaKuwatly opened 6 years ago

HalaKuwatly commented 6 years ago

Hey and thanks for the nice work!

Some pages with a structure like this one do not work: https://www.sskm.de/de/home/onlinebanking/tipps-und-hilfe/fragen_und_antworten/faq-elektronisches-postfach.html?n=true. Any idea why, or what I can change to make it work? Thanks.

jonaylor89 commented 6 years ago

The page you sent works if you open all of the question tabs, but the mechanism for that could change depending on the website, so I don't really have a general solution for all websites like that. I was kind of hoping to leave that as an exercise for anyone looking for a challenge.

Darrennchan8 commented 6 years ago

Add a bit of code that runs in the Puppeteer tab instance before scraping:

Array.from(document.querySelectorAll('*')).filter(e => !['script', 'style', 'link', 'meta', 'embed', 'object'].includes(e.tagName.toLowerCase()) && getComputedStyle(e).display == 'none').forEach(e => e.style.display = 'initial');
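For anyone wiring this in, here is a minimal sketch of how the one-liner above might be packaged for Puppeteer's `page.evaluate`. The function names `revealHiddenElements` and `revealBeforeScraping` are hypothetical and not part of FAQ-Scrapper, and `page` is assumed to be a Puppeteer `Page` that has already navigated to the FAQ URL. The two optional parameters exist only so the logic can be exercised outside a browser; inside the page they default to the real `document` and `getComputedStyle`.

```javascript
// Runs inside the page. It must be self-contained, because Puppeteer
// serializes the function source and evaluates it in the browser context.
// `doc` and `styleOf` are illustrative parameters (assumptions) that default
// to the page's own globals when the function is called without arguments.
function revealHiddenElements(doc = document, styleOf = (el) => getComputedStyle(el)) {
  // Tags that are never meant to be rendered; leave them alone.
  const skip = ['script', 'style', 'link', 'meta', 'embed', 'object'];
  let revealed = 0;
  for (const el of Array.from(doc.querySelectorAll('*'))) {
    if (skip.includes(el.tagName.toLowerCase())) continue;
    if (styleOf(el).display === 'none') {
      // Override the hidden state with an inline style, as in the one-liner.
      el.style.display = 'initial';
      revealed += 1;
    }
  }
  return revealed; // number of elements that were un-hidden
}

// Hypothetical helper: call this after page.goto(...) and before scraping.
async function revealBeforeScraping(page) {
  return page.evaluate(revealHiddenElements);
}
```

Note that `initial` resolves to `inline` for the `display` property, so the page layout may shift. That should not matter if only the text of the questions and answers is being extracted, but it only helps on sites that collapse answers with `display: none`; pages that load answers on click would still need their tabs opened.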

