jaimeiniesta / metainspector

Ruby gem for web scraping purposes. It scrapes a given URL, and returns you its title, meta description, meta keywords, links, images...
https://github.com/metainspector/metainspector
MIT License
1.03k stars, 165 forks

MetaInspector not able to fetch basic title or meta elements #230

Closed: malav1410 closed this issue 5 years ago

malav1410 commented 6 years ago

I ran into an issue while fetching data from simple pages such as LinkedIn articles. I found the title, Facebook (Open Graph) meta tags and Twitter meta tags in the page's `<head>`, but MetaInspector is unable to fetch any of them.

Link: https://www.linkedin.com/pulse/20140830162720-155900101-why-do-professional-athletes-need-a-coach

jaimeiniesta commented 5 years ago

I can confirm that MetaInspector is not able to scrape much from LinkedIn:

https://metainspectordemo.herokuapp.com/scrape?url=https%3A%2F%2Fwww.linkedin.com%2Fpulse%2F20140830162720-155900101-why-do-professional-athletes-need-a-coach

This seems to be because the HTML served there has been mangled, probably to make scraping harder, and is then reconstructed in the browser using JavaScript.
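MetaInspector fetches the page over HTTP and parses the HTML it receives; it does not execute JavaScript. Tags visible in a browser's developer tools may therefore have been added after scripts ran, and a static scraper would never see them. A rough way to check what a non-JavaScript client actually gets is to scan the raw HTML for the title and the `og:` / `twitter:` meta tags. The sketch below is an assumption-laden diagnostic, not part of MetaInspector's API: `raw_meta_summary` is a hypothetical helper, it uses regular expressions rather than a real HTML parser, and the two HTML samples are made up for illustration.

```ruby
# Hypothetical diagnostic helper (not part of MetaInspector).
# Regex-based, so it only copes with simple, well-formed markup;
# a real parser such as Nokogiri is more robust.
def raw_meta_summary(html)
  # Text of the first <title> element, or nil if there is none.
  title = html[%r{<title[^>]*>(.*?)</title>}im, 1]&.strip

  # Collect <meta property="..." content="..."> and
  # <meta name="..." content="..."> pairs, keyed by lowercased name.
  meta = {}
  html.scan(/<meta\s[^>]*>/i) do |tag|
    key = tag[/\b(?:property|name)\s*=\s*["']([^"']+)["']/i, 1]
    value = tag[/\bcontent\s*=\s*["']([^"']*)["']/i, 1]
    next unless key && value # skips e.g. <meta charset="utf-8">

    meta[key.downcase] = value
  end

  {
    title: title,
    og: meta.select { |k, _| k.start_with?("og:") },
    twitter: meta.select { |k, _| k.start_with?("twitter:") }
  }
end

# Made-up sample: tags present in the HTML as served.
served = <<~HTML
  <html><head>
    <meta charset="utf-8">
    <title>Example article</title>
    <meta property="og:title" content="Example article">
    <meta name="twitter:card" content="summary">
  </head><body></body></html>
HTML

# Made-up sample: head left empty, content rebuilt later by JavaScript.
mangled = <<~HTML
  <html><head></head><body>
    <script>/* page content assembled client-side */</script>
  </body></html>
HTML

p raw_meta_summary(served)
p raw_meta_summary(mangled)
```

For the first sample the summary contains the title plus one `og:` and one `twitter:` entry; for the second, the title is `nil` and both hashes are empty, which is roughly what a static scraper is left with when the markup only takes shape after JavaScript runs. Running the same check on a raw response body (for example one fetched with `Net::HTTP.get`) would show which case a given page falls into.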

I'll be happy to review a PR to improve this if you see a clear path of action.