jaimeiniesta / metainspector

Ruby gem for web scraping purposes. It scrapes a given URL, and returns you its title, meta description, meta keywords, links, images...
https://github.com/metainspector/metainspector
MIT License

Newlines in titles #9

Closed rromanchuk closed 12 years ago

rromanchuk commented 12 years ago

This might be controversial, but it would be nice to strip any newlines inside Scraper#title.

Carol Bartz exclusive: Yahoo "f---ed me over" - Postcards

page.title

=> "\n\t\t Carol Bartz exclusive: Yahoo \"f---ed me over\" - \n\t\tPostcards\t"

I ran into the above at http://postcards.blogs.fortune.cnn.com/2011/09/08/carol-bartz-fired-yahoo/. We could probably do the caller a favor by cleaning up extra markup, which is usually not expected in a title.

LMK and I'll check it in, and add the rest of the missing tests.

jaimeiniesta commented 12 years ago

Sounds fine to me.

I think we should respect HTML markup, but not newlines or tabs.
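
A minimal sketch of that behaviour, using the title from the report above. The `clean_title` helper is hypothetical and written only for illustration; it is not the code from the actual fix. It collapses every run of newlines, tabs, and spaces into a single space and trims the ends, leaving any other characters in the string untouched.

```ruby
# Hypothetical helper for illustration only; not MetaInspector's implementation.
def clean_title(raw)
  return nil if raw.nil?

  # Collapse each run of whitespace (newlines, tabs, spaces) into one space,
  # then trim leading and trailing whitespace.
  raw.gsub(/\s+/, ' ').strip
end

raw = "\n\t\t Carol Bartz exclusive: Yahoo \"f---ed me over\" - \n\t\tPostcards\t"
puts clean_title(raw)
# prints: Carol Bartz exclusive: Yahoo "f---ed me over" - Postcards
```

Collapsing whitespace rather than deleting newlines outright avoids gluing words together when a line break was the only separator between them.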

jaimeiniesta commented 12 years ago

Fixed in this commit:

https://github.com/jaimeiniesta/metainspector/commit/8d5a1def312001d4a1d0eec750620795145580a1