michaelpumo / ScrapeCFC

⚡️ A CFC that scrapes information from a given URL.
5 stars, 6 forks

Only returns part of the page #1

Closed RHMason closed 5 years ago

RHMason commented 11 years ago

I tried ScrapeCFC last night (with Railo 4 on Windows) and decided to test it on your project’s page on GitHub. However, it brought back only part of the page:

{"mimetype":"text\/html","title":"michaelpumo\/ScrapeCFC \u00b7 GitHub","errors":false,"images":[{"height":"20","alt":"","width":"20","url":"https:\/\/secure.gravatar.com\/avatar\/1adc2b5e1b5d40b03455e544ae417132?s=140&d=https:\/\/a248.e.akamai.net\/assets.github.com%2Fimages%2Fgravatars%2Fgravatar-user-420.png"},{"height":"64","alt":"","width":"64","url":"https:\/\/a248.e.akamai.net\/assets.github.com\/images\/spinners\/octocat-spinner-128.gif?1347543527"}],"messages":[],"og":{"og:type":"githubog:gitrepository","og:title":"ScrapeCFC","og:description":"ScrapeCFC - A CFC that scrapes information from a given URL.","og:site_name":"GitHub","og:image":"https:\/\/secure.gravatar.com\/avatar\/1adc2b5e1b5d40b03455e544ae417132?s=420&d=https:\/\/a248.e.akamai.net\/assets.github.com%2Fimages%2Fgravatars%2Fgravatar-user-420.png","og:url":"https:\/\/github.com\/michaelpumo\/ScrapeCFC"},"meta":{"octolytics-app-id":"github","description":"ScrapeCFC - A CFC that scrapes information from a given URL.","msapplication-tileimage":"\/windows-tile.png","octolytics-dimension-repository_id":"10200328","octolytics-dimension-user_id":"4269460","csrf-token":"wbKdSTAc4WajQJ+ixBY8yZkLDXuJrJCDu4RffGBE2J8=","csrf-param":"authenticity_token","octolytics-host":"collector.githubapp.com","msapplication-tilecolor":"#ffffff"},"url":"https:\/\/github.com\/michaelpumo\/ScrapeCFC"}

Any idea why it's returning only selected parts and not all the data?

michaelpumo commented 11 years ago

Hello! Thanks for the feedback. At the moment, it returns only some meta information about the page (some meta tag values and Open Graph values) and any available images. This suited my application at the time I open-sourced this script.

What other kind of information were you hoping to see?
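
For readers unfamiliar with this style of scraper, here is a minimal sketch of metadata-only extraction: collecting `<meta>` values, Open Graph properties, and `<img>` attributes while ignoring the page body. It is written in Python with the standard library only and is not ScrapeCFC's actual CFML code; the names `MetaImageParser`, `scrape_metadata`, and `SAMPLE_HTML` are hypothetical, and the sample markup is invented for illustration.

```python
from html.parser import HTMLParser


class MetaImageParser(HTMLParser):
    """Collects <meta> values, Open Graph properties, and <img> attributes."""

    def __init__(self):
        super().__init__()
        self.meta = {}
        self.og = {}
        self.images = []

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if tag == "meta":
            content = attributes.get("content")
            if content is None:
                return
            prop = attributes.get("property") or ""
            name = attributes.get("name") or ""
            if prop.startswith("og:"):
                # Open Graph values use the "property" attribute.
                self.og[prop] = content
            elif name:
                self.meta[name] = content
        elif tag == "img" and attributes.get("src"):
            # Missing or valueless attributes are reported as empty strings.
            self.images.append({
                "url": attributes["src"],
                "alt": attributes.get("alt") or "",
                "width": attributes.get("width") or "",
                "height": attributes.get("height") or "",
            })


def scrape_metadata(html):
    """Return only the metadata and images found in an HTML string."""
    parser = MetaImageParser()
    parser.feed(html)
    parser.close()
    return {"meta": parser.meta, "og": parser.og, "images": parser.images}


# Invented sample page, used only to exercise the sketch.
SAMPLE_HTML = """
<html><head>
<meta name="description" content="A CFC that scrapes information from a given URL.">
<meta property="og:title" content="ScrapeCFC">
</head><body>
<p>Body text that a metadata-only scraper never returns.</p>
<img src="/avatar.png" alt="" width="20" height="20">
</body></html>
"""

result = scrape_metadata(SAMPLE_HTML)
```

The returned dictionary has the same general shape as the output quoted earlier in this thread (`meta`, `og`, `images`), which is why the paragraph text of a page never shows up in it.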

RHMason commented 11 years ago

Perhaps it should be billed as a meta scraper, then? Most people who want to scrape a page want all the information on it, since it is hard to predict where on the page the part you need is located. Is there an easy way to modify it so that it scrapes the entire page? I think that feature would make it much more useful to the majority of people looking for a CFML scraper.
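
As a rough illustration of what scraping the entire page could involve, the sketch below pulls all visible text out of an HTML document, skipping `<script>` and `<style>` content. It is a Python standard-library example rather than a patch for ScrapeCFC, and the names `TextExtractor`, `extract_text`, and `SAMPLE_PAGE` are hypothetical.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collects the text of a page, ignoring script and style blocks."""

    SKIPPED = {"script", "style"}

    def __init__(self):
        super().__init__()
        self._skip_depth = 0
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIPPED and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        text = data.strip()
        # Keep non-empty text that is not inside a skipped element.
        if text and self._skip_depth == 0:
            self.chunks.append(text)


def extract_text(html):
    """Return the page text, one line per text run."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return "\n".join(parser.chunks)


# Invented sample page, used only to exercise the sketch.
SAMPLE_PAGE = (
    "<html><head><title>Demo</title><style>p { color: red; }</style></head>"
    "<body><h1>Heading</h1><p>First <b>bold</b> paragraph.</p>"
    "<script>var x = 1;</script></body></html>"
)

page_text = extract_text(SAMPLE_PAGE)
```

A real full-page scraper would likely also need to preserve some structure (headings, links, tables) so that callers can find the part of the page they care about; this sketch only flattens everything to text.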