wikimedia / html-metadata

MetaData html scraper and parser for Node.js (supports Promises and callback style)
MIT License

html-metadata returns all fields as undefined for specific url #54

Closed satyanath closed 6 years ago

satyanath commented 7 years ago

Hi,

I have been using html-metadata; thanks for such wonderful software. I noticed that when it is used on the following URL, https://www.cnet.com/special-reports/vr101/, it gives all fields as undefined.

Can you please look into the issue?

My code is

    var scrape = require('html-metadata');
    var url = process.argv[2];

    scrape(url).then(function(metadata){
        console.log("****");
        console.log(metadata);
    });

and the output I get for this program is

parse() is deprecated, use toJson()


    { openGraph:
       { site_name: undefined,
         title: undefined,
         description: undefined,
         url: undefined,
         image:
          { url: undefined,
            type: 'image/jpeg',
            width: '630',
            height: '315' },
         app_id: undefined,
         type: 'article' },
      twitter:
       { card: 'summary_large_image',
         creator: undefined,
         site: undefined } }

mvolz commented 7 years ago

Thanks for reporting!

The underlying issue with that URL is that they appear to be using a templating language to create their HTML, but for some reason the content attribute isn't being added to the meta tags; only the ng-attr-content template attribute is present:

<!-- OpenGraph sharing tags -->
    <meta property="og:site_name"     ng-attr-content="{{share.siteName}}" />
    <meta property="og:title"         ng-attr-content="{{share.title}}" />
    <meta property="og:description"   ng-attr-content="{{share.description}}" />

But obviously we shouldn't be returning undefined! :)
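To illustrate the failure mode described above, here is a minimal, dependency-free sketch. It is not html-metadata's actual parsing code: `readOpenGraph` is a hypothetical helper that uses a simple regex where a real scraper would use an HTML parser, and the `og:type` tag in the sample is an assumption based on the `type: 'article'` value in the reported output. It shows that a tag carrying only `ng-attr-content` yields `undefined` when the `content` attribute is read.

```javascript
// Hypothetical helper, NOT part of html-metadata: reads one attribute
// from every <meta property="og:..."> tag found in an HTML string.
function readOpenGraph(html, attribute) {
  const result = {};
  const metaTag = /<meta\s+([^>]*?)\/?>/gi;
  let match;
  while ((match = metaTag.exec(html)) !== null) {
    // Collect all name="value" pairs of this tag into a plain object.
    const attrs = {};
    const attrPattern = /([\w:-]+)\s*=\s*"([^"]*)"/g;
    let attr;
    while ((attr = attrPattern.exec(match[1])) !== null) {
      attrs[attr[1]] = attr[2];
    }
    if (attrs.property && attrs.property.startsWith('og:')) {
      // If the requested attribute is absent, this stores undefined.
      result[attrs.property.slice(3)] = attrs[attribute];
    }
  }
  return result;
}

// Two tags copied from the page in question, plus one assumed well-formed tag.
const brokenHtml = [
  '<meta property="og:site_name" ng-attr-content="{{share.siteName}}" />',
  '<meta property="og:title" ng-attr-content="{{share.title}}" />',
  '<meta property="og:type" content="article" />'
].join('\n');

const og = readOpenGraph(brokenHtml, 'content');
console.log(og);
```

Reading `ng-attr-content` instead would only recover the unrendered template placeholders such as `{{share.title}}`, so there is no real value to extract from this page either way.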

satyanath commented 7 years ago

Yes. So what would be returned instead, and how long might it take to fix this?

Regards,

S.Satyanath

On 06-02-2017 15:29, M. Volz wrote:

Thanks for reporting!

The underlying issue with that url is that it looks like they're using a templating language to create their html but the values for some reason aren't getting replaced and it's putting the literal template tags in instead:

    <meta property="og:site_name" ng-attr-content="{{share.siteName}}" />
    <meta property="og:title" ng-attr-content="{{share.title}}" />
    <meta property="og:description" ng-attr-content="{{share.description}}" />

But obviously we shouldn't be returning undefined! :)


mvolz commented 7 years ago

Hi,

I will fix it so that it no longer returns undefined; instead you will get cleaner, though not very rich, metadata. It looks like they have a programming error on their end :/ I'll let you know when the update is published.
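Until such an update is available, a caller could get the same "cleaner" shape by post-processing the result. The sketch below is only an illustration and not the library's actual fix: `pruneUndefined` is a hypothetical helper, and the sample object is copied from the output reported at the top of this issue.

```javascript
// Hypothetical helper, NOT the library's fix: returns a copy of `value`
// without undefined properties and without nested objects that end up empty.
// Arrays and primitives are returned unchanged.
function pruneUndefined(value) {
  if (value === null || typeof value !== 'object' || Array.isArray(value)) {
    return value;
  }
  const kept = {};
  for (const [key, child] of Object.entries(value)) {
    const pruned = pruneUndefined(child);
    const isEmptyObject = pruned !== null && typeof pruned === 'object' &&
      !Array.isArray(pruned) && Object.keys(pruned).length === 0;
    if (pruned !== undefined && !isEmptyObject) {
      kept[key] = pruned;
    }
  }
  return kept;
}

// The metadata object as reported in this issue.
const reported = {
  openGraph: {
    site_name: undefined,
    title: undefined,
    description: undefined,
    url: undefined,
    image: { url: undefined, type: 'image/jpeg', width: '630', height: '315' },
    app_id: undefined,
    type: 'article'
  },
  twitter: { card: 'summary_large_image', creator: undefined, site: undefined }
};

const result = pruneUndefined(reported);
console.log(JSON.stringify(result, null, 2));
```

With the promise style used above, this could be applied as `scrape(url).then(pruneUndefined)`. The helper builds a new object rather than deleting keys in place, so the original metadata stays available for debugging.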

satyanath commented 7 years ago

Thanks a lot for the quick response.

Regards,

S.Satyanath

On 06-02-2017 15:43, M. Volz wrote:

Hi,

I will fix it so that it no longer returns undefined; instead you will get cleaner, though not very rich, metadata. It looks like they have a programming error on their end :/ I'll let you know when the update is published.


mvolz commented 6 years ago

I think this has actually been resolved for a while, so closing :).