prolificinteractive / node-html-to-json

Parses HTML strings into objects using flexible, composable filters.
MIT License
120 stars, 13 forks

How to get all 'href' in a webpage #1

Closed: silverfangs closed this issue 8 years ago

silverfangs commented 8 years ago

Hi guys, I'm having trouble getting all the 'href' attributes on my own website. My code is below:

var foo = htmlToJson.request('http://www.mywebsite.com/', {
  'links': function ($doc) {
    return $doc.find('a').attr('href');
  }
}, function (err, result) {
  console.log(result);
});

The result I get contains only one link: { links: 'http://link1.com' }

But my desired outcome is: { links: 'http://link1.com' } { links: 'http://link2.com' } { links: 'http://link3.com' }

I can't figure it out, please help me. Thanks.

prolificeric commented 8 years ago

You just need to tell the parser that you're iterating through a series of elements and mapping data from them into an array. There are two ways to do it:

The shorthand way:

var foo = htmlToJson.request('http://www.mywebsite.com/', {
  'links': ['a', function ($a) {
    return $a.attr('href');
  }]
}, function (err, result) {
  console.log(result);
});

And the longer .map way of doing it:

var foo = htmlToJson.request('http://www.mywebsite.com/', {
  'links': function ($doc) {
    return this.map('a', function ($a) {
      return $a.attr('href');
    });
  }
}, function (err, result) {
  console.log(result);
});
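
For anyone wondering why the original filter returned a single link: a jQuery-style `.attr('href')` getter called on a whole selection reads from the first matched element only, so the values have to be collected one element at a time. Here is a rough, dependency-free sketch of that difference. `Selection` and `mapEach` are made-up stand-ins for illustration, not part of html-to-json, and the sample data just mimics three `<a>` tags.

```javascript
// Hypothetical stand-in for a jQuery/cheerio-style selection.
// This is NOT html-to-json's API; it only models the getter behavior.
function Selection(elements) {
  this.elements = elements;
}

// Getter semantics: read the attribute from the FIRST element only.
Selection.prototype.attr = function (name) {
  return this.elements.length > 0 ? this.elements[0].attribs[name] : undefined;
};

// Hypothetical helper: wrap each element on its own and collect the results.
Selection.prototype.mapEach = function (fn) {
  return this.elements.map(function (el) {
    return fn(new Selection([el]));
  });
};

// Sample data standing in for the three <a> tags on the page.
var anchors = new Selection([
  { attribs: { href: 'http://link1.com' } },
  { attribs: { href: 'http://link2.com' } },
  { attribs: { href: 'http://link3.com' } }
]);

// What the original filter effectively did: one value for the whole selection.
var firstOnly = anchors.attr('href');

// What the mapping approach does: one value per element, gathered into an array.
var allLinks = anchors.mapEach(function ($a) {
  return $a.attr('href');
});

console.log({ links: firstOnly });
console.log({ links: allLinks });
```

Note that with the mapping approach the values end up in one array under `links`, rather than in three separate `{ links: ... }` objects.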

silverfangs commented 8 years ago

@prolificeric, yeah! Cool man, I get it now. Thank you so much for your help!