matthewmueller / x-ray

The next web scraper. See through the <html> noise.
MIT License

How can I get table headings and data? #326

sklyerking closed this issue 5 years ago

sklyerking commented 5 years ago

    const Xray = require('x-ray');
    const x = Xray();

    x('https://en.wikipedia.org/wiki/List_of_Prime_Ministers_of_India', {
      title: 'title',
      items: x('.wikitable tr', [{ title: 'th' }])
    })(function (err, obj) {
      console.log(obj);
    });

Current output:

    {
      title: 'List of Prime Ministers of India - Wikipedia',
      items: [
        { title: '№\n' }, { title: '1\n' }, { title: '–\n' }, { title: '2\n' },
        { title: '–\n' }, { title: '3\n' }, { title: '4\n' }, { title: '5\n' },
        { title: '(3)\n' }, { title: '6\n' }, { title: '7\n' }, { title: '8\n' },
        { title: '9\n' }, { title: '10\n' }, { title: '11\n' }, { title: '12\n' },
        { title: '(10)\n' }, { title: '13\n' }, { title: '14\n' }
      ]
    }

lathropd commented 5 years ago

You have a couple of issues.

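First, a bare 'th' selector returns only the first matching th in each scope. You can get all the th contents by putting brackets around it: ['th'].

Second, you're looping through every row, which is why you get one object per row. Fix that by omitting the brackets around the data object in the items call, i.e. pass { title: ['th'] } rather than [{ title: 'th' }].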
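For anyone landing here later, the two changes together might look roughly like the sketch below. The scrapeHeaders wrapper and its parameters are made up for illustration (they are not part of x-ray); the x-ray instance is passed in so the query itself is easy to see. It assumes the first row matched by '.wikitable tr' is the header row.

```javascript
// Sketch only: `scrapeHeaders` is a hypothetical helper, not an x-ray API.
// `x` is an x-ray instance, e.g. created with `require('x-ray')()`.
function scrapeHeaders(x, url, callback) {
  x(url, {
    title: 'title',
    // No brackets around the object: only the first matching row is used.
    // Brackets around 'th': every header cell in that row is collected.
    items: x('.wikitable tr', { title: ['th'] })
  })(callback);
}

// Usage (needs `npm install x-ray` and network access):
// const x = require('x-ray')();
// scrapeHeaders(
//   x,
//   'https://en.wikipedia.org/wiki/List_of_Prime_Ministers_of_India',
//   (err, obj) => console.log(err || obj)
// );
```

With this shape, items should come back as a single object whose title is an array of header strings, instead of an array with one object per row.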

Creating objects with the appropriate keys and values based on the headers is slightly more complex, but not much.
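One way to do that, as a rough sketch: scrape the headers as one array and each row's cells as another, then pair them up by index. The rowsToObjects helper below is hypothetical, and the sample data is made up to resemble what x-ray returns (strings with trailing newlines). It assumes the cell arrays come from a separate query such as x('.wikitable tr', [{ cells: ['td'] }]), and that every row has its cells in the same order as the headers. Tables that use rowspan or colspan, or that put the first cell of each row in a th, will not line up without extra handling.

```javascript
// Hypothetical helper: pair each row's cells with the header at the same index.
function rowsToObjects(headers, rows) {
  // Strip the trailing newlines that the scraped text carries.
  const keys = headers.map((h) => h.trim());
  return rows.map((cells) =>
    // Missing cells become empty strings so every object has every key.
    Object.fromEntries(keys.map((key, i) => [key, (cells[i] || '').trim()]))
  );
}

// Made-up sample input, shaped like scraped header and cell text.
const sampleHeaders = ['№\n', 'Name\n'];
const sampleRows = [['1\n', 'Jawaharlal Nehru\n']];
console.log(rowsToObjects(sampleHeaders, sampleRows));
```

Object.fromEntries needs Node 12 or later; on older versions a plain loop that assigns obj[key] = value does the same job.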

(You’ve probably solved this or given up, but hopefully this will help others.)