ruipgil / scraperjs

A complete and versatile web scraper.
MIT License
3.7k stars, 188 forks

How to scrape using html files if the site did not declare any "class" #74

Open jhnferraris opened 7 years ago

jhnferraris commented 7 years ago

Hello,

I'm trying to brush up on my JavaScript skills and would like to try out this neat scraper. I have this static website: http://www.phivolcs.dost.gov.ph/html/update_SOEPD/EQLatest.html, and I'm trying to scrape the 2017 table.

Unlike the Hacker News website, my target site doesn't have any CSS classes I can use to target the text I want to scrape.

Example: [screenshot, 2017-09-01, 3:36:49 PM]

For starters, I tried it this way:

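var express = require('express');
var scraperjs = require('scraperjs');

// Assumed setup: `router` is an Express router defined elsewhere in the app.
var router = express.Router();

router.get('/bulletin', function(request, response, next) {
    scraperjs.StaticScraper.create('http://www.phivolcs.dost.gov.ph/html/update_SOEPD/EQLatest.html')
        .scrape(function($) {
            // This is similar to an inspector on a scrapinghub service.
            // With no classes available, select every table cell by its position in the document.
            return $("html > body > div > table > tbody > tr > td").map(function() {
                console.log($(this));
                return $(this).text();
            }).get();
        })
        .then(function(news) {
            // Send the array of cell texts back to the client.
            response.send(news);
        });
});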

But I can't get any data from the static page. How can I achieve this?

Thanks for the assist!