stevenvachon / broken-link-checker

Find broken links, missing images, etc within your HTML.
MIT License
1.97k stars, 305 forks

Alpha version 0.8.0 does not follow found links recursively #183

Closed (echo-gravitas closed 4 years ago)

echo-gravitas commented 4 years ago

Describe the bug

This morning I played around with release version 0.7.8 (from yarnpkg), and it works fine so far. This afternoon I was curious how alpha version 0.8.0 (from GitHub) works. Unfortunately, I can't get it to scan a website recursively. It only logs the URL I defined as the starting point to the console (Node stdout).

This is my index.js:

const {SiteChecker} = require('broken-link-checker');

let options = {
        acceptedSchemes: ['http', 'https'],
        honorRobotExclusions: false,
        cacheResponses: false
    },
    customData = null,
    siteUrl = new URL('https://www.example.com');

const siteChecker = new SiteChecker(options)
    .on('error', (error) => {
    })
    .on('robots', (robots, customData) => {
    })
    .on('html', (tree, robots, response, pageURL, customData) => {
        console.log(pageURL.href)
    })
    .on('queue', () => {
    })
    .on('junk', (result, customData) => {
    })
    .on('link', (result, customData) => {
    })
    .on('page', (error, pageURL, customData) => {
    })
    .on('site', (error, siteURL, customData) => {
        console.log(siteURL.href)
    })
    .on('end', () => {
        console.log('Done!')
    });

siteChecker.enqueue(siteUrl, customData);

To Reproduce

  1. Add broken-link-checker from github via yarn add
  2. Build it via yarn build in node_modules/broken-link-checker
  3. Create an index.js in project root and copy and paste my example mentioned above
  4. Run node index.js in command line

Expected behavior

A list of URLs crawled from the given starting URL, like this:

https://www.example.com
https://www.example.com/2017/12/08/kalte-winterdaemmerung-am-rheinfall/
https://www.example.com/author/johndoe/
https://www.example.com/2017/11/12/konzert-kammgarn/
https://www.example.com/2017/10/18/portrait-shooting/
https://www.example.com/2017/10/15/wochenendtrip/
https://www.example.com/2017/10/01/zu-besuch/
https://www.example.com/2017/06/29/gewitterfront/
https://www.example.com/2017/06/15/la-belle-paris/
https://www.example.com/2017/03/13/alvaro-soler/
...

Environment:

stevenvachon commented 4 years ago

Change

acceptedSchemes: ['http', 'https'],

to

acceptedSchemes: ['http:', 'https:'],

stevenvachon commented 4 years ago

Perhaps this should be handled in the options parser to simplify the API.
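As a rough illustration of what such handling could look like, here is a minimal sketch. The `normalizeSchemes` helper is hypothetical and not part of broken-link-checker's API, and it assumes the library matches `acceptedSchemes` against the `protocol` property of a WHATWG `URL`, which always includes the trailing colon.

```javascript
// Hypothetical helper (not part of broken-link-checker): accept schemes
// with or without a trailing colon and return them in "scheme:" form.
function normalizeSchemes(schemes) {
    return schemes.map((scheme) => {
        const lower = scheme.trim().toLowerCase();
        return lower.endsWith(':') ? lower : `${lower}:`;
    });
}

// The WHATWG URL API reports the protocol with the colon included,
// so a plain 'https' entry would never equal it.
const siteUrl = new URL('https://www.example.com');
const accepted = normalizeSchemes(['http', 'https:']);

console.log(siteUrl.protocol);                    // 'https:'
console.log(accepted.includes(siteUrl.protocol)); // true
```

With a normalization step like this in the options parser, both `['http', 'https']` and `['http:', 'https:']` would behave the same way.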

echo-gravitas commented 4 years ago

One should read the manual carefully… 🤦🏼‍♂️ Thanks for the hint, now it works as expected.