lgraubner / sitemap-generator

Easily create XML sitemaps for your website.
MIT License

Emit an error instead of exiting on failure when scanSubdomains is set #76

Closed genslein closed 4 years ago

genslein commented 4 years ago

Emit an error instead of failing critically when scanSubdomains is enabled on the crawler. We have multiple subdomains across multiple apps that we want crawled, but this failure halts the process and prevents the crawler from traversing all possible pages.

Example:

const SitemapGenerator = require('sitemap-generator');
const chalk = require('chalk');

const options = {
    changeFreq: 'daily',
    respectRobotsTxt: true,
    lastMod: true,
    stripQuerystring: true,
    allowInitialDomainChange: true,
    filepath: 'sitemap.xml',
    maxEntriesPerFile: 50000,
    maxDepth: 0,
    maxConcurrency: 5,
    priorityMap: [],
    userAgent: 'Node/SitemapGenerator',
    ignoreInvalidSSL: true,
    timeout: 30000,
    decodeResponses: true,
    ignoreAMP: true,
    ignore: null,
    scanSubdomains: true, // scanSubdomains is broken in the sitemap-generator npm library for shop and corporate
};

// create generator
const generator = SitemapGenerator('https://www.soul-cycle.com', options);

// start the crawl; the error below is thrown while the crawler is running
generator.start();

yields

node_modules/sitemap-generator/src/index.js:88
      throw new Error(`Site "${parsedUrl.href}" could not be found.`);
      ^

Error: Site "https://www.soul-cycle.com" could not be found.
    at Crawler.<anonymous> (/Users/genslein/node_modules/sitemap-generator/src/index.js:88:13)
    at Crawler.emit (events.js:327:22)
    at /Users/genslein/node_modules/simplecrawler/lib/crawler.js:1282:25
    at FetchQueue.update (/Users/genslein/node_modules/simplecrawler/lib/queue.js:227:9)
    at ClientRequest.<anonymous> (/Users/genslein/node_modules/simplecrawler/lib/crawler.js:1265:27)
    at ClientRequest.emit (events.js:315:20)
    at TLSSocket.socketErrorListener (_http_client.js:432:9)
    at TLSSocket.emit (events.js:315:20)
    at emitErrorNT (internal/streams/destroy.js:84:8)
    at processTicksAndRejections (internal/process/task_queues.js:84:21)
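
The behavior requested here could look roughly like the sketch below: rather than throwing from inside the crawler's event callback (which takes down the whole process), the failure is handed to `error` listeners so the crawl can continue with other hosts. This is only an illustration built on Node's `EventEmitter`. The helper name `reportFetchFailure`, the payload shape, and the example URL are assumptions made for this sketch, not the library's actual code or API.

```javascript
const { EventEmitter } = require('events');

// Hypothetical helper (not part of sitemap-generator): reports a failed
// fetch to listeners instead of throwing from inside an event callback.
function reportFetchFailure(emitter, url, errorData) {
  // Assumed payload shape for illustration only
  const payload = {
    code: errorData.code,
    message: `Site "${url}" could not be found.`,
    url,
  };

  // EventEmitter itself throws when 'error' is emitted with no listener,
  // so skip emitting in that case and let the caller decide what to do.
  if (emitter.listenerCount('error') === 0) {
    return false;
  }

  emitter.emit('error', payload);
  return true;
}

// Usage: a stand-in emitter plays the role of the generator
const generator = new EventEmitter();
const failures = [];
generator.on('error', (err) => failures.push(err));

reportFetchFailure(generator, 'https://shop.example.com/', { code: 'ENOTFOUND' });
console.log(failures.length); // prints 1; the process keeps running
```

With this approach, a caller that cares about unreachable subdomains can log or collect them in an `error` listener, while the remaining pages are still crawled.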