rbren / rss-parser

A lightweight RSS parser, for Node and the browser
MIT License

Socket remains open after a timeout #238

Open alexfernandez opened 1 year ago

alexfernandez commented 1 year ago

Node v18.12.1, rss-parser@3.12.0. I am scraping a number of RSS feeds using rss-parser. A Nasdaq feed invariably times out after 10 seconds, which is the configured timeout:

Could not read https://www.nasdaq.com/feed/rssoutbound?category=Commodities: Request timed out after 10000ms

The feed can be read without issues in a browser, so I guess the server is filtering by user agent. In any case, my program does not terminate properly: the process stays open after everything else has been closed. Running it with wtfnode yields the following information:

^C[WTF Node?] open handles:
- File descriptors: (note: stdio always exists)
  - fd 2 (tty) (stdio)
  - fd 1 (tty) (stdio)
- Sockets:
  - 192.168.1.5:41910 -> 23.214.214.41:443

As it happens, 23.214.214.41 is (not coincidentally) the IP address of the Nasdaq Akamai endpoint:

 $ ping www.nasdaq.com
PING e6982.dsca.akamaiedge.net (23.214.214.41) 56(84) bytes of data.

It appears that rss-parser is not closing the socket properly after a timeout. Any ideas?
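
For context, one way a timeout can leave a socket open in Node: if the code only rejects a promise when the timer fires, the underlying request keeps running and its socket keeps the event loop alive. The sketch below shows a timeout that also destroys the request. It is a minimal illustration using Node's built-in http/https modules, not rss-parser's actual code, and getWithTimeout is a hypothetical helper name.

```javascript
const http = require('node:http');
const https = require('node:https');

// Hypothetical helper (not part of rss-parser): GET a URL as text and,
// on timeout, destroy the request so its socket does not linger.
function getWithTimeout(url, timeoutMs) {
  return new Promise((resolve, reject) => {
    const client = url.startsWith('https:') ? https : http;
    const req = client.get(url, (res) => {
      let body = '';
      res.setEncoding('utf8');
      res.on('data', (chunk) => { body += chunk; });
      res.on('end', () => resolve(body));
      res.on('error', reject);
    });
    // The timeout callback only signals that the socket was idle;
    // the request has to be destroyed explicitly to close the socket.
    req.setTimeout(timeoutMs, () => {
      req.destroy(new Error(`Request timed out after ${timeoutMs}ms`));
    });
    req.on('error', reject);
  });
}
```

With this pattern, the error passed to req.destroy() is emitted as the request's 'error' event, so the caller still sees a "Request timed out" rejection, but no open socket should remain afterwards.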

goldsrc commented 7 months ago

As a workaround, fetch the feed's XML yourself with Node's built-in fetch (available since v18), axios, etc., and pass it to parser.parseString. Example:

const axios = require('axios');
const RSSParser = require('rss-parser');

const parser = new RSSParser();

// Fetch the feed with axios, then parse the XML string with rss-parser.
async function axiosExample(url) {
  try {
    const response = await axios.get(url);
    const feed = await parser.parseString(response.data);
    return feed;
  } catch (error) {
    console.error("Error fetching the RSS feed:", error);
  }
}

// Same idea with the built-in fetch (Node 18+).
async function fetchExample(url) {
  try {
    // Alternatively, use node-fetch: const fetch = require('node-fetch');
    const response = await fetch(url);
    if (!response.ok) {
      throw new Error(`Unexpected response ${response.statusText}`);
    }
    const xml = await response.text();
    const feed = await parser.parseString(xml);
    return feed;
  } catch (error) {
    console.error("Error fetching the RSS feed:", error);
  }
}

I made a PR trying to fix this issue (#264)
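
The fetch-based workaround above does not set a timeout of its own, so a slow feed like the Nasdaq one could still hang. A possible variant, assuming Node 18 or later, passes AbortSignal.timeout() to fetch so the request is aborted once the deadline passes. fetchFeedXml is a hypothetical helper name, and the 10000 ms default simply mirrors the timeout mentioned in the original report.

```javascript
// Hypothetical helper: download a feed's XML with a hard deadline.
// Assumes Node 18+ (global fetch and AbortSignal.timeout).
async function fetchFeedXml(url, timeoutMs = 10000) {
  // The signal aborts the request when the deadline passes,
  // which makes fetch reject instead of waiting indefinitely.
  const response = await fetch(url, { signal: AbortSignal.timeout(timeoutMs) });
  if (!response.ok) {
    throw new Error(`Unexpected response ${response.status} ${response.statusText}`);
  }
  return response.text();
}
```

The returned string can then be handed to parser.parseString(xml) as in the examples above. Whether the aborted connection is closed immediately depends on the fetch implementation, so it may be worth checking again with wtfnode.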