hgouveia / node-downloader-helper

A simple http file downloader for node.js
MIT License
253 stars 54 forks source link

Filenames parsed from the `Content-Disposition` headers are incorrect. #75

Closed paul-jarrow closed 2 years ago

paul-jarrow commented 2 years ago

Hi there. We've recently identified an issue with this library when using the downloader alongside the JFrog Artifactory platform. This doesn't affect only this platform -- it could affect many platforms that use the aforementioned header.

If no fileName option is provided to node-downloader-helper, the helper will attempt to parse the file name from the Content-Disposition header. If the header is present, which it is for files downloaded from Artifactory, it can contain more than what the script expects.

This gives the file a mangled name. Incidentally, this can often pass on macOS/Unix, but will cause the download to fail on Windows.

As you can see in this function:

https://github.com/hgouveia/node-downloader-helper/blob/33aab440ce6d5bf996ffec07fbadad61862811c6/src/index.js#L658-L683

on this line: https://github.com/hgouveia/node-downloader-helper/blob/33aab440ce6d5bf996ffec07fbadad61862811c6/src/index.js#L667

The entirety of the header is included in this substring. There can be multiple properties in this header, as per spec. I think this header parsing needs to be done a little differently, perhaps with some sort of token splitting.

I have an example script I've been toying with that might have a potential solution:

const headers = {
  'content-disposition': `attachment; filename="Setup64.exe"; filename*=UTF-8''Setup64.exe`
};

const parseHeaderValue = (headerValue) => {
  let headerDictionary = headerValue.split(';');
  headerDictionary = headerDictionary.reduce((acc, val) => {
    const splitVal = val.trim().split('=');
    return { ...acc, [splitVal[0]]: splitVal[1] ?? '' };
  }, {});
  return headerDictionary;
}

const contentDispositionValues = parseHeaderValue(headers['content-disposition']);

if(headers.hasOwnProperty('content-disposition') &&
            contentDispositionValues['filename']) {
  let fileName = headers['content-disposition'];

  fileName = fileName.trim();

  fileName = contentDispositionValues['filename'];

  fileName = fileName.replace(new RegExp('"', 'g'), '');

  fileName = fileName.replace(/[/\\]/g, '');

  console.log(fileName);
}

In our codebase, we solved this problem by always passing in a fileName optional property when constructing the DownloaderHelper, but I'm hoping this will help eliminate a frustrating edge case we experienced. Thank you :)

Chaphasilor commented 2 years ago

Hi there, thanks for the detailed breakdown of the issue and the potential solution!

I originally implemented the parsing and it seems like I overlooked the filename*= section...

It looks like that fix really isn't too difficult, so I should be able to implement it soon :)

paul-jarrow commented 2 years ago

Looks like the fix has been merged, thank you all!

hgouveia commented 2 years ago

thanks for the report @paul-jarrow ! i will publish new version after @Chaphasilor submit the new PR for upgrade to newer node version and remove the legacyURL

Chaphasilor commented 2 years ago

Nah, don't wait for it, I don't have much time right now, and this PR has some important fixes :)

hgouveia commented 2 years ago

Released on 2.0.1