mholt / PapaParse

Fast and powerful CSV (delimited text) parser that gracefully handles large files and malformed input
http://PapaParse.com
MIT License
12.5k stars 1.15k forks source link

Protection against CSV injection attacks #793

Closed wcerfgba closed 4 years ago

wcerfgba commented 4 years ago

When importing a CSV file, Microsoft Excel and LibreOffice Calc will both interpret cells beginning with a = as formulae, which can lead to attacks that can result in data exfiltration or arbitrary command execution. [1] This is easily remedied by prefixing cells that begin with =, +, - or @ with a ' in order to suppress automatic interpretation of formulae by these softwares. [2]

I would like to propose an option escapeFormulae for Papa.unparse to provide this prefixing behaviour.

Thanks!

[1] https://owasp.org/www-community/attacks/CSV_Injection [2] https://www.contextis.com/en/blog/comma-separated-vulnerabilities

pokoli commented 4 years ago

That make sense for me.

I'm wondering if we should activate this option by default or not.

wcerfgba commented 4 years ago

I'm wondering if we should activate this option by default or not.

I guess this is dependent on if you have users who currently rely on formulae interpretation and for whom this would break on an upgrade. My intuition is that this would be a minority of users and that for most use cases, it would be better to be safe by default, and have escapeFormulae default to true.

pokoli commented 4 years ago

Having a safer default sounds good to me. Then we just need to have a proper documentation to explain why the should should be desactivated and include a warning about the it's safety risks.

So we just need an implmentation for it.

dylanlingelbach commented 4 years ago

Having a safer default sounds good to me

I'm sorry but this doesn't make sense to me. papaparse is a CSV parser. By default modifying the contents of a CSV file that may or may not ever get processed by Excel/LibreOffice seems like very odd behavior.

This bug is reported as a high severity vulnerability which also doesn't make sense to me - by that logic every text editor has the same vulnerability

pokoli commented 4 years ago

@dylanlingelbach not all contents are modified but only the ones that are suceptible of having a formula, just the ones that start with = which from my understanding should not me the bast majority.

But I agree that this should not be treated as high severty (at least on Papaparse) because:

P.S: My main usage of papaparse is to generate files to be opened by a SpreadSheet software, so probably I'm a little bit biaged here. I will like to have more opinion on this.

@mholt @dboskovic what do you think it should be the defalut value of this new parameter?

asafbiton commented 4 years ago

Hi there! 👋

My name is Asaf and I'm part of the Snyk Security Team. We have been tracking this issue for a few days now, and an advisory has been mistakenly published. I tend to agree with all the above arguments and do not believe there is a vuln within the context of papaparse.

I have therefore revoked this advisory from our database. I apologize for any inconvenience caused by this.

For further inquiries please don't hesitate to contact us at report@snyk.io or using the vulnerability disclosure form.

pokoli commented 4 years ago

@asafbiton Thank you so much!

dylanlingelbach commented 4 years ago

@asafbiton thank you!

@pokoli I think if any contents of the CSV file are modified by default that is odd (CSV quoting/escaping excluded)

wcerfgba commented 4 years ago

Sorry I have been away from the thread for a few days. Would it be fair to say the consensus is that this option is a good idea, but it should be disabled by default? I am happy to write a patch some point this week if we are aligned on this. :)

shawn-eary commented 2 years ago

@dylanlingelbach

I'm sorry but this doesn't make sense to me. papaparse is a CSV parser. By default modifying the contents of a CSV file that may or may not ever get processed by Excel/LibreOffice seems like very odd behavior.

I agree many people don't give a rip if a file processed by papaparse contains Macro, Command or SQL injection attacks, but some users might appreciate a PapaParse "plugin" that will alert them if the parsed file seems suspicious. If a probable attack is found, it might be useful for the PapaParse "plugin" to state the line and/or char number where the suspect text is.

dylanlingelbach commented 2 years ago

some users might appreciate a PapaParse "plugin" that will alert them if the parsed file seems suspicious

Agreed there! Modifying the data just shouldn't be default behavior for a CSV parser

shawn-eary commented 2 years ago

@dylanlingelbach

Agreed there! Modifying the data just shouldn't be default behavior for a CSV parser

How about overloading the Parse function to accept the following optional arguments?:

  1. User defined function that runs on each field and returns true if the user thinks the field contains suspicious patterns
  2. User defined function that runs a transform on each field

PapaParse could then provide standard functions that the user could chose from to pass in for Item 1 and Item 2 above. If the user passes in nothing, PapaParse would behave the say way it has for years.

NOTE: About Item 1 above, if a function is passed in for this parameter and that function returns true, PapaParse would then alert the user with an appropriate warning so she/he can begin mitigation.

mholt commented 2 years ago

I think interpretation of the data is out of scope for a CSV parser and lends itself to more security problems, not less.

shawn-eary commented 2 years ago

I apologize, PapaParse already supports a transform function. I'm sorry I didn't see this in the manual... The already supported transform function not only allows you to remove risky values from CSV cells, but it also allows you to identify them if you don't mind using "global" variable style programming:

  var globalRisksFound = []; 
  function isRisky(v, cName) {
    // Grossly over conservative
    var isAlphaNumeric = v.match(/^ *[A-Za-z0-9 ]+ *$/)
    if (!isAlphaNumeric) {
      globalRisksFound.push(
        {
          col: cName, 
          val: v
        }
      );
    }
    // Don't do a transform. Just report the error
    return v;
  }

  let resultingJSON = PapaParse.parse(
    uploadedData,
    {delimiter: ",", header: true, transform: isRisky}
  );

  // The suspect values are now in the globalRisksFound array

I think a cool enhancement someday might be for the transform function to pass the record number via a third parameter.

bean5 commented 2 years ago

Chiming in a bit late. It seems that an interpreter such as Excel, upon loading from a file specifically named CSV but which contains potential formula should be the one prompting about macros. Does it not already do this?

Having the ability to prevent such alerts is still a good idea. I like the fact that escapeFormulae is provided as an option and set to true by default. That maintains backwards-compatibility while providing the feature. Win-win.

Congrats to Snyk team for owning up to the false alarm (given the context), too. That being said, replacing the vuln listing with a 404 rather than a retraction is a bit confusing. Even still, they are doing good work.