pandas-dev / pandas

Flexible and powerful data analysis / manipulation library for Python, providing labeled data structures similar to R data.frame objects, statistical functions, and much more
https://pandas.pydata.org
BSD 3-Clause "New" or "Revised" License

DOC: Update warning message in pandas.eval function #59108

Closed eilonc-cx closed 3 days ago

eilonc-cx commented 4 days ago

Summary

This pull request updates the pandas.eval documentation to add a security warning about the risks of arbitrary code execution when using the function with untrusted data. This update aims to enhance user awareness and security practices.

Background

The need for this documentation update was identified by Duarte Santos from Checkmarx's Research Group. A vulnerability was discovered that allows for arbitrary code execution through the misuse of pandas.eval with untrusted inputs.

Proposed Change

Location: pandas.eval documentation

Update: Insert a warning advising users against the use of pandas.eval with untrusted data, highlighting the potential for arbitrary code execution.

Warning Text:

Warning: Use pandas.eval only with trusted data. This function can execute arbitrary code if used with untrusted inputs, similar to the risks described in the documentation for Python's pickle module.
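To make the warning concrete, here is a minimal sketch of one way a caller might screen an expression string before it ever reaches pandas.eval. It uses only Python's standard ast module. The helper name is_simple_expression and the allowlist are hypothetical, not part of pandas, and a check like this reduces exposure rather than making untrusted input safe.

```python
import ast

# Hypothetical allowlist: plain arithmetic, comparisons, and boolean logic only.
_ALLOWED_NODES = (
    ast.Expression, ast.BinOp, ast.UnaryOp, ast.Compare, ast.BoolOp,
    ast.Name, ast.Load, ast.Constant,
    ast.Add, ast.Sub, ast.Mult, ast.Div,
    ast.USub, ast.UAdd, ast.Not, ast.And, ast.Or,
    ast.Eq, ast.NotEq, ast.Lt, ast.LtE, ast.Gt, ast.GtE,
)


def is_simple_expression(expr, allowed_names):
    """Return True if expr uses only allowlisted names, numbers, and operators."""
    try:
        tree = ast.parse(expr, mode="eval")
    except (SyntaxError, ValueError):
        return False
    for node in ast.walk(tree):
        if not isinstance(node, _ALLOWED_NODES):
            return False  # rejects calls, attribute access, subscripts, lambdas, ...
        if isinstance(node, ast.Name) and node.id not in allowed_names:
            return False  # rejects any name the caller did not list
        if isinstance(node, ast.Constant) and not isinstance(node.value, (int, float)):
            return False  # rejects string and bytes literals
    return True
```

A caller could then pass the string on to pandas.eval only when the check succeeds, for example `if is_simple_expression(expr, {"a", "b"}): pd.eval(expr)`, and reject it otherwise. The allowlist is deliberately narrow: it is easier to add a needed operator later than to anticipate every dangerous construct.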

Rationale

This documentation update is crucial for preventing security issues by making users aware of the risks associated with dynamic expression evaluation in pandas.eval. The update follows a preliminary discussion with the Pandas security team and is now presented for broader community feedback.

Thank you for considering this update to enhance the safety and integrity of code using Pandas.

Regards,
Eilon Cohen
Security Analyst, Checkmarx

eilonc-cx commented 4 days ago

Hi @mroeschke

Thank you for suggesting the warning change for the eval function. Considering the potential risks with its use, I think a more "aggressive" warning might better communicate the severity to users. What do you think?

mroeschke commented 4 days ago

> What do you think?

IMO it's preferable to keep the messaging succinct, since this docstring is already long. I suggested folding in "untrusted data" because that was net-new information that could be helpful here.

mroeschke commented 3 days ago

Thanks @eilonc-cx