Closed westurner closed 1 year ago
Please send me a private email and I'll take a look; git log will show my email address.
Resolved. Looks like I needed to update to the latest pandas release. Thanks!
Going forward, for your project, it would be good to have a documented process for fielding security issues.
@westurner I get that you want to raise these types of issues. But not sure that this is a pandas issue at all. It may be that the 'use' of pandas is incorrect, so possibly a doc note is in order as pandas is not directly web-facing.
It may be a healthy objective for the pandas project.
Examples:
...
I get all this, but what can pandas actually do about this?
ahh, you want to make this a doc issue, ok with that.
@westurner ok pull-request for 0.15.1 then!
Document which process for documenting issues and their resolution is optimal in a security-sensitive context (e.g. link to a mailing list, or whatever you feel is appropriate).
Here's a good example: https://docs.djangoproject.com/en/dev/internals/security/#reporting-security-issues
https://securitytxt.org/ recommends /.well-known/security.txt
@westurner : Seems reasonable. You're more than welcome to open a PR to add this!
This is already added in https://github.com/pandas-dev/pandas/blob/master/.github/SECURITY.md so I think we can close this issue
👍 Duplicate of https://github.com/pandas-dev/pandas/issues/27821
Actually, this still isn't on the docs?
Maybe:
.. include:: ../.github/SECURITY.md
in the Sphinx docs? Or would that be unhelpful because the Sphinx docs are in RST instead of the (newer) MyST Markdown?
Ah good point @westurner, this is not explicitly called out in the docs. Might be good to add a section in https://pandas.pydata.org/docs/development/policies.html with the security policy. I'll reopen this
Thanks.
From https://github.com/pandas-dev/pandas/security/policy 2023-07 :
To report a security vulnerability to pandas, please go to https://tidelift.com/security and see the instructions there
https://github.com/pandas-dev/pandas/security/advisories lists zero security advisories. Will need to check out how that works; does it feed from OSV?
From https://osv.dev/ :
Data sources: This infrastructure serves as an aggregator of vulnerability databases that have adopted the OSV schema, including GitHub Security Advisories, PyPA, RustSec, and Global Security Database, and more. [...]
OSV schema: All advisories in this database use the OpenSSF OSV format, which was developed in collaboration with open source communities.
The OSV schema provides a human and machine readable data format to describe vulnerabilities in a way that precisely maps to open source package versions or commit hashes.
curl -d \
'{"version": "0.0.0",
"package": {"name": "pandas", "ecosystem": "PyPI"}}' \
"https://api.osv.dev/v1/query"
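The same query can be sketched in Python with only the standard library. This is a minimal sketch, not an official client: the helper names `build_osv_request` and `query_osv` are hypothetical, and the `vulns` key is assumed from the OSV query response format. Building the request is kept separate from sending it, so the payload can be inspected without network access.

```python
import json
import urllib.request

OSV_QUERY_URL = "https://api.osv.dev/v1/query"


def build_osv_request(name, version, ecosystem="PyPI"):
    """Build (but do not send) a POST request for the OSV query endpoint."""
    payload = {
        "version": version,
        "package": {"name": name, "ecosystem": ecosystem},
    }
    return urllib.request.Request(
        OSV_QUERY_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def query_osv(name, version, ecosystem="PyPI", timeout=10):
    """Send the query and return the list of advisories (empty if none)."""
    request = build_osv_request(name, version, ecosystem)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        # Assumption: matching advisories are listed under "vulns".
        return json.load(response).get("vulns", [])


# Build a query for a specific pandas release; nothing is sent here.
request = build_osv_request("pandas", "1.0.3")
```

Calling `query_osv("pandas", "1.0.3")` would perform the actual lookup; it needs network access, so it is not run as part of the sketch.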
GitHub Advisory Database > Sources https://github.com/github/advisory-database#sources :
From https://github.com/pypa/advisory-database#readme :
This is community owned repository of advisories for packages published on https://pypi.org.
Advisories live in the vulns directory and use a YAML encoding of a simple format.
Existing entries can be edited by simply creating a pull request.
To introduce a new entry, create a pull request with a new file that has a name matching PYSEC-0000-<anything>.yaml. This will be later picked up by automation to allocate a proper ID once merged.
Much of the existing set of vulnerabilities are collected from the NVD CVE feed.
We use this tool, which performs a lot of heuristics to match CVEs with exact Python packages and versions (which is a difficult problem!) and a small amount of human triage to generate the .yaml entries here.
Vulnerabilities are integrated into the Open Source Vulnerabilities project, which provides an API to query for vulnerabilities like so:
$ curl -X POST -d \
'{"version": "2.4.1", "package": {"name": "jinja2", "ecosystem":
"PyPI"}}' \
"https://api.osv.dev/v1/query"
Longer term, we are working with the PyPI team to build a pipeline to automatically get these vulnerabilities into PyPI. The goal is to have the pip install (and an additional pip audit) command automatically report vulnerabilities out of the box.
*****
- https://www.google.com/search?q=CVE-2020-13091
- pickle vuln in pandas<=1.0.3 due to upstream cpython/python#pickle vuln
- unpickling executes **code**, not just data: it calls whatever callable an object's `__reduce__()` method returns, and there's (still?) not (yet?) a pickle protocol to prevent exec on read
- SQLi: SQL Injection
Perhaps obviously, if you prepare unsafe SQL queries (for example, with string concatenation instead of query parameterization) and run them on a SQL database (with pandas (SQLAlchemy) or any other library in any programming language), there would be SQLi (SQL Injection) vulnerabilities in your app which depends upon pandas.
- ENH: sql support with SQLAlchemy https://github.com/pandas-dev/pandas/issues/6292#issuecomment-49088480 (2014)
- https://github.com/pandas-dev/pandas/blob/main/pandas/tests/io/test_sql.py
- https://pandas.pydata.org/docs/user_guide/io.html#sql-queries
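To make the SQLi point concrete, here is a small sketch contrasting string concatenation with a parameterized query passed through `pandas.read_sql_query`. The `users` table, its rows, and the `untrusted` value are made up for illustration; an in-memory SQLite database stands in for whatever database an app would really use.

```python
import sqlite3

import pandas as pd

# Hypothetical example data in an in-memory SQLite database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT, role TEXT)")
conn.executemany(
    "INSERT INTO users VALUES (?, ?)",
    [("alice", "admin"), ("bob", "user")],
)
conn.commit()

# Attacker-controlled input (illustrative).
untrusted = "bob' OR '1'='1"

# Unsafe: the input is spliced into the SQL text, so it can change the query.
unsafe_sql = "SELECT name, role FROM users WHERE name = '" + untrusted + "'"
unsafe_df = pd.read_sql_query(unsafe_sql, conn)

# Safer: the input is passed separately and only ever treated as a value.
safe_df = pd.read_sql_query(
    "SELECT name, role FROM users WHERE name = ?",
    conn,
    params=(untrusted,),
)

print(len(unsafe_df), len(safe_df))
```

The concatenated query matches every row because the injected `OR '1'='1'` becomes part of the statement, while the parameterized query matches none. The `?` placeholder style is specific to SQLite's driver; other drivers use different placeholder syntax.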
*****
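The pickle point above can be illustrated with a deliberately harmless payload. This is a sketch of the general mechanism only, not of the specific CVE: the `Payload` class is hypothetical, and `eval` of a fixed arithmetic string stands in for whatever an attacker would actually run.

```python
import pickle


class Payload:
    """Hypothetical object whose pickle form asks the loader to call eval()."""

    def __reduce__(self):
        # On load, pickle calls eval("6 * 7") instead of rebuilding a Payload.
        return (eval, ("6 * 7",))


data = pickle.dumps(Payload())

# Loading runs the callable chosen by whoever produced the bytes.
result = pickle.loads(data)
print(result)  # 42
```

Because the loader runs whatever the byte stream names, functions such as `pandas.read_pickle` should only be given files from trusted sources.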
- https://pandas.pydata.org/docs/user_guide/io.html#general-parsing-configuration `dtype_backend="pyarrow"`
- https://arrow.apache.org/blog/2022/02/16/introducing-arrow-flight-sql/
- Arrow Flight SQL is faster than and designed to be the basis for a SQL JDBC/ODBC driver
- JDBC/ODBC are typically not Zero-copy operations and there's data reshaping because database and IPC and object structs differ unnecessarily without Arrow
- https://github.com/BlazingDB
- BlazingSQL does GPU-accelerated CuDF w/ Dask, but from_arrow() *converts* the pyarrow.Table to a cudf.DataFrame; which is not zero-copy like zero_buffer
- https://arrow.apache.org/datafusion/user-guide/faq.html#how-does-datafusion-compare-with-xyz
- DataFusion and Polars accelerate data operations by utilizing the native SIMD support in many processors
- https://en.wikipedia.org/wiki/Single_instruction,_multiple_data
- https://github.com/simdjson/simdjson
- https://duckdb.org/faq.html#does-duckdb-use-simd :
> Does DuckDB use SIMD?
> DuckDB does not use explicit SIMD instructions because they greatly complicate portability and compilation. Instead, DuckDB uses implicit SIMD, where we go to great lengths to write our C++ code in such a way that the compiler can auto-generate SIMD instructions for the specific hardware. As an example why this is a good idea, porting DuckDB to the new [ARM64-compatible] architecture took 10 minutes
On Mon, Jul 10, 2023, 9:50 PM Matthew Roeschke ***@***.***> wrote:
> Closed #8545 <https://github.com/pandas-dev/pandas/issues/8545> as
> completed via #54060 <https://github.com/pandas-dev/pandas/pull/54060>.
http://pandas.pydata.org/developers.html
Examples: