pandas-dev / pandas

Flexible and powerful data analysis / manipulation library for Python, providing labeled data structures similar to R data.frame objects, statistical functions, and much more
https://pandas.pydata.org
BSD 3-Clause "New" or "Revised" License

SEC: Add security disclosure process to developers page #8545

Closed westurner closed 1 year ago

westurner commented 10 years ago

http://pandas.pydata.org/developers.html

Examples:

jreback commented 10 years ago

pls send me a private email and I'll take a look; git log will show my email address

westurner commented 10 years ago

Resolved. Looks like I needed to update to the latest pandas release. Thanks!

westurner commented 10 years ago

Going forward, for your project, it would be good to have a documented process for fielding security issues.

jreback commented 10 years ago

@westurner I get that you want to raise these types of issues. But not sure that this is a pandas issue at all. It may be that the 'use' of pandas is incorrect, so possibly a doc note is in order as pandas is not directly web-facing.

westurner commented 10 years ago

It may be a healthy objective for the pandas project.

Examples:

...

jreback commented 10 years ago

I get all this, but what can pandas actually do about this?

jreback commented 10 years ago

ahh, you want to make this a doc issue, ok with that.

jreback commented 10 years ago

@westurner ok pull-request for 0.15.1 then!

westurner commented 10 years ago

Document which process for reporting issues and tracking their resolution is optimal in a security-sensitive context (e.g. link to a mailing list, or whatever you feel is appropriate).

westurner commented 7 years ago

Here's a good example: https://docs.djangoproject.com/en/dev/internals/security/#reporting-security-issues

westurner commented 7 years ago

https://securitytxt.org/ recommends /.well-known/security.txt.

https://tools.ietf.org/html/draft-foudil-securitytxt-00

https://tools.ietf.org/html/draft-foudil-securitytxt

gfyoung commented 7 years ago

@westurner : Seems reasonable. You're more than welcome to open a PR to add this!

mroeschke commented 3 years ago

This is already added in https://github.com/pandas-dev/pandas/blob/master/.github/SECURITY.md so I think we can close this issue

westurner commented 3 years ago

👍 Duplicate of https://github.com/pandas-dev/pandas/issues/27821

westurner commented 3 years ago

Actually, this still isn't on the docs?

Maybe;

westurner commented 3 years ago

Or would that be unhelpful because the Sphinx docs are in RST instead of the - newer - MyST Markdown?

mroeschke commented 3 years ago

Ah good point @westurner, this is not explicitly called out in the docs. Might be good to add a section in https://pandas.pydata.org/docs/development/policies.html with the security policy. I'll reopen this

westurner commented 1 year ago

Thanks.

From https://github.com/pandas-dev/pandas/security/policy 2023-07 :

To report a security vulnerability to pandas, please go to https://tidelift.com/security and see the instructions there

https://github.com/pandas-dev/pandas/security/advisories lists zero security advisories. Will need to check out how that works; does it feed from OSV?

From https://osv.dev/ :

Data sources

This infrastructure serves as an aggregator of vulnerability databases that have adopted the OSV schema, including GitHub Security Advisories, PyPA, RustSec, and Global Security Database, and more. [...]

OSV schema

All advisories in this database use the OpenSSF OSV format, which was developed in collaboration with open source communities.

The OSV schema provides a human and machine readable data format to describe vulnerabilities in a way that precisely maps to open source package versions or commit hashes.

curl -d \
  '{"version": "0.0.0",
    "package": {"name": "pandas", "ecosystem": "PyPI"}}' \
  "https://api.osv.dev/v1/query"
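
The same query can be sketched in Python using only the standard library. This is a minimal sketch, not an official client: the helper names `build_query`, `vuln_ids`, and `query_osv` are hypothetical, and it assumes the response is a JSON object whose optional `vulns` list holds entries with an `id` field (an empty object meaning no known vulnerabilities).

```python
import json
import urllib.request

OSV_QUERY_URL = "https://api.osv.dev/v1/query"


def build_query(name, version, ecosystem="PyPI"):
    """Build the JSON body used by the curl example above."""
    return {"version": version, "package": {"name": name, "ecosystem": ecosystem}}


def vuln_ids(response):
    """Extract advisory IDs; assumes a missing "vulns" key means no results."""
    return [vuln["id"] for vuln in response.get("vulns", [])]


def query_osv(name, version, ecosystem="PyPI", timeout=10):
    """POST the query to OSV and return the decoded JSON response."""
    body = json.dumps(build_query(name, version, ecosystem)).encode("utf-8")
    request = urllib.request.Request(
        OSV_QUERY_URL,
        data=body,  # supplying data makes this a POST request
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        return json.load(resp)


if __name__ == "__main__":
    # Requires network access; the version is only an example.
    print(vuln_ids(query_osv("pandas", "1.0.3")))
```

Keeping the payload builder and the response parser separate from the network call means both can be checked offline.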

GitHub Advisory Database > Sources https://github.com/github/advisory-database#sources :

From https://github.com/pypa/advisory-database#readme :


Python Packaging Advisory Database

This is community owned repository of advisories for packages published on https://pypi.org.

Advisories live in the vulns directory and use a YAML encoding of a simple format.

Contributing advisories

Making a pull request

Existing entries can be edited by simply creating a pull request.

To introduce a new entry, create a pull request with a new file that has a name matching PYSEC-0000-<anything>.yaml. This will be later picked up by automation to allocate a proper ID once merged.

Triage process

Much of the existing set of vulnerabilities are collected from the NVD CVE feed.

We use this tool, which performs a lot of heuristics to match CVEs with exact Python packages and versions (which is a difficult problem!) and a small amount of human triage to generate the .yaml entries here.

Using this data

Vulnerabilities are integrated into the Open Source Vulnerabilities project, which provides an API to query for vulnerabilities like so:

$ curl -X POST -d \
    '{"version": "2.4.1", "package": {"name": "jinja2", "ecosystem": "PyPI"}}' \
    "https://api.osv.dev/v1/query"

Longer term, we are working with the PyPI team to build a pipeline to automatically get these vulnerabilities into PyPI. The goal is to have the pip install (and an additional pip audit) command automatically report vulnerabilities out of the box.



*****

- https://www.google.com/search?q=CVE-2020-13091
  - pickle vuln in pandas<=1.0.3 due to upstream cpython/python#pickle vuln
    - unpickling runs **code**, not just data: it calls whatever callable an object's `__reduce__()` method returns, and there's (still?) not (yet?) a pickle protocol to prevent exec on read
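
To illustrate the pickle point, here is a minimal and deliberately harmless sketch. `Payload`, `NoGlobalsUnpickler`, and `restricted_loads` are hypothetical names for this example; the restricted unpickler follows the "restricting globals" pattern from the Python `pickle` documentation and is not something pandas does for you. It narrows the attack surface but does not make untrusted pickles safe.

```python
import io
import pickle


class Payload:
    """Stand-in for attacker-controlled data (hypothetical)."""

    def __reduce__(self):
        # Tells pickle: "to rebuild me, call eval('6 * 7')".
        # A real attacker would name a far less friendly callable.
        return (eval, ("6 * 7",))


class NoGlobalsUnpickler(pickle.Unpickler):
    """Refuses to resolve any global, so no callable can be looked up."""

    def find_class(self, module, name):
        raise pickle.UnpicklingError(f"global '{module}.{name}' is forbidden")


def restricted_loads(data):
    """Like pickle.loads, but only plain built-in containers and scalars load."""
    return NoGlobalsUnpickler(io.BytesIO(data)).load()


if __name__ == "__main__":
    blob = pickle.dumps(Payload())
    # The result is whatever the callable returned, not a Payload instance.
    print(pickle.loads(blob))
    try:
        restricted_loads(blob)
    except pickle.UnpicklingError as exc:
        print("blocked:", exc)
```

The practical takeaway stays the same as in the pickle docs: only unpickle data you trust, whether through `pickle.load` directly or through a wrapper such as `pandas.read_pickle`.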

- SQLi: SQL Injection
  Perhaps obviously, if you prepare unsafe SQL queries (for example, by string concatenation instead of query parameterization) and run them on a SQL database (with pandas (SQLAlchemy) or any other library in any programming language), there will be SQLi (SQL Injection) vulnerabilities in your app that depends on pandas.

  - ENH: sql support with SQLAlchemy
    https://github.com/pandas-dev/pandas/issues/6292#issuecomment-49088480 (2014)
    - https://github.com/pandas-dev/pandas/blob/main/pandas/tests/io/test_sql.py
    - https://pandas.pydata.org/docs/user_guide/io.html#sql-queries
*****

    - https://pandas.pydata.org/docs/user_guide/io.html#general-parsing-configuration `dtype_backend="pyarrow"`

    - https://arrow.apache.org/blog/2022/02/16/introducing-arrow-flight-sql/
      - Arrow Flight SQL is faster than and designed to be the basis for a SQL JDBC/ODBC driver
        - JDBC/ODBC are typically not Zero-copy operations and there's data reshaping because database and IPC and object structs differ unnecessarily without Arrow
    - https://github.com/BlazingDB
      - BlazingSQL does GPU-accelerated CuDF w/ Dask, but from_arrow() *converts* the pyarrow.Table to a cudf.DataFrame; which is not zero-copy like zero_buffer
    - https://arrow.apache.org/datafusion/user-guide/faq.html#how-does-datafusion-compare-with-xyz
      - DataFusion and Polars accelerate data operations by utilizing the native SIMD support in many processors
      - https://en.wikipedia.org/wiki/Single_instruction,_multiple_data
      - https://github.com/simdjson/simdjson
      - https://duckdb.org/faq.html#does-duckdb-use-simd :
        > Does DuckDB use SIMD?
        > DuckDB does not use explicit SIMD instructions because they greatly complicate portability and compilation. Instead, DuckDB uses implicit SIMD, where we go to great lengths to write our C++ code in such a way that the compiler can auto-generate SIMD instructions for the specific hardware. As an example why this is a good idea, porting DuckDB to the new [ARM64-compatible] architecture took 10 minutes

On Mon, Jul 10, 2023, 9:50 PM Matthew Roeschke ***@***.***> wrote:

> Closed #8545 <https://github.com/pandas-dev/pandas/issues/8545> as completed via #54060 <https://github.com/pandas-dev/pandas/pull/54060>.