uber / petastorm

The Petastorm library enables single-machine or distributed training and evaluation of deep learning models from datasets in Apache Parquet format. It supports ML frameworks such as TensorFlow, PyTorch, and PySpark, and can be used from pure Python code.
Apache License 2.0

Security fix for arbitrary code execution. #640

Status: Closed (selitvin closed 3 years ago)

selitvin commented 3 years ago

Based on #637: whitelists the set of packages whose classes can be unpickled. This prevents unpickling a malicious class that could invoke os.execute or a similar malicious function.
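
For readers unfamiliar with the technique, the whitelisting approach described above can be sketched as follows. This is a minimal illustration built on the standard library's `pickle.Unpickler.find_class` hook, not the code from this PR: the names `ALLOWED_PACKAGES`, `RestrictedUnpickler` and `restricted_loads` are hypothetical, and the package list is an assumption chosen only for the example.

```python
import io
import pickle

# Hypothetical allowlist for illustration only; the packages actually
# whitelisted by the PR are not shown on this page.
ALLOWED_PACKAGES = {"collections", "decimal"}


class RestrictedUnpickler(pickle.Unpickler):
    """Unpickler that resolves classes only from allowlisted packages."""

    def find_class(self, module, name):
        # pickle calls find_class for every global (class or function)
        # that the stream references. Compare the top-level package name.
        top_level = module.split(".")[0]
        if top_level in ALLOWED_PACKAGES:
            return super().find_class(module, name)
        raise pickle.UnpicklingError(
            "global '{}.{}' is not allowed".format(module, name))


def restricted_loads(data):
    """Drop-in analogue of pickle.loads that enforces the allowlist."""
    return RestrictedUnpickler(io.BytesIO(data)).load()
```

With this sketch, a payload that references a function such as `os.system` is rejected with `pickle.UnpicklingError` before anything is called, while plain containers and classes from the allowlisted packages still load. An allowlist this coarse is only as safe as the packages on it, so each entry should be reviewed for callables that could be abused.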

codecov[bot] commented 3 years ago

Codecov Report

Merging #640 (9226ba5) into master (20e46e0) will decrease coverage by 0.01%. The diff coverage is 81.81%.


@@            Coverage Diff             @@
##           master     #640      +/-   ##
==========================================
- Coverage   86.18%   86.16%   -0.02%     
==========================================
  Files          84       84              
  Lines        5051     5061      +10     
  Branches      788      789       +1     
==========================================
+ Hits         4353     4361       +8     
- Misses        559      560       +1     
- Partials      139      140       +1     
Impacted Files            Coverage Δ
petastorm/etl/legacy.py   84.00% <81.81%> (-2.67%)


Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data. Last update 20e46e0...9226ba5.