Open frosk1 opened 2 years ago
A workaround is already possible.
Location of the SSH key for the remote repo:
/home/user/.ssh/
requirements.txt:
sshrepo @ git+ssh://git@bitbucket.org/exampleproject/sshrepo.git
Driver script:
import ray


@ray.remote(runtime_env={"pip": "./requirements.txt", "eager_install": False})
def get_version():
    # Import inside the task so it resolves in the runtime_env on the worker.
    from powerplant.interfaces.data import DataInterface
    print(dir(DataInterface))


if __name__ == '__main__':
    ray.init(address="auto")
    ray.get(get_version.remote())
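Since the workaround only matters for requirement lines that install over SSH, it can help to list them before submitting the task. The following is a minimal sketch, not part of Ray; `find_ssh_requirements` is a hypothetical helper name, and the sample text simply reuses the requirements line from above.

```python
def find_ssh_requirements(requirements_text):
    """Return the requirement lines that install from git over SSH."""
    hits = []
    for raw in requirements_text.splitlines():
        line = raw.strip()
        # Skip blank lines and full-line comments.
        if not line or line.startswith("#"):
            continue
        if "git+ssh://" in line:
            hits.append(line)
    return hits


# Sample input mirroring the requirements.txt shown above.
sample = (
    "lightgbm\n"
    "# private dependency\n"
    "sshrepo @ git+ssh://git@bitbucket.org/exampleproject/sshrepo.git\n"
)
for req in find_ssh_requirements(sample):
    print("needs an SSH key on the cluster node:", req)
```

Any line this reports will only install if the user that runs pip on the cluster node can authenticate with a key under its own SSH directory.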
To be honest, I am not sure whether it is actually a security issue, in terms of code injection attacks, that pip run from the Python environment Ray starts has access to the default SSH keys stored on the machine where the Ray head node is started.
Any comments on that, anyone? @clarkzinzow
cc @architkulkarni who is on-call for the platform team
Hi @frosk1, thanks for posting the workaround. The original feature request makes sense, unfortunately we haven't been able to prioritize adding the feature yet. Were you looking for a comment on the security issue you raised? Could you give some more details about that?
That is fine, @architkulkarni.
I am not quite sure how to handle publicly available Ray cluster solutions, actually (authentication).
If you manage to get access to a running Ray cluster, at the moment it seems pretty easy to escape the cluster environment and access the underlying machine.
The runtime_env features automatically give access to the underlying pip, which has access to the default SSH keys, and working_dir has write access to the underlying machine, so anyone can easily inject whatever they want.
So a Ray cluster should only run on infrastructure that has no access to any other service or infrastructure. Otherwise, an attacker could easily exploit that access from the underlying machine by using the Ray API.
Does this make sense?
Thanks for the details! Let me pull in @ijrsvt who knows more about security. @ijrsvt any comments about these concerns? The relevant context from runtime_env is that runtime env calls the system's pip install on the remote cluster, and also writes working_dir to the remote cluster's filesystem.
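To make the concern concrete: a process launched by another process normally runs as the same user and inherits its environment, so anything it spawns sees the same home directory. The sketch below is not Ray's code; it only illustrates that general inheritance with a plain child Python process, and `child_home` is a hypothetical helper name.

```python
import os
import subprocess
import sys


def child_home():
    """Ask a child Python process which home directory it resolves."""
    result = subprocess.run(
        [sys.executable, "-c", "import os; print(os.path.expanduser('~'))"],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.strip()


parent_home = os.path.expanduser("~")
# The child resolves the same home directory as the parent, which is
# where a default SSH key directory such as ~/.ssh would live.
print("same home directory:", child_home() == parent_home)
```

By the same reasoning, a pip process started on a cluster node would be expected to reach whatever keys the launching user can reach, unless the launcher deliberately changes the user or environment.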
Hey @architkulkarni and @frosk1, Ray currently doesn't include any security isolation or sandboxing. Someone with access to a Ray cluster (whether via Ray client or via task submission) will likely have full access to the underlying machines.
Ray is more or less a system that enables 'arbitrary code execution', so we suggest focusing one level up--preventing unauthorized access to Ray clusters. Similarly, isolation between different people on one Ray cluster is more for usability, not security (i.e. person X wants a different set of packages than person Y).
Description
At the moment there is no way to install private git repositories within the runtime_env logic.
Most private git dependencies use SSH keys for installation. This is not supported, so users cannot use private git repositories within the runtime_env logic when they need an SSH key to authenticate during installation.
Example of a high-level API call for that:
runtime_env={"pip": ["lightgbm"], "ssh": True}
Several issues need to be taken care of:
Use case
A company or project might have dependencies living in private remote git repositories on GitHub, GitLab, or Bitbucket, using SSH keys for authentication during installation.
These companies and projects currently cannot use their dependencies on the Ray cluster if they use the runtime_env mechanics to install dependencies.