pangeo-data / pangeo-eosc

Pangeo for the European Open Science cloud
https://pangeo-data.github.io/pangeo-eosc/
MIT License

Is it possible to have limited authorisation on Datasets uploaded to Swift/S3 on CESNET? #17

Open guillaumeeb opened 2 years ago

guillaumeeb commented 2 years ago

Question by @tinaok:

Hi, I have a test Pangeo configuration to run, but I need to make part of the data available only to a few users. Is it possible to create Zarr files on CESNET's Swift with 'private' (the owner only), 'group' (some of the people who have access to the pangeo-eosc platform) and 'public' (open to the internet) access levels?

@tinaok could you specify your needs a bit more?

tinaok commented 2 years ago

Thank you @guillaumeeb

  • Should the dataset be visible only to some users? Yes.
  • Should the dataset be writable only by some users, but viewable and readable by anyone? No.
  • What do you mean by owner versus group?

On a Linux system at an HPC center, we create a Unix group and add the users who want to share data to that group. Then we control access with chmod g+r,o-r toto.nc. In our case, 'other' is the internet, and 'group' could be all the people who have EGI-authorised access to the Pangeo cloud, or the Pangeo cloud admin group. My question is how we can create a bucket that is accessible only to this group of people, and not from the internet, using our EGI authentication system.

guillaumeeb commented 2 years ago

OK, so I think we'll need @sebastian-luna-valero's help on this one, and probably some CESNET staff as well. I can still try to answer some points.

There is no user:group concept in cloud object stores; things work differently. You have user accounts (EGI here), projects or tenants (the Pangeo VO, I guess), and you can usually define roles and policies on top of these. These policies are a kind of ACL (Access Control List): they define who can perform which operation on a project or on a bucket/container. I'm not sure how this is implemented at CESNET, but I checked the documentation and it is possible to use something like this on OpenStack.

By default, with the Horizon interface or during bucket/container creation, we can only specify whether a container is public (visible on the internet) or not. So the situation is as below, I think:

Be careful: if you create an S3 access/secret key pair and give it to another person, it will by default be an admin key pair.

So, to know whether we can set more precise rules, we'll need help from other people to find out which OpenStack commands we could run, and whether this is compatible with S3 credentials or only with Swift credentials.

sebastian-luna-valero commented 2 years ago

Hello,

Here is the current situation:

Please note that currently:

If we need something intermediate, we will need to explore options in: https://docs.openstack.org/swift/latest/overview_acl.html
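As a rough sketch of what "something intermediate" could look like with Swift container ACLs, the Python snippet below maps the three Unix-like levels from the original question onto a value for the X-Container-Read header described in the Swift ACL documentation linked above. The project and user IDs are hypothetical placeholders, and whether CESNET's Swift/Keystone setup honours per-user ACL elements is an assumption that would need testing.

```python
from typing import Dict, Sequence

# Swift container read-ACL elements allowing anonymous reads and listings.
PUBLIC_READ_ACL = ".r:*,.rlistings"


def container_read_headers(
    level: str, project_id: str = "", user_ids: Sequence[str] = ()
) -> Dict[str, str]:
    """Return Swift container headers for a Unix-like access level.

    "private": no read ACL, so access stays with the owning project
               (assumption: default Keystone operator-role behaviour).
    "group":   read access for specific users of a project, using
               "<project-id>:<user-id>" ACL elements.
    "public":  anonymous read and listing over the internet.
    """
    if level == "private":
        return {}
    if level == "group":
        if not project_id or not user_ids:
            raise ValueError("'group' needs a project_id and at least one user_id")
        return {"X-Container-Read": ",".join(f"{project_id}:{u}" for u in user_ids)}
    if level == "public":
        return {"X-Container-Read": PUBLIC_READ_ACL}
    raise ValueError(f"unknown access level: {level!r}")


if __name__ == "__main__":
    # Hypothetical IDs, for illustration only.
    print(container_read_headers("group", "PROJECT_ID", ["USER_A", "USER_B"]))
```

The same value could be applied from the command line with python-swiftclient, e.g. swift post -r "VALUE" CONTAINER. Read access is separate from write access, which Swift controls through X-Container-Write.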

Please let me know your thoughts.

Best regards, Sebastian

tinaok commented 2 years ago

Hi Sebastian, the use case I have in mind requires 'something intermediate'.
I'll have some users who do not need OpenStack dashboard access but do need DaskHub, and who need 'private' buckets accessible only to them. It is OK for Pangeo VO admins to access this data, as they are admins.

tinaok commented 2 years ago

I have a related question for @sebastian-luna-valero. If we use S3 access through the MinIO server offered in the IM Dashboard, do we get different types of user groups? Or, since user control will be backed by EGI Check-in anyway, is it the same as using the OpenStack object storage at CESNET directly?

sebastian-luna-valero commented 2 years ago

Hi,

To address this issue I have opened: https://github.com/pangeo-data/pangeo-eosc/pull/23

Here is the status after merging that PR:

  1. Who can create/destroy VMs in the cloud (e.g. to deploy DaskHub)? Members of the pangeo.admins VO group in aai.egi.eu
  2. Who has access to DaskHub? Members of the vo.pangeo.eu VO in aai-dev.egi.eu. Ideally we want this to be moved to aai.egi.eu as well.
  3. Who has read/write access to object storage? Members of the vo.pangeo.eu VO in aai.egi.eu
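To make the resulting split explicit, the following Python sketch encodes the three rules above as a lookup table keyed by (VO group, AAI instance). The action names are hypothetical labels chosen for illustration, not identifiers used by EGI Check-in or CESNET. It also shows where the later concern about accidental deletion comes from: object-storage read/write is granted to the whole vo.pangeo.eu VO, not per user.

```python
from typing import Dict, FrozenSet, Iterable, Tuple

Membership = Tuple[str, str]  # (VO group, AAI instance)

# Rules 1-3 above; the action names are hypothetical labels.
PERMISSIONS: Dict[Membership, FrozenSet[str]] = {
    ("pangeo.admins", "aai.egi.eu"): frozenset({"manage_vms"}),
    ("vo.pangeo.eu", "aai-dev.egi.eu"): frozenset({"use_daskhub"}),
    ("vo.pangeo.eu", "aai.egi.eu"): frozenset({"object_storage_rw"}),
}


def allowed(memberships: Iterable[Membership], action: str) -> bool:
    """True if any of the user's memberships grants the action."""
    return any(action in PERMISSIONS.get(m, frozenset()) for m in memberships)


if __name__ == "__main__":
    student = [("vo.pangeo.eu", "aai-dev.egi.eu"), ("vo.pangeo.eu", "aai.egi.eu")]
    for action in ("manage_vms", "use_daskhub", "object_storage_rw"):
        print(action, allowed(student, action))
```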

Now, by following the instructions to configure awscli, users who want private buckets should be able to get them by using --acl private with aws s3 commands.
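For illustration, the Python sketch below builds (but does not run) an aws s3 cp command that uploads a local Zarr store with the private canned ACL. The endpoint URL, bucket and paths are hypothetical placeholders; the real S3 endpoint and credentials come from the awscli configuration instructions. --recursive and --acl are standard aws s3 cp options and --endpoint-url is a global awscli option, but, as noted later in this thread, only --acl private has been tested against CESNET.

```python
import shlex
from typing import List


def private_upload_cmd(local_path: str, bucket: str, prefix: str, endpoint_url: str) -> List[str]:
    """Return an `aws s3 cp` argv that uploads a directory with a private ACL."""
    key = prefix.strip("/")
    dest = f"s3://{bucket}/{key}/" if key else f"s3://{bucket}/"
    return [
        "aws", "s3", "cp", local_path, dest,
        "--recursive",             # a Zarr store is a directory of many objects
        "--acl", "private",        # canned ACL applied to every uploaded object
        "--endpoint-url", endpoint_url,
    ]


if __name__ == "__main__":
    # Hypothetical bucket and endpoint.
    cmd = private_upload_cmd("run.zarr", "my-bucket", "run.zarr", "https://s3.example.org")
    print(shlex.join(cmd))
    # To actually execute it: subprocess.run(cmd, check=True)
```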

All of the above should address the comments from @tinaok

Regarding the question about MinIO: if you deploy it with the IM Dashboard, you have full control over it (i.e. you can decide to configure EGI Check-In or any other user accounts/groups). However, please bear in mind that it's not only about deploying and configuring MinIO; it would also be another service for us to maintain. Therefore, I would leave this as a last resort and use the object storage at CESNET, which is already managed.

sebastian-luna-valero commented 2 years ago

xref: https://docs.aws.amazon.com/cli/latest/reference/s3api/put-object-acl.html

guillaumeeb commented 2 years ago

So what you are saying here is that, once we've set up our AWS S3 credentials, we can use aws s3 commands, following https://docs.aws.amazon.com/cli/latest/reference/s3api/put-object-acl.html, to set specific ACLs on any storage bucket/container?
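One practical detail when trying put-object-acl: it acts on a single object, and a Zarr store is made of many objects (metadata files plus one object per chunk), so an ACL would have to be set on every key. A minimal Python sketch under that reading, again only building the commands; the bucket, keys and endpoint are hypothetical, and whether Swift's S3 layer accepts ACLs other than private is exactly what remains to be tested.

```python
from typing import Iterable, List

# A subset of the canned ACLs accepted by `aws s3api put-object-acl --acl`.
CANNED_ACLS = {"private", "public-read", "public-read-write", "authenticated-read"}


def put_acl_cmds(bucket: str, keys: Iterable[str], endpoint_url: str,
                 acl: str = "private") -> List[List[str]]:
    """Return one `aws s3api put-object-acl` argv per object key."""
    if acl not in CANNED_ACLS:
        raise ValueError(f"unsupported canned ACL: {acl!r}")
    return [
        ["aws", "s3api", "put-object-acl",
         "--bucket", bucket, "--key", key, "--acl", acl,
         "--endpoint-url", endpoint_url]
        for key in keys
    ]


if __name__ == "__main__":
    # Hypothetical keys of a small Zarr store.
    keys = ["run.zarr/.zgroup", "run.zarr/temp/.zarray", "run.zarr/temp/0.0"]
    for cmd in put_acl_cmds("my-bucket", keys, "https://s3.example.org"):
        print(" ".join(cmd))
```

The effect on any single object could then be checked with aws s3api get-object-acl --bucket BUCKET --key KEY.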

I'll try that later on this week or the next.

:+1: on relying on the already-managed CESNET object storage; handling our own object store would certainly be a lot of work, and we would probably also run into performance concerns.

sebastian-luna-valero commented 2 years ago

I have only tested the --acl private option. Since OpenStack Swift is underneath, I am not sure whether all AWS S3 options are supported. Please test and let us know.

guillaumeeb commented 2 years ago

Could you clarify how you see storage permissions through the S3 interface after #23, i.e. with containers/buckets created in another OpenStack project?

tinaok commented 1 year ago

Following up on https://github.com/pangeo-data/pangeo-eosc/issues/39#issuecomment-1277671176

What should we tell students to do to prevent one student from deleting another student's data?

I'll add all students as members of vo.pangeo.eu in aai.egi.eu, so that they can read/write in the private bucket I'll create for each working group.

But if I understood correctly, unlike at HPC centres, if one user creates a Zarr file, other users can delete it by mistake?

Until we find a solution, I'll tell them to 'check the path' so that they do not touch other people's files, but a better solution would be nice. I wonder how the Pangeo US cloud deals with this...

sebastian-luna-valero commented 1 year ago

Hi,

The problem is with the translation of the federated identity from Check-In into the local identity at CESNET. This issue is very specific to the federated AAI infrastructure that we are using for this deployment. If other deployments use other authentication/authorization methods, they won't have the same issue.

Indeed, until the issue is solved, the recommendation is to be careful with the path. As long as everybody writes to their own bucket/path, everything should be fine. Maybe they can use their own user ID as a prefix? Hopefully that is unique for everyone.
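The "own user ID as a prefix" convention can also be checked on the client side before anything is deleted. The Python sketch below is a hypothetical helper, not part of any Pangeo or CESNET tooling: it accepts an s3:// URL only if it lies under s3://BUCKET/USER_ID/ after normalising ".." segments. The fs argument stands for any fsspec-style filesystem object (for example an s3fs.S3FileSystem), whose rm(path, recursive=True) method is called only after the check passes. This guards against mistakes, not against someone deliberately bypassing it, since the storage permissions themselves are still shared.

```python
import posixpath
from urllib.parse import urlparse


def is_own_path(url: str, bucket: str, user_id: str) -> bool:
    """True if `url` points inside s3://<bucket>/<user_id>/."""
    parsed = urlparse(url)
    if parsed.scheme != "s3" or parsed.netloc != bucket:
        return False
    # Normalise so that "alice/../bob/x" cannot slip through.
    key = posixpath.normpath(parsed.path.lstrip("/"))
    return key == user_id or key.startswith(user_id + "/")


def safe_rm(fs, url: str, bucket: str, user_id: str) -> None:
    """Delete `url` recursively only if it is under the caller's own prefix."""
    if not is_own_path(url, bucket, user_id):
        raise PermissionError(f"{url} is outside s3://{bucket}/{user_id}/")
    fs.rm(url, recursive=True)
```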

Apologies, CESNET has been looking into the issue, but it's not an easy one to solve.

sebastian-luna-valero commented 1 year ago

I believe this has been fixed with MinIO. Do you want to test or should we directly close this?
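With MinIO, one common way to stop one user from deleting another's data is an S3-style policy that lets everyone read a shared bucket but write or delete only under their own prefix. The Python sketch below generates such a policy document. The bucket name is hypothetical, and the policy variable is an assumption: ${aws:username} is the AWS-style variable, while for OpenID Connect logins (such as EGI Check-in) MinIO documents JWT claim variables such as ${jwt:sub} instead, so the right choice depends on how this MinIO deployment is configured. Depending on the client, multipart-upload actions may also need to be allowed.

```python
import json
from typing import Any, Dict


def per_user_write_policy(bucket: str, user_var: str = "${aws:username}") -> Dict[str, Any]:
    """Read access to the whole bucket; write/delete only under <bucket>/<user>/."""
    bucket_arn = "arn:aws:s3:::" + bucket
    return {
        "Version": "2012-10-17",
        "Statement": [
            {   # list the shared bucket
                "Effect": "Allow",
                "Action": ["s3:ListBucket"],
                "Resource": [bucket_arn],
            },
            {   # read any object in it
                "Effect": "Allow",
                "Action": ["s3:GetObject"],
                "Resource": [bucket_arn + "/*"],
            },
            {   # create and delete objects only under the caller's own prefix
                "Effect": "Allow",
                "Action": ["s3:PutObject", "s3:DeleteObject"],
                "Resource": [bucket_arn + "/" + user_var + "/*"],
            },
        ],
    }


if __name__ == "__main__":
    print(json.dumps(per_user_write_policy("wg1"), indent=2))
```

The generated JSON could then be saved to a file and attached to users or groups with the MinIO client (mc admin policy create in recent versions; older releases used a different subcommand).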

tinaok commented 1 year ago

Thank you @sebastian-luna-valero. Yes, I would like to test it to understand the procedure. Which documentation should I follow? Thank you for your help.

sebastian-luna-valero commented 1 year ago

Hi @tinaok

This is the starting point: https://github.com/pangeo-data/pangeo-eosc/blob/main/users/users-getting-started.md#access-minio

Please give it a go and let us know how it goes.

Best regards, Sebastian

tinaok commented 1 year ago

Thank you @sebastian-luna-valero. I couldn't create a bucket, maybe because I'm not connected as an administrator?
Tina

sebastian-luna-valero commented 1 year ago

Could you try following these steps?

https://github.com/pangeo-data/pangeo-eosc/blob/main/users/how-to/TestMinIO.ipynb
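For readers following along from a notebook, pointing s3fs (and therefore xarray/Zarr) at an S3-compatible MinIO endpoint mostly comes down to a small storage_options dictionary. A minimal Python sketch, assuming credentials are passed explicitly or via the standard AWS_ACCESS_KEY_ID / AWS_SECRET_ACCESS_KEY environment variables; the endpoint URL and paths are hypothetical, and the real values are in the getting-started guide and the notebook linked above.

```python
import os
from typing import Any, Dict, Optional


def minio_storage_options(
    endpoint_url: str, key: Optional[str] = None, secret: Optional[str] = None
) -> Dict[str, Any]:
    """Build s3fs storage_options for an S3-compatible endpoint such as MinIO."""
    key = key or os.environ.get("AWS_ACCESS_KEY_ID")
    secret = secret or os.environ.get("AWS_SECRET_ACCESS_KEY")
    if not key or not secret:
        raise ValueError("missing access key or secret key")
    return {
        "key": key,
        "secret": secret,
        # s3fs forwards client_kwargs to the underlying botocore client
        "client_kwargs": {"endpoint_url": endpoint_url},
    }
```

With xarray this could be used as ds.to_zarr("s3://my-bucket/alice/run.zarr", storage_options=opts), or directly as s3fs.S3FileSystem(**opts).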

I think we should link the example from the getting started guide: https://github.com/pangeo-data/pangeo-eosc/pull/56