scipp / scippneutron

Neutron scattering toolkit built using scipp for Data Reduction. Not facility or instrument specific.
https://scipp.github.io/scippneutron/
BSD 3-Clause "New" or "Revised" License

Setup Azure Blobs for tests with large files #476

Closed: YooSunYoung closed this 3 months ago

YooSunYoung commented 11 months ago

~To Do: Can we use authentication via SciCat instead?~ -> Not fit for purpose, e.g., since files cannot be changed or removed

(Johannes) At some point we discussed putting the files in login.esss.dk:/mnt/groupdata/scipp/testdata/ instead and transferring them to the GitHub Actions runner over SSH. I don't remember: why did we decide against that in the end?

YooSunYoung commented 11 months ago

I was trying out the Azure CLI client locally. In the workflow it would look like this:

jobs:
  download:
    runs-on: ubuntu-latest
    # OIDC login via azure/login needs permission to request an ID token
    permissions:
      id-token: write
      contents: read
    steps:
      - name: Azure login
        uses: azure/login@v1
        with:
          client-id: ${{ secrets.AZURE_CLIENT_ID }}
          tenant-id: ${{ secrets.AZURE_TENANT_ID }}
          subscription-id: ${{ secrets.AZURE_SUBSCRIPTION_ID }}

      - name: Azure Blob Download
        uses: azure/CLI@v1
        with:
          azcliversion: latest
          inlineScript: |
            # Print the logged-in account as a sanity check
            az account show
            # Download blob pulse5_z_subset.h5 from container nmx in storage account scipp
            az storage blob download \
              --account-name scipp \
              --account-key ${{ secrets.AZURE_ACCOUNT_KEY }} \
              --container-name nmx \
              --file pulse5_z_subset.h5 \
              --name pulse5_z_subset.h5

I couldn't try it in CI since I don't have access to the GitHub secrets. Could someone with the owner role set up the secrets?

nvaytet commented 6 months ago

Do we still want to proceed with this, or do we want to set up the password-protected folder on our local HTTP server? @jl-wynen @YooSunYoung

jl-wynen commented 6 months ago

The password-protected folder seems like the simplest solution at this point. We could always move to Azure later if the folder does not work out for us.

nvaytet commented 6 months ago

Do we have this set-up in a more permanent manner now?

jl-wynen commented 6 months ago

No. We need to tell Brian to do it.

jokasimr commented 4 months ago

We could also use Nextcloud for this. Either by having a folder with a sharing link, or having sharing links to the individual files in the folder.

The only issue is that it would be owned by one of us, although everyone could still edit it. However, I don't think that would be a big issue, since folder ownership can be transferred between users.

Example: wget --user "ZTjZ6oTtS55LXET" --password "ZbHgR6qtmH" https://project.esss.dk/nextcloud/public.php/webdav/share-test.md

(note that the "user" here is a reference to a shared folder and the "password" is a link protection password, not my user credentials)
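For illustration, the same request can be built in Python with only the standard library. This is a minimal sketch: share_request and NEXTCLOUD_PUBLIC_WEBDAV are hypothetical names, and it assumes the Nextcloud endpoint accepts a Basic Authorization header sent up front, with the share token as the user name and the link protection password as the password, exactly as in the wget example.

```python
import base64
import urllib.request

# Public WebDAV endpoint taken from the wget example (hypothetical constant name)
NEXTCLOUD_PUBLIC_WEBDAV = 'https://project.esss.dk/nextcloud/public.php/webdav/'


def share_request(filename: str, share_token: str, share_password: str) -> urllib.request.Request:
    """Build a GET request for a file in a password-protected Nextcloud share.

    The share token plays the role of the Basic-auth user name and the link
    protection password that of the password; no personal credentials are used.
    """
    credentials = f'{share_token}:{share_password}'.encode('utf-8')
    auth = 'Basic ' + base64.b64encode(credentials).decode('ascii')
    return urllib.request.Request(
        NEXTCLOUD_PUBLIC_WEBDAV + filename,
        headers={'Authorization': auth},
    )
```

In CI, the token and password would come from GitHub secrets, and the file could then be downloaded with urllib.request.urlopen(share_request('share-test.md', token, password)).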

nvaytet commented 4 months ago

We would need to authenticate from the GitHub CI runners by providing someone's credentials, so I don't really see that happening.

Also, Nextcloud is very slow for me, and DST says it is not a place where we should store data files (even if this is what many people are doing). So I would vote against nextcloud...

I don't see the advantage over the server we already have, which can apparently have password protected folders.

jokasimr commented 4 months ago

We would need to authenticate from github CI runners, by giving someone's credentials

No, we can authenticate using a share link with a password. The share link and the password can be stored in GitHub secrets.

See the example in the comment above.

The only advantage is that we don't have to involve DST.

and DST says it is not a place where we should store data files

Oh I didn't know that. Thought that was the point of it 😅

jokasimr commented 4 months ago

I spoke to Brian about it, and we now have password-protected folders enabled. Here's a test folder: https://public.esss.dk/groups/scipp/test-access/test.txt

Brian mentioned that this setting will not be properly activated until tomorrow, so it might stop working in an hour or so.

nvaytet commented 4 months ago

@jl-wynen I don't remember: did you make sure this kind of folder works with pooch? If so, how do we send the password through?

jokasimr commented 4 months ago

I haven't tested it yet, but I assume pooch supports basic HTTP authentication.

jokasimr commented 4 months ago

Waiting for the functionality to be rolled out by DST (I have reminded them about it).

jokasimr commented 3 months ago

With https://github.com/scipp/copier_template/pull/192#event-13438180295 merged, we can now use protected files in the workflows.

The protected files have to be placed below https://public.esss.dk/groups/scipp/protected/ on the file server. GitHub Actions that use protected files will need to look for the credentials in the environment variables ESS_PROTECTED_FILESTORE_USERNAME and ESS_PROTECTED_FILESTORE_PASSWORD.
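A minimal sketch of how test code might pick up these credentials and build URLs below the protected folder. The helper names protected_credentials and protected_url are hypothetical; only the environment variable names and the base URL come from the comment above.

```python
import os

# Base URL of the protected folder on the ESS file server
PROTECTED_BASE_URL = 'https://public.esss.dk/groups/scipp/protected/'


def protected_credentials() -> tuple[str, str]:
    """Read the file-store user name and password from the environment.

    Raises RuntimeError with a readable message if a variable is missing,
    e.g. because the calling workflow did not pass the secrets on.
    """
    try:
        return (
            os.environ['ESS_PROTECTED_FILESTORE_USERNAME'],
            os.environ['ESS_PROTECTED_FILESTORE_PASSWORD'],
        )
    except KeyError as err:
        raise RuntimeError(f'Missing credential environment variable {err}') from None


def protected_url(relative_path: str) -> str:
    """Return the URL of a file placed below the protected folder."""
    return PROTECTED_BASE_URL + relative_path.lstrip('/')
```

Failing early with an explicit message makes a missing secret in CI much easier to diagnose than a later HTTP 401 error.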

To fetch files we need to do:

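import base64
import os

import pooch

# Credentials are provided to the workflow as environment variables
username = os.environ['ESS_PROTECTED_FILESTORE_USERNAME']
password = os.environ['ESS_PROTECTED_FILESTORE_PASSWORD']
# Encode as UTF-8 (not latin1) before base64, so non-latin1 passwords work
token = base64.b64encode(f'{username}:{password}'.encode('utf-8')).decode('ascii')

# p is the package's pooch.Pooch registry (e.g. from pooch.create), filename an entry in it
p.fetch(
    filename,
    downloader=pooch.HTTPDownloader(headers={'Authorization': f'Basic {token}'}),
)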

If the password contains only latin1 characters, it is sufficient to pass downloader=pooch.HTTPDownloader(auth=(username, password)) to p.fetch, but not otherwise; see https://github.com/psf/requests/issues/4564. Of course I can't tell you whether our passwords contain non-latin1 characters ;), but it is probably safest to assume that they might.
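The latin1 problem can be reproduced without any network access. The sketch below builds the Basic-auth header value by hand (basic_auth_header and encodable_in_latin1 are hypothetical helpers), assuming, as the linked requests issue describes, that requests encodes auth=(username, password) as latin1. For ASCII-only credentials both encodings agree; for a character like ä they produce different headers; and a character like € cannot be encoded as latin1 at all.

```python
import base64


def basic_auth_header(username: str, password: str, encoding: str = 'utf-8') -> str:
    """Return an HTTP Basic Authorization header value using the given encoding."""
    credentials = f'{username}:{password}'.encode(encoding)
    return 'Basic ' + base64.b64encode(credentials).decode('ascii')


def encodable_in_latin1(text: str) -> bool:
    """True if every character of text exists in latin1 (U+0000 to U+00FF)."""
    try:
        text.encode('latin1')
    except UnicodeEncodeError:
        return False
    return True


# ASCII-only credentials: latin1 and UTF-8 give identical bytes, hence identical headers
ascii_same = basic_auth_header('user', 'secret', 'latin1') == basic_auth_header('user', 'secret')

# 'ä' is one byte (0xE4) in latin1 but two bytes (0xC3 0xA4) in UTF-8, so the headers differ
umlaut_same = basic_auth_header('user', 'pässword', 'latin1') == basic_auth_header('user', 'pässword')

# '€' has no latin1 representation, so a latin1-only client cannot send it at all
euro_ok = encodable_in_latin1('pa€s')
```

Building the header explicitly with UTF-8, as in the fetch snippet, sidesteps both failure modes, provided the server decodes credentials as UTF-8.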

jokasimr commented 3 months ago

@jl-wynen you reviewed the mentioned PR. Do you think this is sufficient to close the issue or is there anything else that should be added?

jokasimr commented 3 months ago

~For some reason the github organization secrets storing the credentials are not accessible in the workflows. Not sure why. To investigate this I'm using a test repository https://github.com/scipp/test-secret-files ~

Fixed. The issue was that secrets are not inherited by called (reusable) workflows by default; the calling workflow has to pass them on explicitly, for example with secrets: inherit.

jokasimr commented 3 months ago

I've verified that this works with a test PR (now closed) in the essreflectometry repository: https://github.com/scipp/essreflectometry/pull/63

The only remaining issue is https://github.com/scipp/copier_template/pull/192#discussion_r1671533053. Do we want to solve that before closing this, or postpone it to a separate issue that can be addressed once we have come up with a solution?

(I'm leaning towards closing this now, but if others have different opinions, I won't.)

SimonHeybrock commented 3 months ago

Thanks for setting this up!