pytorch / data

A PyTorch repo for data loading and utilities to be shared by the PyTorch domain libraries.
BSD 3-Clause "New" or "Revised" License

S3FileLister: ValueError: curlCode: 77, Problem with the SSL CA cert (path? access rights?) #567

Closed MatthewCaseres closed 1 year ago

MatthewCaseres commented 2 years ago

πŸ› Describe the bug

The code that I am running is:

import torchdata.datapipes as dp

s3_urls = dp.iter.IterableWrapper(["s3://bucket/key"]).list_files_by_s3(request_timeout_ms=100)

print(next(iter(s3_urls)))

The full traceback that I am seeing is:

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
/tmp/ipykernel_11947/2457993211.py in <cell line: 5>()
      3 s3_urls = dp.iter.IterableWrapper(["s3://bucket/key"]).list_files_by_s3(request_timeout_ms=100)
      4 
----> 5 print(next(iter(s3_urls)))

~/anaconda3/envs/pytorch_p39/lib/python3.9/site-packages/torch/utils/data/datapipes/_typing.py in wrap_generator(*args, **kwargs)
    512                         response = gen.send(None)
    513                 else:
--> 514                     response = gen.send(None)
    515 
    516                 while True:

~/anaconda3/envs/pytorch_p39/lib/python3.9/site-packages/torchdata/datapipes/iter/load/s3io.py in __iter__(self)
     56         for prefix in self.source_datapipe:
     57             while True:
---> 58                 urls = self.handler.list_files(prefix)
     59                 yield from urls
     60                 if not urls:

ValueError: curlCode: 77, Problem with the SSL CA cert (path? access rights?)
This exception is thrown by __iter__ of S3FileListerIterDataPipe(length=-1, source_datapipe=IterableWrapperIterDataPipe)

I can successfully run the following code:

import boto3

s3 = boto3.resource('s3')
obj = s3.Object('bucket', 'key')

# Download the file from S3
obj.download_file('./test.tfrecords')

Versions

Unsure if relevant, but I am on an EC2 instance running a Deep Learning AMI.

PyTorch version: 1.12.0+cu102
Is debug build: False
CUDA used to build PyTorch: 10.2
ROCM used to build PyTorch: N/A

OS: Ubuntu 18.04.6 LTS (x86_64)
GCC version: (Ubuntu 7.5.0-3ubuntu1~18.04) 7.5.0
Clang version: Could not collect
CMake version: version 3.22.3
Libc version: glibc-2.27

Python version: 3.9.13 | packaged by conda-forge | (main, May 27 2022, 16:56:21)  [GCC 10.3.0] (64-bit runtime)
Python platform: Linux-5.4.0-1080-aws-x86_64-with-glibc2.27
Is CUDA available: True
CUDA runtime version: 11.5.119
GPU models and configuration: GPU 0: Tesla T4
Nvidia driver version: 510.47.03
cuDNN version: Probably one of the following:
/usr/local/cuda-11.1/targets/x86_64-linux/lib/libcudnn.so.8.0.5
/usr/local/cuda-11.1/targets/x86_64-linux/lib/libcudnn_adv_infer.so.8.0.5
/usr/local/cuda-11.1/targets/x86_64-linux/lib/libcudnn_adv_train.so.8.0.5
/usr/local/cuda-11.1/targets/x86_64-linux/lib/libcudnn_cnn_infer.so.8.0.5
/usr/local/cuda-11.1/targets/x86_64-linux/lib/libcudnn_cnn_train.so.8.0.5
/usr/local/cuda-11.1/targets/x86_64-linux/lib/libcudnn_ops_infer.so.8.0.5
/usr/local/cuda-11.1/targets/x86_64-linux/lib/libcudnn_ops_train.so.8.0.5
/usr/local/cuda-11.2/targets/x86_64-linux/lib/libcudnn.so.8.1.1
/usr/local/cuda-11.2/targets/x86_64-linux/lib/libcudnn_adv_infer.so.8.1.1
/usr/local/cuda-11.2/targets/x86_64-linux/lib/libcudnn_adv_train.so.8.1.1
/usr/local/cuda-11.2/targets/x86_64-linux/lib/libcudnn_cnn_infer.so.8.1.1
/usr/local/cuda-11.2/targets/x86_64-linux/lib/libcudnn_cnn_train.so.8.1.1
/usr/local/cuda-11.2/targets/x86_64-linux/lib/libcudnn_ops_infer.so.8.1.1
/usr/local/cuda-11.2/targets/x86_64-linux/lib/libcudnn_ops_train.so.8.1.1
HIP runtime version: N/A
MIOpen runtime version: N/A
Is XNNPACK available: True

Versions of relevant libraries:
[pip3] mypy-boto3-s3==1.21.0
[pip3] mypy-boto3-sagemaker==1.21.0
[pip3] mypy-extensions==0.4.3
[pip3] numpy==1.22.4
[pip3] numpydoc==1.2.1
[pip3] torch==1.12.0
[pip3] torch-model-archiver==0.5.3b20220226
[pip3] torch-workflow-archiver==0.2.4b20220513
[pip3] torchaudio==0.11.0
[pip3] torchdata==0.4.0
[pip3] torchserve==0.5.3b20220226
[pip3] torchtext==0.12.0
[pip3] torchvision==0.12.0
[conda] blas                      2.115                       mkl    conda-forge
[conda] blas-devel                3.9.0            15_linux64_mkl    conda-forge
[conda] captum                    0.5.0                         0    pytorch
[conda] cudatoolkit               11.5.1              hcf5317a_10    conda-forge
[conda] libblas                   3.9.0            15_linux64_mkl    conda-forge
[conda] libcblas                  3.9.0            15_linux64_mkl    conda-forge
[conda] liblapack                 3.9.0            15_linux64_mkl    conda-forge
[conda] liblapacke                3.9.0            15_linux64_mkl    conda-forge
[conda] magma-cuda115             2.6.1                         0    pytorch
[conda] mkl                       2022.1.0           h84fe81f_915    conda-forge
[conda] mkl-devel                 2022.1.0           ha770c72_916    conda-forge
[conda] mkl-include               2022.1.0           h84fe81f_915    conda-forge
[conda] mkl-service               2.4.0            py39hb699420_0    conda-forge
[conda] mkl_fft                   1.3.1            py39h1fd5c3a_3    conda-forge
[conda] mkl_random                1.2.2            py39h8b66066_1    conda-forge
[conda] numpy                     1.22.4           py39hc58783e_0    conda-forge
[conda] numpydoc                  1.2.1              pyhd8ed1ab_0    conda-forge
[conda] pytorch-mutex             1.0                        cuda    pytorch
[conda] torch                     1.12.0                   pypi_0    pypi
[conda] torch-model-archiver      0.5.3                    py39_0    pytorch
[conda] torch-workflow-archiver   0.2.4                    py39_0    pytorch
[conda] torchaudio                0.11.0               py39_cu115    pytorch
[conda] torchdata                 0.4.0                    pypi_0    pypi
[conda] torchserve                0.5.3                    py39_0    pytorch
[conda] torchtext                 0.12.0                     py39    pytorch
[conda] torchvision               0.12.0               py39_cu115    pytorch
VitalyFedyunin commented 1 year ago

@ydaiming please take a look

ydaiming commented 1 year ago

This SSL CA cert error

ValueError: curlCode: 77, Problem with the SSL CA cert (path? access rights?)

is very likely resolvable with the following command, which provides the correct certificate at the directory the SDK expects:

mkdir -p /etc/pki/tls/certs && cp /etc/ssl/certs/ca-certificates.crt /etc/pki/tls/certs/ca-bundle.crt
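
The same fix can also be applied from Python, e.g. inside a notebook. A minimal equivalent sketch (the source path assumes a Debian/Ubuntu system, and the process needs write access to /etc/pki):

import os
import shutil

# Copy the system CA bundle to the path the AWS SDK checks by default.
os.makedirs("/etc/pki/tls/certs", exist_ok=True)
shutil.copy("/etc/ssl/certs/ca-certificates.crt", "/etc/pki/tls/certs/ca-bundle.crt")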
MatthewCaseres commented 1 year ago

That resolves the SSL issue, thanks!

ejguan commented 1 year ago

@ydaiming

Thank you for providing the solution for users.

I'm curious why boto3 works without the certificate. In terms of UX, it would be good if we could also provide a workaround so users don't have to explicitly create the certificate at the expected directory.

MatthewCaseres commented 1 year ago

My experience so far is that the PyTorch Deep Learning AMI has friction when using torchdata; it may be worth making sure that this EC2 setup works out of the box. I use EC2 instances as a development environment.

diggerk commented 1 year ago

It's really a deficiency of torchdata. It seems /etc/pki/tls/certs/ca-bundle.crt is the default location on RedHat, but on Debian/Ubuntu it's /etc/ssl/certs/ca-certificates.crt. As I understand it, S3Handler.cpp should really configure the AWS client properly depending on the system it's running on; see the related discussion at https://github.com/aws/aws-sdk-cpp/issues/1863.

Shouldn't this be reopened, @ydaiming?

ejguan commented 1 year ago

@MatthewCaseres @diggerk Would this tutorial help? We do have fsspec as an alternative path for loading from S3: https://pytorch.org/data/beta/tutorial.html#working-with-cloud-storage-providers
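
For reference, a minimal sketch of the fsspec route from that tutorial (assumes fsspec and s3fs are installed; the bucket URL is a placeholder):

from torchdata.datapipes.iter import IterableWrapper

# List files through fsspec/s3fs instead of the native AWS SDK handler.
dp = IterableWrapper(["s3://bucket/prefix"]).list_files_by_fsspec()
print(next(iter(dp)))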

Alyeko commented 1 year ago

> This SSL CA cert error
>
> ValueError: curlCode: 77, Problem with the SSL CA cert (path? access rights?)
>
> is very likely resolvable with the following command, which provides the correct certificate at the directory the SDK expects:
>
> mkdir -p /etc/pki/tls/certs && cp /etc/ssl/certs/ca-certificates.crt /etc/pki/tls/certs/ca-bundle.crt

Providing the certificate in my notebook solved the ValueError: curlCode: 77 error I was having, but now I am getting the Access Denied error shown below...

ValueError                                Traceback (most recent call last)
/tmp/ipykernel_28/311665444.py in <module>
----> 1 list(testdata_pipes)

/opt/conda/lib/python3.7/site-packages/torch/utils/data/datapipes/_hook_iterator.py in wrap_generator(*args, **kwargs)
    171                         response = gen.send(None)
    172                 else:
--> 173                     response = gen.send(None)
    174 
    175                 while True:

/opt/conda/lib/python3.7/site-packages/torchdata/datapipes/iter/load/s3io.py in __iter__(self)
     61         for prefix in self.source_datapipe:
     62             while True:
---> 63                 urls = self.handler.list_files(prefix)
     64                 yield from urls
     65                 if not urls:

ValueError: Access Denied
This exception is thrown by __iter__ of S3FileListerIterDataPipe() 

I am using a Kaggle notebook and trying to obtain images from an S3 bucket (s3://drivendata-competition-biomassters-public-us).

I get the Kaggle notebook running by importing my packages and running the following commands:

>> !unzip awscliv2.zip
>> !./aws/install

Then I configure my access keys:

>> !aws configure set aws_access_key_id 'XXXXXXXXXXXXXXXXXXXX'
>> !aws configure set aws_secret_access_key 'XXXXXXXXXXXXXXXXXX'

>> !aws s3 ls s3://drivendata-competition-biomassters-public-us --no-sign-request

By doing all this I am able to see the folders and files in the S3 bucket; however, S3FileLister fails when I run the code below.

>> s3_prefixes = IterableWrapper(['s3://drivendata-competition-biomassters-public-us/train_agbm/0003d2eb_agbm.tif',
                               's3://drivendata-competition-biomassters-public-us/train_agbm/000aa810_agbm.tif',
                               's3://drivendata-competition-biomassters-public-us/train_agbm/000d7e33_agbm.tif'])

>> dp_s3_urls = S3FileLister(s3_prefixes)
>> list(dp_s3_urls)

Any help would be appreciated, thanks!

ejguan commented 1 year ago

@Alyeko You might need to provide a region to S3FileLister. You can find your region under .aws/credentials.
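
For instance, a minimal sketch (the bucket URL and region value are placeholders):

from torchdata.datapipes.iter import IterableWrapper, S3FileLister

s3_prefixes = IterableWrapper(["s3://bucket/prefix"])
# Pass the bucket's region explicitly instead of relying on ambient configuration.
dp_s3_urls = S3FileLister(s3_prefixes, region="us-east-1")
print(list(dp_s3_urls))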

Alyeko commented 1 year ago

@ejguan, thanks, I have provided my region but it does not work.

Also, I realize that when I run !aws configure list in my notebook, my profile value is <not set>. I tried to configure my profile by running !aws configure set profile MY-IAM-PROFILE-NAME, but it does not work.

I also obtained my credentials.csv, but it does not have the User Name column, so I manually added my IAM-USERNAME and used !aws configure import --csv '/path-to-credentials.csv'. That does not work either; it still shows my profile value as <not set>.

Any solution? Thanks!

ejguan commented 1 year ago

@Alyeko In that case, we have to consult the AWS team. cc: @ydaiming

BTW, what would the result be if you used FSSpecFileLister? You might need to install s3fs and fsspec to access the S3 bucket.

I need to narrow down whether this is a problem with the AWS SDK or with the configuration.
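
A minimal sketch of that check (assumes s3fs and fsspec are installed, and that your torchdata version forwards keyword arguments to fsspec; the bucket URL is a placeholder):

from torchdata.datapipes.iter import FSSpecFileLister

# anon=True asks s3fs to make unsigned requests, mirroring --no-sign-request.
dp = FSSpecFileLister(root="s3://bucket/prefix", anon=True)
print(list(dp)[:3])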

ydaiming commented 1 year ago

@Alyeko I'm sorry to hear about the difficulty. Just to clarify, you can run e.g. !export AWS_REGION=us-west-2 to set the region, or provide the same setting through the Python function, as described here.

Access Denied sounds severe, but it means the AWS S3 service was reached. The issue is most likely due to credential configuration, as you're tracking down, and in particular to a wrong region configuration. I see that you've used the --no-sign-request argument, which implies a public bucket? In that case, any proper credential configuration should work without issues. I personally haven't encountered your case, and I'm sorry for not being more helpful here.

Edit: Could you let me know which region this dataset is in? I may try it on my own.
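
One caveat: !export in a notebook cell runs in a subshell and does not persist into the Python process, so setting the variable from Python before constructing the datapipe may be more reliable. A minimal sketch (the region value is a placeholder):

import os

# Must be set before the S3 datapipe/handler is constructed.
os.environ["AWS_REGION"] = "us-west-2"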

cfregly commented 1 year ago

FYI, I had the same issue within SageMaker Studio. To solve it, I had to run the same command from above within the Studio notebook itself.

Alyeko commented 1 year ago

> @Alyeko I'm sorry to hear about the difficulty. Just to clarify, you can run e.g. !export AWS_REGION=us-west-2 to set the region, or provide the same setting through the Python function, as described here.
>
> Access Denied sounds severe, but it means the AWS S3 service was reached. The issue is most likely due to credential configuration, as you're tracking down, and in particular to a wrong region configuration. I see that you've used the --no-sign-request argument, which implies a public bucket? In that case, any proper credential configuration should work without issues. I personally haven't encountered your case, and I'm sorry for not being more helpful here.
>
> Edit: Could you let me know which region this dataset is in? I may try it on my own.

Thanks! Yes, it is a public bucket, in the US East AWS region. If I do not add --no-sign-request, I get an error (An error occurred (AccessDenied) when calling the ListObjectsV2 operation: Access Denied).

Extra information: in my AWS Management Console, I have created one user group with one user. I created an access key and a secret access key for that user, and these are the keys I use to configure my notebook to access the data. My group has the AmazonS3FullAccess policy attached.

Alyeko commented 1 year ago

> @Alyeko In that case, we have to consult the AWS team. cc: @ydaiming
>
> BTW, what would the result be if you used FSSpecFileLister? You might need to install s3fs and fsspec to access the S3 bucket.
>
> I need to narrow down whether this is a problem with the AWS SDK or with the configuration.

Thanks, but FSSpecFileLister did not work for me :( as I got the error below...

Unable to locate credentials
This exception is thrown by __iter__ of FSSpecFileListerIterDataPipe()