ocean-data-factory-sweden / kso

Notebooks to upload/download marine footage, connect to a citizen science project, train machine learning models and publish marine biological observations.
GNU General Public License v3.0

Endless download of sites_buv_doc.csv from AWS in Tutorial 8 of Spyfish Project #358

Closed: pilarnavarro closed this issue 4 months ago

pilarnavarro commented 4 months ago

I'm encountering an issue when running Tutorial 8 on the Spyfish Aotearoa project locally. The initialization of the ProjectProcessor doesn't complete:

pp = ProjectProcessor(project)

The connection to the server is established successfully and the movies file downloads without any issues. However, execution hangs while downloading the sites file. The hang seems to originate in the download_file function of the boto3 client: when I interrupt the execution, the traceback ends inside this function.

However, when I run the same code in Google Colab, everything completes without any problems. I noticed that the boto3 versions specified in requirements_colab.txt and requirements.txt are different. I tried installing the version from requirements_colab.txt locally, but the problem persists.

Additionally, I attempted to set a timeout for data reading during the client creation as follows:

import boto3
import botocore.config

client = boto3.client(
    "s3",
    aws_access_key_id=aws_access_key_id,
    aws_secret_access_key=aws_secret_access_key,
    # read_timeout: seconds to wait for data on an open connection
    config=botocore.config.Config(read_timeout=10),
)

But this change did not resolve the issue.
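To see where the hang described above is stuck without interrupting it by hand, one option is Python's standard faulthandler module, which can dump the traceback of every thread after a delay. The following is only a minimal diagnostic sketch: the helper name dump_stacks_if_stuck is hypothetical and not part of kso or boto3.

```python
import faulthandler
import sys
from contextlib import contextmanager


@contextmanager
def dump_stacks_if_stuck(seconds, file=None):
    """Dump every thread's traceback to `file` if the block runs longer than `seconds`."""
    # faulthandler writes straight to the file descriptor, so `file` must be a
    # real file object (one with fileno()); stderr is used by default.
    target = sys.stderr if file is None else file
    faulthandler.dump_traceback_later(seconds, exit=False, file=target)
    try:
        yield
    finally:
        # Cancel the watchdog once the block finishes (or raises).
        faulthandler.cancel_dump_traceback_later()
```

Wrapping the slow call, for example `with dump_stacks_if_stuck(60): pp = ProjectProcessor(project)`, would print the stack of the main thread and of any worker threads if initialization takes longer than a minute, which can show whether the download is waiting on a socket or on another thread.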

jannesgg commented 4 months ago

@pilarnavarro This should be fixed now in the latest dev, as it seems there is some issue with threaded downloads.
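For anyone stuck on an older version, here is a minimal sketch of a download that avoids boto3's transfer threads, assuming the hang really is related to threaded downloads as suggested above. TransferConfig(use_threads=False) is boto3's own switch for this; the helper names are hypothetical, and this is not necessarily how the fix in dev is implemented.

```python
def single_threaded_config():
    """Build a boto3 transfer config that keeps the download in the calling thread."""
    # Imported lazily so the helper below can be exercised with a stub client.
    from boto3.s3.transfer import TransferConfig

    return TransferConfig(use_threads=False)


def download_without_threads(client, bucket, key, filename, config=None):
    """Download s3://bucket/key to `filename` without boto3's worker threads.

    `client` is expected to be a boto3 S3 client (or any object exposing the
    same download_file method); `config` defaults to a single-threaded config.
    """
    if config is None:
        config = single_threaded_config()
    client.download_file(bucket, key, filename, Config=config)
    return filename
```

With the client created as in the first comment, a call would look like `download_without_threads(client, bucket, "sites_buv_doc.csv", "sites_buv_doc.csv")`, where `bucket` stands for the project's bucket name and the key is assumed to match the file name.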