rapidsai / clx

A collection of RAPIDS examples for security analysts, data scientists, and engineers to quickly get started applying RAPIDS and GPU acceleration to real-world cybersecurity use cases.
Apache License 2.0
168 stars, 68 forks

Update Rapids datasets download URL #527

Closed jjacobelli closed 1 year ago

jjacobelli commented 1 year ago

Update the RAPIDS datasets download URL to reduce latency and costs. This PR also replaces s3fs with requests for fetching RAPIDS datasets, since we are no longer using an S3 URL.
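For readers unfamiliar with the swap, here is a minimal sketch of what fetching a dataset over plain HTTPS with requests might look like. It is not the code from this PR: the function name `download_dataset`, its parameters, and the placeholder URL in the usage comment are all hypothetical, and the optional `session` argument exists only so the helper can be exercised without network access.

```python
def download_dataset(url, dest_path, session=None, chunk_size=1 << 20):
    """Stream `url` to `dest_path` and return the number of bytes written.

    Hypothetical helper for illustration; not taken from the clx code base.
    """
    if session is None:
        # Imported lazily so a stub session can be injected in tests.
        import requests
        session = requests.Session()

    # stream=True avoids loading a large dataset into memory at once.
    response = session.get(url, stream=True, timeout=60)
    # Fail loudly on 4xx/5xx instead of silently writing an error page to disk.
    response.raise_for_status()

    written = 0
    with open(dest_path, "wb") as out:
        for chunk in response.iter_content(chunk_size=chunk_size):
            if chunk:  # skip keep-alive chunks
                out.write(chunk)
                written += len(chunk)
    return written


# Usage (the path after the host is a placeholder, not a real dataset key):
# download_dataset("https://data.rapids.ai/<path-to-dataset>", "dataset.csv")
```

Unlike s3fs, this approach needs no AWS-specific client or credentials handling, which fits a public HTTPS endpoint.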

ajschmidt8 commented 1 year ago

To provide additional context...

data.rapids.ai serves the contents of the rapidsai-data S3 bucket via an AWS CloudFront distribution.

The benefits of using a CloudFront distribution are the reduced latency and lower costs mentioned in the PR description.

Therefore, it's in everyone's best interest to start using the new data.rapids.ai URLs for downloading datasets.

At some point in the future, the S3 URLs will be disabled and datasets will only be retrievable from data.rapids.ai.
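For anyone migrating their own notebooks before the S3 URLs are disabled, a URL rewrite along the following lines may help. This is a sketch under one stated assumption: that object keys in the rapidsai-data bucket map one-to-one onto paths under data.rapids.ai. The function name `to_data_rapids_url` and the example key are hypothetical.

```python
from urllib.parse import urlparse

BUCKET = "rapidsai-data"
NEW_HOST = "data.rapids.ai"


def to_data_rapids_url(url):
    """Rewrite an S3 URL for the rapidsai-data bucket to a data.rapids.ai URL.

    Assumes object keys map one-to-one onto paths under the new host.
    URLs that do not point at the bucket are returned unchanged.
    """
    parsed = urlparse(url)

    # s3://rapidsai-data/<key>
    is_s3_scheme = parsed.scheme == "s3" and parsed.netloc == BUCKET
    # https://rapidsai-data.s3.amazonaws.com/<key> (virtual-hosted style,
    # with or without a region segment in the host name)
    is_s3_https = (
        parsed.scheme in ("http", "https")
        and parsed.netloc.startswith(BUCKET + ".s3")
        and parsed.netloc.endswith(".amazonaws.com")
    )

    if not (is_s3_scheme or is_s3_https):
        return url

    key = parsed.path.lstrip("/")
    return f"https://{NEW_HOST}/{key}"


# Example with a made-up key:
# to_data_rapids_url("s3://rapidsai-data/example/file.csv")
```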

ajschmidt8 commented 1 year ago

/merge

ajschmidt8 commented 1 year ago

Since clx is scheduled to be deprecated soon, I will admin merge this PR despite the CI failures (which are unrelated to these changes).

I don't want this repository to be archived with the S3 URLs.