rapidsai / deployment

RAPIDS Deployment Documentation
https://docs.rapids.ai/deployment/stable/

fix cuspatial.haversine_distance() example in multi-tenant notebook #336

Closed jameslamb closed 7 months ago

jameslamb commented 7 months ago

Fixes the cuspatial.haversine_distance() example that uses the public NYC taxi data hosted on GCS.

Running through the notebook code for that example (the same one that ends up at https://docs.rapids.ai/deployment/nightly/examples/rapids-autoscaling-multi-tenant-kubernetes/notebook/), I encountered 3 issues:

  1. needed to authenticate with GCP
     ValueError: An error occurred while calling the read_parquet method registered to the cudf backend.
     Original Message: An error occurred while calling the read_parquet method registered to the pandas backend.
     Original Message: Invalid gcloud credentials
  2. that GCS bucket contains some files at path gcs://anaconda-public-data/nyc-taxi/2015.parquet that are not actually parquet files
     [Screenshot 2024-02-05 at 1 20 07 PM]
  3. cuspatial.haversine_distance() expects to receive 2 cuspatial.GeoSeries objects
     TypeError('haversine_distance() takes 2 positional arguments but 4 were given')

Looks like that changed here: https://github.com/rapidsai/cuspatial/pull/924.

This resolves those issues.
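
For readers who want a picture of what fixes for the three issues above can look like, here is a minimal sketch. It is not the exact diff from this PR. The glob pattern, the column names, and the helper names `load_taxi_data` and `map_haversine` are assumptions made for illustration; `{"token": "anon"}` is the gcsfs option for anonymous access.

```python
# Sketch only: the glob pattern and column names below are assumptions,
# not copied from the notebook changed in this PR.
TAXI_GLOB = "gcs://anaconda-public-data/nyc-taxi/2015.parquet/part.*.parquet"  # hypothetical pattern
STORAGE_OPTIONS = {"token": "anon"}  # gcsfs: anonymous access, no gcloud credentials needed


def load_taxi_data():
    """Lazily read only the files matching the glob, using anonymous access."""
    import dask.dataframe as dd  # imported here so the module loads without dask installed

    return dd.read_parquet(TAXI_GLOB, storage_options=STORAGE_OPTIONS)


def map_haversine(part):
    """Return pickup-to-dropoff distances for one cudf partition."""
    import cuspatial  # requires a GPU-enabled RAPIDS environment

    # haversine_distance() takes two GeoSeries of points, so build each one
    # from interleaved (longitude, latitude) values.
    pickup = cuspatial.GeoSeries.from_points_xy(
        part[["pickup_longitude", "pickup_latitude"]].interleave_columns()
    )
    dropoff = cuspatial.GeoSeries.from_points_xy(
        part[["dropoff_longitude", "dropoff_latitude"]].interleave_columns()
    )
    return cuspatial.haversine_distance(pickup, dropoff)
```

Assuming the Dask dataframe is backed by cudf partitions, something like `load_taxi_data().map_partitions(map_haversine)` would apply the distance calculation across the dataset.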

How I tested this

Following the instructions from https://docs.rapids.ai/install#install-rapids, ran jupyter lab in a RAPIDS container like this:

docker run \
    --gpus all \
    --pull always \
    --rm \
    -it \
    --shm-size=1g \
    --ulimit memlock=-1 \
    --ulimit stack=67108864 \
    -p 8888:8888 \
    -p 8787:8787 \
    -p 8786:8786 \
    rapidsai/notebooks:24.02a-cuda12.0-py3.10

Then ran this notebook code (only the LocalCUDACluster cells and everything below them) on a machine with a few 80GB H100s. Confirmed that the data was pulled successfully without needing to authenticate with GCP, and that cuspatial.haversine_distance() ran without error and produced plausible-looking results.
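
One way to make "plausible-looking" a bit more concrete is to spot-check a few rows against a plain CPU implementation of the haversine formula. The sketch below was not part of the testing described here. The function name `haversine_km` and the 6371 km Earth radius are assumptions, and it presumes the cuspatial result is in kilometers, which is worth confirming against the cuspatial documentation.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; an assumption for this check


def haversine_km(lon1, lat1, lon2, lat2):
    """Great-circle distance in kilometers between two (lon, lat) points in degrees."""
    lon1, lat1, lon2, lat2 = map(math.radians, (lon1, lat1, lon2, lat2))
    dlat = lat2 - lat1
    dlon = lon2 - lon1
    # Haversine formula: a is the squared half-chord length between the points.
    a = math.sin(dlat / 2) ** 2 + math.cos(lat1) * math.cos(lat2) * math.sin(dlon / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))
```

Comparing a handful of pickup and dropoff pairs against this function should give values close to the GPU output if the units and argument order match.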
