rapidsai / cuspatial

CUDA-accelerated GIS and spatiotemporal algorithms
https://docs.rapids.ai/api/cuspatial/stable/
Apache License 2.0

[BUG]: possible performance regression in points_in_polygon() #1413

Closed jameslamb closed 1 month ago

jameslamb commented 1 month ago

Version

24.08

On which installation method(s) does this occur?

Conda

Describe the issue

See the write-up at https://github.com/rapidsai/cuspatial/pull/1407#issuecomment-2234181801.

Since around July 12, 2024, the nyc_taxi_years_correlation.ipynb notebook has been taking several hours to complete (on v24.08, using 24.08 cudf and other RAPIDS nightlies). Before that, on the exact same hardware, it completed in under 8 minutes.

I was able to reproduce this interactively, on a machine with 8 V100s and CUDA 12.2.

I strongly suspect that this indicates a performance regression, maybe of the form "some change(s) in cudf cause a cuspatial codepath that could previously execute on the GPU to fall back to the CPU", although I don't have profiling output to provide as evidence.

Minimum reproducible example

From https://github.com/rapidsai/cuspatial/pull/1407#issuecomment-2234181801.

Download the input data.

if [ ! -f "tzones_lonlat.json" ]; then
    curl "https://data.cityofnewyork.us/api/geospatial/d3c5-ddgc?method=export&format=GeoJSON" -o tzones_lonlat.json;
else
    echo "tzones_lonlat.json found";
fi
if [ ! -f "taxi2016.csv" ]; then
    curl https://storage.googleapis.com/anaconda-public-data/nyc-taxi/csv/2016/yellow_tripdata_2016-01.csv -o taxi2016.csv;
else
    echo "taxi2016.csv found";
fi

Then run the following in a Python 3.11 session (with v24.08 of cuspatial and all its RAPIDS dependencies):

import cuspatial
import geopandas as gpd
import cudf
import numpy as np

taxi2016 = cudf.read_csv("taxi2016.csv")
tzones = gpd.GeoDataFrame.from_file('tzones_lonlat.json')
taxi_zones = cuspatial.from_geopandas(tzones).geometry
taxi_zone_rings = cuspatial.GeoSeries.from_polygons_xy(
    taxi_zones.polygons.xy,
    taxi_zones.polygons.ring_offset,
    taxi_zones.polygons.part_offset,
    cudf.Series(range(len(taxi_zones.polygons.part_offset)))
)

def make_geoseries_from_lonlat(lon, lat):
    lonlat = cudf.DataFrame({"lon": lon, "lat": lat}).interleave_columns()
    return cuspatial.GeoSeries.from_points_xy(lonlat)

pickup2016 = make_geoseries_from_lonlat(taxi2016['pickup_longitude'] , taxi2016['pickup_latitude'])
dropoff2016 = make_geoseries_from_lonlat(taxi2016['dropoff_longitude'] , taxi2016['dropoff_latitude'])

pip_iterations = list(np.arange(0, 263, 31))
pip_iterations.append(263)
print(pip_iterations)

taxi2016['PULocationID'] = 264
taxi2016['DOLocationID'] = 264

start = pip_iterations[0]
end = pip_iterations[1]

zone = taxi_zone_rings[start:end]

# find all pickups in that zone
pickups = cuspatial.point_in_polygon(pickup2016, zone)
print(pickups)
print("---")
dropoffs = cuspatial.point_in_polygon(dropoff2016, zone)
print(dropoffs)

That one combination of polygons completed successfully, but took 21 minutes to complete. The 2 point_in_polygon() calls accounted for around 20 of those 21 minutes.
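One way to attribute wall-clock time to each call is a small timing wrapper. The sketch below is illustrative only: `timed` and `stand_in_workload` are hypothetical names, and the stand-in replaces the real `cuspatial.point_in_polygon()` calls so the snippet runs without a GPU.

```python
import time
from contextlib import contextmanager


@contextmanager
def timed(label, results):
    """Record the wall-clock duration of the enclosed block under `label`."""
    start = time.perf_counter()
    try:
        yield
    finally:
        results[label] = time.perf_counter() - start


def stand_in_workload(n):
    # Hypothetical placeholder for a real call such as
    # cuspatial.point_in_polygon(pickup2016, zone).
    return sum(i * i for i in range(n))


timings = {}
with timed("pickups", timings):
    stand_in_workload(100_000)
with timed("dropoffs", timings):
    stand_in_workload(100_000)

for label, seconds in timings.items():
    print(f"{label}: {seconds:.3f}s")
```

In the reproducer, the two `with` blocks would wrap the `pickups` and `dropoffs` calls instead. Because GPU work can run asynchronously, some of the time may be attributed to a later call that forces synchronization, such as `print()`.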

And in the notebook, 10 such combinations are processed.

https://github.com/rapidsai/cuspatial/blob/c8616c1534f2555423186096cce156e6fd104ef7/notebooks/nyc_taxi_years_correlation.ipynb#L168-L169

[0, 31, 62, 93, 124, 155, 186, 217, 248, 263]

https://github.com/rapidsai/cuspatial/blob/c8616c1534f2555423186096cce156e6fd104ef7/notebooks/nyc_taxi_years_correlation.ipynb#L207-L209

So conservatively, it might take 3.5 hours for the notebook to finish in my setup. And that's making a LOT of assumptions.
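The arithmetic behind that estimate can be sketched in plain Python. It assumes every batch costs the same 21 minutes as the first one, which was not measured. `consecutive_slices` and `estimated_hours` are hypothetical helpers, and pairing consecutive boundaries is an assumption about how the notebook iterates.

```python
# Batch boundaries as built in the reproducer: multiples of 31 below 263, plus 263.
boundaries = list(range(0, 263, 31)) + [263]


def consecutive_slices(bounds):
    """Hypothetical helper: pair each boundary with the next one."""
    return list(zip(bounds[:-1], bounds[1:]))


def estimated_hours(n_batches, minutes_per_batch=21):
    """Extrapolate total runtime, assuming every batch costs the same."""
    return n_batches * minutes_per_batch / 60


slices = consecutive_slices(boundaries)
print(boundaries)
print(estimated_hours(10))           # 10 batches, the count used in the estimate above
print(estimated_hours(len(slices)))  # one batch per consecutive boundary pair
```

With 10 batches this gives the 3.5 hours quoted above. If the notebook instead processes one slice per consecutive pair of the 10 boundaries, there are 9 slices and the same extrapolation gives slightly less.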

Relevant log output

N/A

Environment details

Both these environments:

Using cudf (and other RAPIDS dependencies) nightly conda packages as of July 12, 2024.

output of 'conda info', 'conda env export', and 'nvidia-smi'

```text
     active environment : test
    active env location : /opt/conda/envs/test
            shell level : 1
       user config file : /github/home/.condarc
 populated config files : /opt/conda/.condarc
          conda version : 24.5.0
    conda-build version : 24.5.1
         python version : 3.11.9.final.0
                 solver : libmamba (default)
       virtual packages : __archspec=1=broadwell
                          __conda=24.5.0=0
                          __cuda=12.4=0
                          __glibc=2.35=0
                          __linux=5.4.0=0
                          __unix=0=0
       base environment : /opt/conda (writable)
      conda av data dir : /opt/conda/etc/conda
  conda av metadata url : None
           channel URLs : https://conda.anaconda.org/rapidsai/linux-64
                          https://conda.anaconda.org/rapidsai/noarch
                          https://conda.anaconda.org/rapidsai-nightly/linux-64
                          https://conda.anaconda.org/rapidsai-nightly/noarch
                          https://conda.anaconda.org/dask/label/dev/linux-64
                          https://conda.anaconda.org/dask/label/dev/noarch
                          https://conda.anaconda.org/pytorch/linux-64
                          https://conda.anaconda.org/pytorch/noarch
                          https://conda.anaconda.org/conda-forge/linux-64
                          https://conda.anaconda.org/conda-forge/noarch
                          https://conda.anaconda.org/nvidia/linux-64
                          https://conda.anaconda.org/nvidia/noarch
          package cache : /opt/conda/pkgs
                          /github/home/.conda/pkgs
       envs directories : /opt/conda/envs
                          /github/home/.conda/envs
               platform : linux-64
             user-agent : conda/24.5.0 requests/2.32.3 CPython/3.11.9 Linux/5.4.0-177-generic ubuntu/22.04.4 glibc/2.35 solver/libmamba conda-libmamba-solver/24.1.0 libmambapy/1.5.8
                UID:GID : 0:0
             netrc file : None
           offline mode : False

==> /opt/conda/.condarc <==
auto_update_conda: False
channels:
  - rapidsai
  - rapidsai-nightly
  - dask/label/dev
  - pytorch
  - conda-forge
  - nvidia
always_yes: True
number_channel_notices: 0
conda-build:
  set_build_id: False
  root_dir: /tmp/conda-bld-workspace
  output_folder: /tmp/conda-bld-output

==> envvars <==
allow_softlinks: False

# packages in environment at /opt/conda/envs/test:
#
# Name Version Build Channel
_libgcc_mutex 0.1 conda_forge conda-forge
_openmp_mutex 4.5 2_gnu
conda-forge anyio 4.4.0 pyhd8ed1ab_0 conda-forge argon2-cffi 23.1.0 pyhd8ed1ab_0 conda-forge argon2-cffi-bindings 21.2.0 py311h459d7ec_4 conda-forge arrow 1.3.0 pyhd8ed1ab_0 conda-forge asttokens 2.4.1 pyhd8ed1ab_0 conda-forge async-lru 2.0.4 pyhd8ed1ab_0 conda-forge attrs 23.2.0 pyh71513ae_0 conda-forge aws-c-auth 0.7.22 hbd3ac97_10 conda-forge aws-c-cal 0.7.1 h87b94db_1 conda-forge aws-c-common 0.9.23 h4ab18f5_0 conda-forge aws-c-compression 0.2.18 he027950_7 conda-forge aws-c-event-stream 0.4.2 h7671281_15 conda-forge aws-c-http 0.8.2 he17ee6b_6 conda-forge aws-c-io 0.14.10 h826b7d6_1 conda-forge aws-c-mqtt 0.10.4 hcd6a914_8 conda-forge aws-c-s3 0.6.0 h365ddd8_2 conda-forge aws-c-sdkutils 0.1.16 he027950_3 conda-forge aws-checksums 0.1.18 he027950_7 conda-forge aws-crt-cpp 0.27.3 hda66527_2 conda-forge aws-sdk-cpp 1.11.329 h46c3b66_9 conda-forge azure-core-cpp 1.12.0 h830ed8b_0 conda-forge azure-identity-cpp 1.8.0 hdb0d106_1 conda-forge azure-storage-blobs-cpp 12.11.0 ha67cba7_1 conda-forge azure-storage-common-cpp 12.6.0 he3f277c_1 conda-forge azure-storage-files-datalake-cpp 12.10.0 h29b5301_1 conda-forge babel 2.14.0 pyhd8ed1ab_0 conda-forge beautifulsoup4 4.12.3 pyha770c72_0 conda-forge bleach 6.1.0 pyhd8ed1ab_0 conda-forge blosc 1.21.6 hef167b5_0 conda-forge bokeh 3.5.0 pyhd8ed1ab_0 conda-forge branca 0.7.2 pyhd8ed1ab_0 conda-forge brotli 1.1.0 hd590300_1 conda-forge brotli-bin 1.1.0 hd590300_1 conda-forge brotli-python 1.1.0 py311hb755f60_1 conda-forge bzip2 1.0.8 h4bc722e_7 conda-forge c-ares 1.32.2 h4bc722e_0 conda-forge ca-certificates 2024.7.4 hbcca054_0 conda-forge cached-property 1.5.2 hd8ed1ab_1 conda-forge cached_property 1.5.2 pyha770c72_1 conda-forge cachetools 5.4.0 pyhd8ed1ab_0 conda-forge cairo 1.18.0 h3faef2a_0 conda-forge certifi 2024.7.4 pyhd8ed1ab_0 conda-forge cffi 1.16.0 py311hb3a22ac_0 conda-forge cfitsio 4.3.1 hbdc6101_0 conda-forge charset-normalizer 3.3.2 pyhd8ed1ab_0 conda-forge click 8.1.7 unix_pyh707e725_0 conda-forge 
click-plugins 1.1.1 py_0 conda-forge cligj 0.7.2 pyhd8ed1ab_1 conda-forge cloudpickle 3.0.0 pyhd8ed1ab_0 conda-forge comm 0.2.2 pyhd8ed1ab_0 conda-forge contourpy 1.2.1 py311h9547e67_0 conda-forge cuda-cccl_linux-64 12.2.140 ha770c72_0 conda-forge cuda-crt-dev_linux-64 12.2.140 ha770c72_1 conda-forge cuda-crt-tools 12.2.140 ha770c72_1 conda-forge cuda-cudart 12.2.140 hd3aeb46_0 conda-forge cuda-cudart-dev 12.2.140 hd3aeb46_0 conda-forge cuda-cudart-dev_linux-64 12.2.140 h59595ed_0 conda-forge cuda-cudart-static 12.2.140 hd3aeb46_0 conda-forge cuda-cudart-static_linux-64 12.2.140 h59595ed_0 conda-forge cuda-cudart_linux-64 12.2.140 h59595ed_0 conda-forge cuda-nvcc-dev_linux-64 12.2.140 ha770c72_1 conda-forge cuda-nvcc-impl 12.2.140 hd3aeb46_1 conda-forge cuda-nvcc-tools 12.2.140 hd3aeb46_1 conda-forge cuda-nvrtc 12.2.140 hd3aeb46_0 conda-forge cuda-nvvm-dev_linux-64 12.2.140 ha770c72_1 conda-forge cuda-nvvm-impl 12.2.140 h59595ed_1 conda-forge cuda-nvvm-tools 12.2.140 h59595ed_1 conda-forge cuda-profiler-api 12.2.140 ha770c72_0 conda-forge cuda-python 12.5.0 py311h817de4b_1 conda-forge cuda-version 12.2 he2b69de_3 conda-forge cudf 24.08.00a322 cuda12_py311_240717_g093bcc94cc_322 rapidsai-nightly cuml 24.08.00a35 cuda12_py311_240716_g98721e239_35 rapidsai-nightly cuproj 24.08.00a20 cuda12_py311_240717_ga2d8ce19_20 file:///tmp/python_channel cupy 13.2.0 py311he5a987b_0 conda-forge cupy-core 13.2.0 py311h3bdf873_0 conda-forge curl 8.8.0 he654da7_1 conda-forge cuspatial 24.08.00a20 cuda12_py311_240717_ga2d8ce19_20 file:///tmp/python_channel cycler 0.12.1 pyhd8ed1ab_0 conda-forge cytoolz 0.12.3 py311h459d7ec_0 conda-forge dask 2024.7.0 pyhd8ed1ab_0 conda-forge dask-core 2024.7.0 pyhd8ed1ab_0 conda-forge dask-cuda 24.08.00a12 py311_240717_gc31aaac_12 rapidsai-nightly dask-cudf 24.08.00a322 cuda12_py311_240717_g093bcc94cc_322 rapidsai-nightly dask-expr 1.1.7 pyhd8ed1ab_0 conda-forge debugpy 1.8.2 py311h4332511_0 conda-forge decorator 5.1.1 pyhd8ed1ab_0 conda-forge 
defusedxml 0.7.1 pyhd8ed1ab_0 conda-forge distributed 2024.7.0 pyhd8ed1ab_0 conda-forge distributed-ucxx 0.39.00a py3.11_240717_g36284cb_10 rapidsai-nightly dlpack 0.8 h59595ed_3 conda-forge entrypoints 0.4 pyhd8ed1ab_0 conda-forge exceptiongroup 1.2.2 pyhd8ed1ab_0 conda-forge executing 2.0.1 pyhd8ed1ab_0 conda-forge expat 2.6.2 h59595ed_0 conda-forge fastrlock 0.8.2 py311hb755f60_2 conda-forge fiona 1.9.5 py311hf8e0aa6_2 conda-forge fmt 10.2.1 h00ab1b0_0 conda-forge folium 0.17.0 pyhd8ed1ab_0 conda-forge font-ttf-dejavu-sans-mono 2.37 hab24e00_0 conda-forge font-ttf-inconsolata 3.000 h77eed37_0 conda-forge font-ttf-source-code-pro 2.038 h77eed37_0 conda-forge font-ttf-ubuntu 0.83 h77eed37_2 conda-forge fontconfig 2.14.2 h14ed4e7_0 conda-forge fonts-conda-ecosystem 1 0 conda-forge fonts-conda-forge 1 0 conda-forge fonttools 4.53.1 py311h61187de_0 conda-forge fqdn 1.5.1 pyhd8ed1ab_0 conda-forge freetype 2.12.1 h267a509_2 conda-forge freexl 2.0.0 h743c826_0 conda-forge fsspec 2024.6.1 pyhff2d567_0 conda-forge gdal 3.8.1 py311h39b4e0e_3 conda-forge geopandas 0.14.4 pyhd8ed1ab_0 conda-forge geopandas-base 0.14.4 pyha770c72_0 conda-forge geos 3.12.1 h59595ed_0 conda-forge geotiff 1.7.1 hf074850_14 conda-forge gettext 0.22.5 h59595ed_2 conda-forge gettext-tools 0.22.5 h59595ed_2 conda-forge gflags 2.2.2 he1b5a44_1004 conda-forge giflib 5.2.2 hd590300_0 conda-forge glog 0.7.1 hbabe93e_0 conda-forge h11 0.14.0 pyhd8ed1ab_0 conda-forge h2 4.1.0 pyhd8ed1ab_0 conda-forge hdf4 4.2.15 h2a13503_7 conda-forge hdf5 1.14.3 nompi_hdf9ad27_105 conda-forge hpack 4.0.0 pyh9f0ad1d_0 conda-forge httpcore 1.0.5 pyhd8ed1ab_0 conda-forge httpx 0.27.0 pyhd8ed1ab_0 conda-forge hyperframe 6.0.1 pyhd8ed1ab_0 conda-forge icu 73.2 h59595ed_0 conda-forge idna 3.7 pyhd8ed1ab_0 conda-forge imagecodecs-lite 2019.12.3 py311h18e1886_8 conda-forge imageio 2.34.2 pyh12aca89_0 conda-forge importlib-metadata 8.0.0 pyha770c72_0 conda-forge importlib_metadata 8.0.0 hd8ed1ab_0 conda-forge importlib_resources 
6.4.0 pyhd8ed1ab_0 conda-forge ipykernel 6.29.5 pyh3099207_0 conda-forge ipython 8.26.0 pyh707e725_0 conda-forge ipywidgets 8.1.3 pyhd8ed1ab_0 conda-forge isoduration 20.11.0 pyhd8ed1ab_0 conda-forge jedi 0.19.1 pyhd8ed1ab_0 conda-forge jinja2 3.1.4 pyhd8ed1ab_0 conda-forge joblib 1.4.2 pyhd8ed1ab_0 conda-forge json-c 0.17 h1220068_1 conda-forge json5 0.9.25 pyhd8ed1ab_0 conda-forge jsonpointer 3.0.0 py311h38be061_0 conda-forge jsonschema 4.23.0 pyhd8ed1ab_0 conda-forge jsonschema-specifications 2023.12.1 pyhd8ed1ab_0 conda-forge jsonschema-with-format-nongpl 4.23.0 hd8ed1ab_0 conda-forge jupyter-lsp 2.2.5 pyhd8ed1ab_0 conda-forge jupyter_client 8.6.2 pyhd8ed1ab_0 conda-forge jupyter_core 5.7.2 py311h38be061_0 conda-forge jupyter_events 0.10.0 pyhd8ed1ab_0 conda-forge jupyter_server 2.14.2 pyhd8ed1ab_0 conda-forge jupyter_server_terminals 0.5.3 pyhd8ed1ab_0 conda-forge jupyterlab 4.2.3 pyhd8ed1ab_0 conda-forge jupyterlab_pygments 0.3.0 pyhd8ed1ab_1 conda-forge jupyterlab_server 2.27.3 pyhd8ed1ab_0 conda-forge jupyterlab_widgets 3.0.11 pyhd8ed1ab_0 conda-forge kealib 1.5.3 hee9dde6_1 conda-forge keyutils 1.6.1 h166bdaf_0 conda-forge kiwisolver 1.4.5 py311h9547e67_1 conda-forge krb5 1.21.3 h659f571_0 conda-forge lazy_loader 0.4 pyhd8ed1ab_0 conda-forge lcms2 2.16 hb7c19ff_0 conda-forge ld_impl_linux-64 2.40 hf3520f5_7 conda-forge lerc 4.0.0 h27087fc_0 conda-forge libabseil 20240116.2 cxx17_he02047a_1 conda-forge libaec 1.1.3 h59595ed_0 conda-forge libarchive 3.7.4 hfca40fe_0 conda-forge libarrow 16.1.0 h34456a7_14_cpu conda-forge libarrow-acero 16.1.0 he02047a_14_cpu conda-forge libarrow-dataset 16.1.0 he02047a_14_cpu conda-forge libarrow-substrait 16.1.0 hc9a23c6_14_cpu conda-forge libasprintf 0.22.5 h661eb56_2 conda-forge libasprintf-devel 0.22.5 h661eb56_2 conda-forge libblas 3.9.0 22_linux64_openblas conda-forge libbrotlicommon 1.1.0 hd590300_1 conda-forge libbrotlidec 1.1.0 hd590300_1 conda-forge libbrotlienc 1.1.0 hd590300_1 conda-forge libcblas 3.9.0 
22_linux64_openblas conda-forge libcrc32c 1.1.2 h9c3ff4c_0 conda-forge libcublas 12.2.5.6 hd3aeb46_0 conda-forge libcublas-dev 12.2.5.6 hd3aeb46_0 conda-forge libcudf 24.08.00a322 cuda12_240717_g093bcc94cc_322 rapidsai-nightly libcufft 11.0.8.103 hd3aeb46_0 conda-forge libcufile 1.7.2.10 hd3aeb46_0 conda-forge libcufile-dev 1.7.2.10 hd3aeb46_0 conda-forge libcuml 24.08.00a35 cuda12_240716_g98721e239_35 rapidsai-nightly libcumlprims 24.08.00a cuda12_240717_g6a1017c_7 rapidsai-nightly libcurand 10.3.3.141 hd3aeb46_0 conda-forge libcurand-dev 10.3.3.141 hd3aeb46_0 conda-forge libcurl 8.8.0 hca28451_1 conda-forge libcusolver 11.5.2.141 hd3aeb46_0 conda-forge libcusolver-dev 11.5.2.141 hd3aeb46_0 conda-forge libcusparse 12.1.2.141 hd3aeb46_0 conda-forge libcusparse-dev 12.1.2.141 hd3aeb46_0 conda-forge libcuspatial 24.08.00a20 cuda12_240717_ga2d8ce19_20 file:///tmp/cpp_channel libdeflate 1.19 hd590300_0 conda-forge libedit 3.1.20191231 he28a2e2_2 conda-forge libev 4.33 hd590300_2 conda-forge libevent 2.1.12 hf998b51_1 conda-forge libexpat 2.6.2 h59595ed_0 conda-forge libffi 3.4.2 h7f98852_5 conda-forge libgcc-ng 14.1.0 h77fa898_0 conda-forge libgdal 3.8.1 h4b8bffa_3 conda-forge libgettextpo 0.22.5 h59595ed_2 conda-forge libgettextpo-devel 0.22.5 h59595ed_2 conda-forge libgfortran-ng 14.1.0 h69a702a_0 conda-forge libgfortran5 14.1.0 hc5f4f2c_0 conda-forge libglib 2.78.4 h783c2da_0 conda-forge libgomp 14.1.0 h77fa898_0 conda-forge libgoogle-cloud 2.26.0 h26d7fe4_0 conda-forge libgoogle-cloud-storage 2.26.0 ha262f82_0 conda-forge libgrpc 1.62.2 h15f2491_0 conda-forge libiconv 1.17 hd590300_2 conda-forge libjpeg-turbo 3.0.0 hd590300_1 conda-forge libkml 1.3.0 hbbc8833_1020 conda-forge libkvikio 24.08.00a cuda12_240717_gab3778c_18 rapidsai-nightly liblapack 3.9.0 22_linux64_openblas conda-forge libllvm14 14.0.6 hcd5def8_4 conda-forge libnetcdf 4.9.2 nompi_h135f659_114 conda-forge libnghttp2 1.58.0 h47da74e_1 conda-forge libnl 3.9.0 hd590300_0 conda-forge libnsl 2.0.1 
hd590300_0 conda-forge libnvjitlink 12.2.140 hd3aeb46_0 conda-forge libopenblas 0.3.27 pthreads_hac2b453_1 conda-forge libparquet 16.1.0 h9e5060d_14_cpu conda-forge libpng 1.6.43 h2797004_0 conda-forge libpq 16.3 ha72fbe1_0 conda-forge libprotobuf 4.25.3 h08a7969_0 conda-forge libraft 24.08.00a43 cuda12_240717_gab5e1287_43 rapidsai-nightly libraft-headers 24.08.00a43 cuda12_240717_gab5e1287_43 rapidsai-nightly libraft-headers-only 24.08.00a43 cuda12_240717_gab5e1287_43 rapidsai-nightly libre2-11 2023.09.01 h5a48ba9_2 conda-forge librmm 24.08.00a27 cuda12_240717_gf91ca6f2_27 rapidsai-nightly librttopo 1.1.0 h8917695_15 conda-forge libsodium 1.0.18 h36c2ea0_1 conda-forge libspatialindex 2.0.0 he02047a_0 conda-forge libspatialite 5.1.0 h72606ae_3 conda-forge libsqlite 3.46.0 hde9e2c9_0 conda-forge libssh2 1.11.0 h0841786_0 conda-forge libstdcxx-ng 14.1.0 hc0a3c3a_0 conda-forge libthrift 0.19.0 hb90f79a_1 conda-forge libtiff 4.6.0 ha9c0a0a_2 conda-forge libucxx 0.39.00a cuda12_240717_g36284cb_10 rapidsai-nightly libutf8proc 2.8.0 h166bdaf_0 conda-forge libuuid 2.38.1 h0b41bf4_0 conda-forge libwebp-base 1.4.0 hd590300_0 conda-forge libxcb 1.15 h0b41bf4_0 conda-forge libxcrypt 4.4.36 hd590300_1 conda-forge libxml2 2.12.7 h4c95cb1_3 conda-forge libzip 1.10.1 h2629f0a_3 conda-forge libzlib 1.3.1 h4ab18f5_1 conda-forge llvmlite 0.43.0 py311hbde99c3_0 conda-forge locket 1.0.0 pyhd8ed1ab_0 conda-forge lz4 4.3.3 py311h38e4bf4_0 conda-forge lz4-c 1.9.4 hcb278e6_0 conda-forge lzo 2.10 hd590300_1001 conda-forge mapclassify 2.6.1 pyhd8ed1ab_0 conda-forge markdown-it-py 3.0.0 pyhd8ed1ab_0 conda-forge markupsafe 2.1.5 py311h459d7ec_0 conda-forge matplotlib-base 3.9.1 py311hffb96ce_0 conda-forge matplotlib-inline 0.1.7 pyhd8ed1ab_0 conda-forge mdurl 0.1.2 pyhd8ed1ab_0 conda-forge minizip 4.0.7 h401b404_0 conda-forge mistune 3.0.2 pyhd8ed1ab_0 conda-forge msgpack-python 1.0.8 py311h52f7536_0 conda-forge munkres 1.1.4 pyh9f0ad1d_0 conda-forge nbclient 0.10.0 pyhd8ed1ab_0 conda-forge 
nbconvert-core 7.16.4 pyhd8ed1ab_1 conda-forge nbformat 5.10.4 pyhd8ed1ab_0 conda-forge nccl 2.22.3.1 hbc370b7_0 conda-forge ncurses 6.5 h59595ed_0 conda-forge nest-asyncio 1.6.0 pyhd8ed1ab_0 conda-forge networkx 3.3 pyhd8ed1ab_1 conda-forge notebook 7.2.1 pyhd8ed1ab_0 conda-forge notebook-shim 0.2.4 pyhd8ed1ab_0 conda-forge nspr 4.35 h27087fc_0 conda-forge nss 3.102 h593d115_0 conda-forge numba 0.60.0 py311h4bc866e_0 conda-forge numpy 1.26.4 py311h64a7726_0 conda-forge nvcomp 3.0.6 h10b603f_0 conda-forge nvtx 0.2.10 py311h459d7ec_0 conda-forge openjpeg 2.5.2 h488ebb8_0 conda-forge openssl 3.3.1 h4bc722e_2 conda-forge orc 2.0.1 h17fec99_1 conda-forge overrides 7.7.0 pyhd8ed1ab_0 conda-forge packaging 24.1 pyhd8ed1ab_0 conda-forge pandas 2.2.2 py311h14de704_1 conda-forge pandocfilters 1.5.0 pyhd8ed1ab_0 conda-forge parso 0.8.4 pyhd8ed1ab_0 conda-forge partd 1.4.2 pyhd8ed1ab_0 conda-forge pcre2 10.42 hcad00b1_0 conda-forge pexpect 4.9.0 pyhd8ed1ab_0 conda-forge pickleshare 0.7.5 py_1003 conda-forge pillow 10.3.0 py311h18e6fac_0 conda-forge pip 24.0 pyhd8ed1ab_0 conda-forge pixman 0.43.2 h59595ed_0 conda-forge pkgutil-resolve-name 1.3.10 pyhd8ed1ab_1 conda-forge platformdirs 4.2.2 pyhd8ed1ab_0 conda-forge poppler 23.12.0 h590f24d_0 conda-forge poppler-data 0.4.12 hd8ed1ab_0 conda-forge postgresql 16.3 h8e811e2_0 conda-forge proj 9.3.0 h1d62c97_2 conda-forge prometheus_client 0.20.0 pyhd8ed1ab_0 conda-forge prompt-toolkit 3.0.47 pyha770c72_0 conda-forge psutil 6.0.0 py311h331c9d8_0 conda-forge pthread-stubs 0.4 h36c2ea0_1001 conda-forge ptyprocess 0.7.0 pyhd3deb0d_0 conda-forge pure_eval 0.2.2 pyhd8ed1ab_0 conda-forge pyarrow 16.1.0 py311hbd00459_4 conda-forge pyarrow-core 16.1.0 py311h8c3dac4_4_cpu conda-forge pyarrow-hotfix 0.6 pyhd8ed1ab_0 conda-forge pycparser 2.22 pyhd8ed1ab_0 conda-forge pydeck 0.8.0 pyhd8ed1ab_0 conda-forge pygments 2.18.0 pyhd8ed1ab_0 conda-forge pylibraft 24.08.00a43 cuda12_py311_240717_gab5e1287_43 rapidsai-nightly pynvjitlink 0.3.0 
py311hd269673_0 rapidsai pynvml 11.4.1 pyhd8ed1ab_0 conda-forge pyparsing 3.1.2 pyhd8ed1ab_0 conda-forge pyproj 3.6.1 py311h1facc83_4 conda-forge pysocks 1.7.1 pyha2e5f31_6 conda-forge python 3.11.9 hb806964_0_cpython conda-forge python-dateutil 2.9.0 pyhd8ed1ab_0 conda-forge python-fastjsonschema 2.20.0 pyhd8ed1ab_0 conda-forge python-json-logger 2.0.7 pyhd8ed1ab_0 conda-forge python-tzdata 2024.1 pyhd8ed1ab_0 conda-forge python_abi 3.11 4_cp311 conda-forge pytz 2024.1 pyhd8ed1ab_0 conda-forge pywavelets 1.6.0 py311h18e1886_0 conda-forge pyyaml 6.0.1 py311h459d7ec_1 conda-forge pyzmq 26.0.3 py311h08a0b41_0 conda-forge qhull 2020.2 h434a139_5 conda-forge raft-dask 24.08.00a43 cuda12_py311_240717_gab5e1287_43 rapidsai-nightly rapids-dask-dependency 24.08.00a5 py_0 rapidsai-nightly rdma-core 52.0 he02047a_0 conda-forge re2 2023.09.01 h7f4b329_2 conda-forge readline 8.2 h8228510_1 conda-forge referencing 0.35.1 pyhd8ed1ab_0 conda-forge requests 2.32.3 pyhd8ed1ab_0 conda-forge rfc3339-validator 0.1.4 pyhd8ed1ab_0 conda-forge rfc3986-validator 0.1.1 pyh9f0ad1d_0 conda-forge rich 13.7.1 pyhd8ed1ab_0 conda-forge rmm 24.08.00a27 cuda12_py311_240717_gf91ca6f2_27 rapidsai-nightly rpds-py 0.19.0 py311hb3a8bbb_0 conda-forge rtree 1.3.0 py311h51bcefd_1 conda-forge s2n 1.4.17 he19d79f_0 conda-forge scikit-image 0.20.0 py311h2872171_1 conda-forge scikit-learn 1.5.1 py311hd632256_0 conda-forge scipy 1.14.0 py311h517d4fd_1 conda-forge send2trash 1.8.3 pyh0d859eb_0 conda-forge setuptools 70.3.0 pyhd8ed1ab_0 conda-forge shapely 2.0.4 py311h0bed3d6_1 conda-forge six 1.16.0 pyh6c4a22f_0 conda-forge snappy 1.2.1 ha2e4443_0 conda-forge sniffio 1.3.1 pyhd8ed1ab_0 conda-forge sortedcontainers 2.4.0 pyhd8ed1ab_0 conda-forge soupsieve 2.5 pyhd8ed1ab_1 conda-forge spdlog 1.12.0 hd2e6256_2 conda-forge sqlite 3.46.0 h6d4b2fc_0 conda-forge stack_data 0.6.2 pyhd8ed1ab_0 conda-forge tblib 3.0.0 pyhd8ed1ab_0 conda-forge terminado 0.18.1 pyh0d859eb_0 conda-forge threadpoolctl 3.5.0 pyhc1e730c_0 
conda-forge tifffile 2020.6.3 py_0 conda-forge tiledb 2.18.2 h99f50a1_1 conda-forge tinycss2 1.3.0 pyhd8ed1ab_0 conda-forge tk 8.6.13 noxft_h4845f30_101 conda-forge tomli 2.0.1 pyhd8ed1ab_0 conda-forge toolz 0.12.1 pyhd8ed1ab_0 conda-forge tornado 6.4.1 py311h331c9d8_0 conda-forge traitlets 5.14.3 pyhd8ed1ab_0 conda-forge treelite 4.2.1 py311he8f9275_0 conda-forge types-python-dateutil 2.9.0.20240316 pyhd8ed1ab_0 conda-forge typing-extensions 4.12.2 hd8ed1ab_0 conda-forge typing_extensions 4.12.2 pyha770c72_0 conda-forge typing_utils 0.1.0 pyhd8ed1ab_0 conda-forge tzcode 2024a h3f72095_0 conda-forge tzdata 2024a h0c530f3_0 conda-forge ucx 1.15.0 hda83522_8 conda-forge ucx-py 0.39.00a7 py311_240717_g3741610_7 rapidsai-nightly ucxx 0.39.00a cuda12_py3.11_240717_g36284cb_10 rapidsai-nightly uri-template 1.3.0 pyhd8ed1ab_0 conda-forge uriparser 0.9.8 hac33072_0 conda-forge urllib3 2.2.2 pyhd8ed1ab_1 conda-forge wcwidth 0.2.13 pyhd8ed1ab_0 conda-forge webcolors 24.6.0 pyhd8ed1ab_0 conda-forge webencodings 0.5.1 pyhd8ed1ab_2 conda-forge websocket-client 1.8.0 pyhd8ed1ab_0 conda-forge wheel 0.43.0 pyhd8ed1ab_1 conda-forge widgetsnbextension 4.0.11 pyhd8ed1ab_0 conda-forge xerces-c 3.2.5 hac6953d_0 conda-forge xorg-kbproto 1.0.7 h7f98852_1002 conda-forge xorg-libice 1.1.1 hd590300_0 conda-forge xorg-libsm 1.2.4 h7391055_0 conda-forge xorg-libx11 1.8.9 h8ee46fc_0 conda-forge xorg-libxau 1.0.11 hd590300_0 conda-forge xorg-libxdmcp 1.1.3 h7f98852_0 conda-forge xorg-libxext 1.3.4 h0b41bf4_2 conda-forge xorg-libxrender 0.9.11 hd590300_0 conda-forge xorg-renderproto 0.11.1 h7f98852_1002 conda-forge xorg-xextproto 7.3.0 h0b41bf4_1003 conda-forge xorg-xproto 7.0.31 h7f98852_1007 conda-forge xyzservices 2024.6.0 pyhd8ed1ab_0 conda-forge xz 5.2.6 h166bdaf_0 conda-forge yaml 0.2.5 h7f98852_2 conda-forge zeromq 4.3.5 h75354e8_4 conda-forge zict 3.0.0 pyhd8ed1ab_0 conda-forge zipp 3.19.2 pyhd8ed1ab_0 conda-forge zlib 1.3.1 h4ab18f5_1 conda-forge zstandard 0.23.0 py311h5cd10c7_0 
conda-forge zstd 1.5.6 ha6fb4c9_0 conda-forge

/__w/cuspatial/cuspatial/notebooks /__w/cuspatial/cuspatial

Wed Jul 17 15:37:47 2024
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 550.54.15              Driver Version: 550.54.15      CUDA Version: 12.4     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  Tesla V100-PCIE-32GB           Off |   00000000:85:00.0 Off |                    0 |
| N/A   24C    P0             24W /  250W |       0MiB /  32768MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+

+-----------------------------------------------------------------------------------------+
| Processes:                                                                              |
|  GPU   GI   CI        PID   Type   Process name                              GPU Memory |
|        ID   ID                                                               Usage      |
|=========================================================================================|
|  No running processes found                                                             |
+-----------------------------------------------------------------------------------------+
```

(example build link)

Other/Misc.

Other symptoms that led to this were documented in #1406.

That was closed by just skipping the most expensive notebooks, in #1407.

harrism commented 1 month ago

@trxcllnt recently modified point_in_polygon. Could those changes have caused this?

jameslamb commented 1 month ago

Are you referring to #1381?

It could be related, but I don't think it'd be the root cause by itself. Those changes were made 2+ months ago, and as recently as #1404 (2 weeks ago), the conda-notebook-tests CI job here was completing in around 9 minutes (build link).

isVoid commented 1 month ago

Also that PR modified the quadtree PiP algo, but the algo in question here is the non-quadtree version.

harrism commented 1 month ago

I did some profiling using py-spy. This is not a complete profile; I only ran it for about 4.5 minutes using py-spy top -- python test.py (test.py contains the code above).

Collecting samples from 'python test.py' (python v3.10.14)
Total Samples 38284
GIL: 100.00%, Active: 100.00%, Threads: 1

  %Own   %Total  OwnTime  TotalTime  Function (filename:line)
 40.00%  79.00%   159.3s    272.1s   compute_index (numba/misc/dummyarray.py:111)
 18.00%  39.00%   57.19s    112.8s   <genexpr> (numba/misc/dummyarray.py:111)
 21.00%  21.00%   55.60s    55.60s   get_offset (numba/misc/dummyarray.py:83)
  8.00%   8.00%   20.65s    20.65s   iter_contiguous_extent (numba/misc/dummyarray.py:275)
  0.00%   0.00%   17.83s    17.83s   iter_contiguous_extent (numba/misc/dummyarray.py:270)
 10.00%  89.00%   15.99s    166.7s   iter_contiguous_extent (numba/misc/dummyarray.py:274)
  0.00%   0.00%   15.00s    136.3s   iter_contiguous_extent (numba/misc/dummyarray.py:269)
  0.00%   0.00%    8.25s     8.25s   iter_contiguous_extent (numba/misc/dummyarray.py:268)
  2.00%   2.00%    8.06s     8.06s   iter_contiguous_extent (numba/misc/dummyarray.py:273)
  0.00% 100.00%    6.88s    375.5s   __getitem__ (numba/cuda/cudadrv/devicearray.py:630)
  0.00%   0.00%    5.59s    170.6s   __getitem__ (numba/misc/dummyarray.py:239)
  1.00% 100.00%    2.61s    198.0s   _do_getitem (numba/cuda/cudadrv/devicearray.py:642)
  0.00%   0.00%    2.58s    165.0s   reshape (numba/misc/dummyarray.py:351)
  0.00%   0.00%    2.32s     2.33s   read_csv (cudf/io/csv.py:96)
  0.00%   0.00%   0.800s     2.61s   _call_with_frames_removed (<frozen importlib._bootstrap>:241)
  0.00%   0.00%   0.210s    0.210s   point_in_polygon (cuspatial/core/spatial/join.py:82)
  0.00%   0.00%   0.180s    0.180s   _compile_bytecode (<frozen importlib._bootstrap_external>:672)
  0.00%   0.00%   0.150s    0.180s   inner (contextlib.py:79)
  0.00%   0.00%   0.140s    0.140s   append (numba/core/byteflow.py:1743)
  0.00%   0.00%   0.130s    0.130s   __init__ (fiona/collection.py:243)
  0.00%   0.00%   0.130s    0.130s   <listcomp> (shapely/geometry/polygon.py:91)

Nearly all the time is spent in Numba. I also used py-spy to output the SVG below (from a run of only about a minute). The flame graph gives an idea of where Numba is being called.

profile
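To put a rough number on that, the rows of the py-spy table above can be summed by top-level package. This is a sketch, not py-spy functionality: `own_time_by_package` is a hypothetical helper, the regular expression assumes the column layout shown above, and the result only covers the rows py-spy listed.

```python
import re
from collections import defaultdict

# Rows copied from the py-spy top output above.
PROFILE = """\
 40.00%  79.00%   159.3s    272.1s   compute_index (numba/misc/dummyarray.py:111)
 18.00%  39.00%   57.19s    112.8s   <genexpr> (numba/misc/dummyarray.py:111)
 21.00%  21.00%   55.60s    55.60s   get_offset (numba/misc/dummyarray.py:83)
  8.00%   8.00%   20.65s    20.65s   iter_contiguous_extent (numba/misc/dummyarray.py:275)
  0.00%   0.00%   17.83s    17.83s   iter_contiguous_extent (numba/misc/dummyarray.py:270)
 10.00%  89.00%   15.99s    166.7s   iter_contiguous_extent (numba/misc/dummyarray.py:274)
  0.00%   0.00%   15.00s    136.3s   iter_contiguous_extent (numba/misc/dummyarray.py:269)
  0.00%   0.00%    8.25s     8.25s   iter_contiguous_extent (numba/misc/dummyarray.py:268)
  2.00%   2.00%    8.06s     8.06s   iter_contiguous_extent (numba/misc/dummyarray.py:273)
  0.00% 100.00%    6.88s    375.5s   __getitem__ (numba/cuda/cudadrv/devicearray.py:630)
  0.00%   0.00%    5.59s    170.6s   __getitem__ (numba/misc/dummyarray.py:239)
  1.00% 100.00%    2.61s    198.0s   _do_getitem (numba/cuda/cudadrv/devicearray.py:642)
  0.00%   0.00%    2.58s    165.0s   reshape (numba/misc/dummyarray.py:351)
  0.00%   0.00%    2.32s     2.33s   read_csv (cudf/io/csv.py:96)
  0.00%   0.00%   0.800s     2.61s   _call_with_frames_removed (<frozen importlib._bootstrap>:241)
  0.00%   0.00%   0.210s    0.210s   point_in_polygon (cuspatial/core/spatial/join.py:82)
  0.00%   0.00%   0.180s    0.180s   _compile_bytecode (<frozen importlib._bootstrap_external>:672)
  0.00%   0.00%   0.150s    0.180s   inner (contextlib.py:79)
  0.00%   0.00%   0.140s    0.140s   append (numba/core/byteflow.py:1743)
  0.00%   0.00%   0.130s    0.130s   __init__ (fiona/collection.py:243)
  0.00%   0.00%   0.130s    0.130s   <listcomp> (shapely/geometry/polygon.py:91)
"""

# Columns: %Own, %Total, OwnTime, TotalTime, function, (filename:line).
# Only OwnTime and the filename are captured.
ROW = re.compile(
    r"^\s*[\d.]+%\s+[\d.]+%\s+([\d.]+)s\s+[\d.]+s\s+\S+\s+\((.+):\d+\)\s*$"
)


def own_time_by_package(text):
    """Sum OwnTime (seconds) per top-level path component of the filename."""
    totals = defaultdict(float)
    for line in text.splitlines():
        match = ROW.match(line)
        if match is None:
            continue  # skip anything that is not a data row
        own_seconds, filename = match.groups()
        totals[filename.split("/")[0]] += float(own_seconds)
    return dict(totals)


totals = own_time_by_package(PROFILE)
numba_share = totals["numba"] / sum(totals.values())
print(f"numba share of listed own time: {numba_share:.1%}")
```

Own time is exclusive per row, so summing it does not double-count nested calls the way summing TotalTime would.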

harrism commented 1 month ago

@mroeschke, since you have touched a lot of places in cuSpatial and cuDF recently, can you tell us whether this code is now running in Numba when it previously was not? That could explain the huge performance regression we are seeing.