rapidsai / cuml

cuML - RAPIDS Machine Learning Library
https://docs.rapids.ai/api/cuml/stable/
Apache License 2.0

[BUG] `forest_inference_demo.ipynb` is broken #6008

Closed · jameslamb closed this 3 months ago

jameslamb commented 3 months ago

Describe the bug

The notebooks/forest_inference_demo.ipynb notebook is broken: XGBoost model loading with FIL is failing.

I've observed this behavior on the 24.08 release of cuml and all of its dependencies. I suspect it's a problem on 24.10 as well, but haven't tested that yet.

Steps/Code to reproduce bug

Created a conda environment and installed cuml, jupyterlab, and xgboost into it.

setup (click me)

Ran the following from the root of the repo, on a machine with V100s and CUDA 12.2.

```shell
conda env create \
  --name cuml-cu12-dev \
  --file ./conda/environments/all_cuda-125_arch-x86_64.yaml

source activate cuml-cu12-dev

conda install \
  -c conda-forge \
  -c rapidsai-nightly \
  -c rapidsai \
  --yes \
  cuml=24.8.* \
  jupyterlab
```

Then launched JupyterLab.

```shell
jupyter lab --ip 0.0.0.0 --port 1234
```

Ran the cells in notebooks/forest_inference_demo.ipynb in order.

This call to ForestInference.load()

https://github.com/rapidsai/cuml/blob/e571abaf068b21173984e07b73c91bf0be8da7b5/notebooks/forest_inference_demo.ipynb#L273

Fails like this:

---------------------------------------------------------------------------
RuntimeError                              Traceback (most recent call last)
Cell In[12], line 1
----> 1 fil_model = ForestInference.load(
      2     filename=model_path,
      3     algo='BATCH_TREE_REORG',
      4     output_class=True,
      5     threshold=0.50,
      6     model_type='xgboost'
      7 )

File fil.pyx:1033, in cuml.fil.fil.ForestInference.load()

File fil.pyx:212, in cuml.fil.fil.TreeliteModel.from_filename()

RuntimeError: Failed to load xgb.model (basic_string::_M_replace_aux)

This same error can be seen in the most recent run of this notebook in the CI for rapidsai/docker: https://github.com/rapidsai/docker/actions/runs/10244736365/job/28356773321#step:9:15

Expected behavior

Expected this notebook to run end-to-end without error.

Environment details (please complete the following information):

output of 'conda info' (click me)

```text
     active environment : cuml-cu12-dev
    active env location : /raid/jlamb/miniforge/envs/cuml-cu12-dev
            shell level : 1
       user config file : /home/nfs/jlamb/.condarc
 populated config files : /raid/jlamb/miniforge/.condarc
                          /home/nfs/jlamb/.condarc
          conda version : 23.7.4
    conda-build version : 24.5.1
         python version : 3.10.12.final.0
       virtual packages : __archspec=1=x86_64
                          __cuda=12.2=0
                          __glibc=2.31=0
                          __linux=5.4.0=0
                          __unix=0=0
       base environment : /raid/jlamb/miniforge  (writable)
      conda av data dir : /raid/jlamb/miniforge/etc/conda
  conda av metadata url : None
           channel URLs : https://conda.anaconda.org/conda-forge/linux-64
                          https://conda.anaconda.org/conda-forge/noarch
          package cache : /raid/jlamb/miniforge/pkgs
                          /home/nfs/jlamb/.conda/pkgs
       envs directories : /raid/jlamb/miniforge/envs
                          /home/nfs/jlamb/.conda/envs
               platform : linux-64
             user-agent : conda/23.7.4 requests/2.32.3 CPython/3.10.12 Linux/5.4.0-182-generic ubuntu/20.04.6 glibc/2.31
                UID:GID : 10349:10004
             netrc file : None
           offline mode : False
```
output of 'nvidia-smi' (click me)

```text
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 535.161.08             Driver Version: 535.161.08   CUDA Version: 12.2     |
|-----------------------------------------+----------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|=========================================+======================+======================|
|   0  Tesla V100-SXM2-32GB           On  | 00000000:06:00.0 Off |                    0 |
| N/A   31C    P0              41W / 300W |      0MiB / 32768MiB |      0%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+
|   1  Tesla V100-SXM2-32GB           On  | 00000000:07:00.0 Off |                    0 |
| N/A   33C    P0              42W / 300W |      0MiB / 32768MiB |      0%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+
|   2  Tesla V100-SXM2-32GB           On  | 00000000:0A:00.0 Off |                    0 |
| N/A   31C    P0              42W / 300W |      0MiB / 32768MiB |      0%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+
|   3  Tesla V100-SXM2-32GB           On  | 00000000:0B:00.0 Off |                    0 |
| N/A   29C    P0              41W / 300W |      0MiB / 32768MiB |      0%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+
|   4  Tesla V100-SXM2-32GB           On  | 00000000:85:00.0 Off |                    0 |
| N/A   31C    P0              42W / 300W |      0MiB / 32768MiB |      0%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+
|   5  Tesla V100-SXM2-32GB           On  | 00000000:86:00.0 Off |                    0 |
| N/A   30C    P0              42W / 300W |      0MiB / 32768MiB |      0%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+
|   6  Tesla V100-SXM2-32GB           On  | 00000000:89:00.0 Off |                    0 |
| N/A   35C    P0              43W / 300W |      0MiB / 32768MiB |      0%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+
|   7  Tesla V100-SXM2-32GB           On  | 00000000:8A:00.0 Off |                    0 |
| N/A   31C    P0              43W / 300W |      0MiB / 32768MiB |      0%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+

+---------------------------------------------------------------------------------------+
| Processes:                                                                            |
|  GPU   GI   CI        PID   Type   Process name                            GPU Memory |
|        ID   ID                                                             Usage      |
|=======================================================================================|
|  No running processes found                                                           |
+---------------------------------------------------------------------------------------+
```

Additional context

This was only noticed because of a CI failure over in rapidsai/docker: https://github.com/rapidsai/docker/pull/699#discussion_r1704654898.

Ideally, this would be caught in cuml's CI. As of this writing, this notebook is not tested there:

```text
SKIPPING: ./forest_inference_demo.ipynb (suspected Dask usage, not currently automatable)
```

(build link)

This notebook has been running in rapidsai/docker CI for a while. It passed on 24.08 as recently as 2 weeks ago.

```text
Testing cuml/forest_inference_demo.ipynb
Completed cuml/forest_inference_demo.ipynb with 1 warnings and 0 errors
```

(build link)

So I suspect this is a result of a recent change. Maybe some mix of these:

hcho3 commented 3 months ago

The error is likely due to a change in the XGBoost version. Starting with version 2.1.0, XGBoost defaults to the UBJSON format when saving models.

Treelite 4.3 added support for UBJSON, but regrettably FIL has not yet been updated to recognize the UBJSON format, hence the error. Let me prepare a pull request.
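For context, here is a minimal sketch (not from the issue; `detect_xgboost_model_format` is a hypothetical helper) of why the failure mode is subtle: both a JSON and a UBJSON model file can begin with a `{` byte, since `{` is also UBJSON's object marker, so telling them apart in practice amounts to attempting a JSON parse.

```python
import json


def detect_xgboost_model_format(path: str) -> str:
    """Best-effort guess at an XGBoost model file's serialization format.

    Returns "json" if the file parses as JSON text, otherwise "binary"
    (covering both UBJSON, the default since XGBoost 2.1.0, and the
    legacy binary format). A first-byte check is not sufficient: a
    UBJSON object also starts with the marker byte b"{".
    """
    with open(path, "rb") as f:
        data = f.read()
    try:
        json.loads(data)
        return "json"
    except (ValueError, UnicodeDecodeError):
        # Not JSON text; assume UBJSON or the legacy binary format.
        return "binary"
```

Given the new default described above, a likely interim workaround (an assumption, not the actual fix from the PR) is to save the model under a `.json` filename, e.g. `bst.save_model("xgb.json")`, since XGBoost selects the serialization format from the file extension; a name with no recognized extension falls through to the UBJSON default that older FIL releases cannot parse.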