[BUG] `str.character_ngrams` produces <NA> with strings < ngram length

Vortexx2 commented 6 months ago

Describe the bug

The `str.character_ngrams` function produces a `<NA>` token for strings that are shorter than the provided `n` (the attached screenshot shows the bigram case).

I have debugged this and, as far as I understand, it is caused by an empty list returned by the `libstrings.generate_character_ngrams` function. When the problematic function explodes the result, that empty list becomes a `<NA>` entry in the output. This causes several bugs in downstream tasks (for example, when using cuml's `CountVectorizer`).
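The mechanism described above can be modeled in plain Python. This is only a sketch of the behavior, not cuDF's implementation: the helper names `character_ngrams` and `explode` and the `keep_empty` flag are hypothetical and exist only for illustration. A string shorter than `n` yields an empty list of n-grams, and an explode step that emits a placeholder for empty lists introduces a null token that never occurred in the input.

```python
from typing import List, Optional, Tuple


def character_ngrams(text: str, n: int) -> List[str]:
    """Return every contiguous character n-gram of `text`.

    The result is an empty list when len(text) < n.
    """
    return [text[i:i + n] for i in range(len(text) - n + 1)]


def explode(
    rows: List[List[str]], keep_empty: bool
) -> List[Tuple[int, Optional[str]]]:
    """Flatten per-row n-gram lists into (row_index, ngram) pairs.

    With keep_empty=True an empty row becomes (row_index, None), which
    models the <NA> entry from the report. With keep_empty=False the
    empty row contributes nothing.
    """
    out: List[Tuple[int, Optional[str]]] = []
    for idx, grams in enumerate(rows):
        if not grams:
            if keep_empty:
                out.append((idx, None))  # null placeholder, not a real token
            continue
        out.extend((idx, gram) for gram in grams)
    return out


# Hypothetical stand-in for the series in the report: '4' is shorter than n=2.
rows = [character_ngrams(s, 2) for s in ["1744", "4"]]
buggy = explode(rows, keep_empty=True)
fixed = explode(rows, keep_empty=False)
print(buggy)
print(fixed)
```

In this model, the requested behavior corresponds to `keep_empty=False`: only tokens that actually occur in the input strings reach the output.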

Steps/Code to reproduce bug

Minimum code required to reproduce the bug:

```python
import cudf

# '4' is shorter than n=2, so it cannot yield any bigram
str_series = cudf.Series(['1744', '4'])
str_series.str.character_ngrams(2)
```

Expected behavior

`<NA>` should not be a part of the output. This causes several downstream tasks to fail because `<NA>` is not a valid token in the actual input string series.

**Environment overview (please complete the following information)**
- Environment location: Cloud GCP
- Method of cuDF install: pip

**Environment details**
```
***git***
Not inside a git repository

***OS Information***
PRETTY_NAME="Debian GNU/Linux 11 (bullseye)"
NAME="Debian GNU/Linux"
VERSION_ID="11"
VERSION="11 (bullseye)"
VERSION_CODENAME=bullseye
ID=debian
HOME_URL="https://www.debian.org/"
SUPPORT_URL="https://www.debian.org/support"
BUG_REPORT_URL="https://bugs.debian.org/"
Linux janmey-gpu-c2 5.10.0-26-cloud-amd64 #1 SMP Debian 5.10.197-1 (2023-09-29) x86_64 GNU/Linux

***GPU Information***
Fri Dec 29 10:21:54 2023
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 525.105.17   Driver Version: 525.105.17   CUDA Version: 12.0     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  Tesla T4            On   | 00000000:00:04.0 Off |                    0 |
| N/A   70C    P0    33W /  70W |    459MiB / 15360MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|    0   N/A  N/A    316341      C   ..._log_ner/.venv/bin/python         454MiB |
+-----------------------------------------------------------------------------+

***CPU***
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Byte Order: Little Endian
Address sizes: 46 bits physical, 48 bits virtual
CPU(s): 16
On-line CPU(s) list: 0-15
Thread(s) per core: 2
Core(s) per socket: 8
Socket(s): 1
NUMA node(s): 1
Vendor ID: GenuineIntel
CPU family: 6
Model: 79
Model name: Intel(R) Xeon(R) CPU @ 2.20GHz
Stepping: 0
CPU MHz: 2199.998
BogoMIPS: 4399.99
Hypervisor vendor: KVM
Virtualization type: full
L1d cache: 256 KiB
L1i cache: 256 KiB
L2 cache: 2 MiB
L3 cache: 55 MiB
NUMA node0 CPU(s): 0-15
Vulnerability Gather data sampling: Not affected
Vulnerability Itlb multihit: Not affected
Vulnerability L1tf: Mitigation; PTE Inversion
Vulnerability Mds: Mitigation; Clear CPU buffers; SMT Host state unknown
Vulnerability Meltdown: Mitigation; PTI
Vulnerability Mmio stale data: Vulnerable: Clear CPU buffers attempted, no microcode; SMT Host state unknown
Vulnerability Retbleed: Mitigation; IBRS
Vulnerability Spec rstack overflow: Not affected
Vulnerability Spec store bypass: Mitigation; Speculative Store Bypass disabled via prctl and seccomp
Vulnerability Spectre v1: Mitigation; usercopy/swapgs barriers and __user pointer sanitization
Vulnerability Spectre v2: Mitigation; IBRS, IBPB conditional, STIBP conditional, RSB filling, PBRSB-eIBRS Not affected
Vulnerability Srbds: Not affected
Vulnerability Tsx async abort: Mitigation; Clear CPU buffers; SMT Host state unknown
Flags: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc rep_good nopl xtopology nonstop_tsc cpuid tsc_known_freq pni pclmulqdq ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch invpcid_single pti ssbd ibrs ibpb stibp fsgsbase tsc_adjust bmi1 hle avx2 smep bmi2 erms invpcid rtm rdseed adx smap xsaveopt arat md_clear arch_capabilities

***CMake***
/usr/bin/cmake
cmake version 3.18.4
CMake suite maintained and supported by Kitware (kitware.com/cmake).

***g++***
/usr/bin/g++
g++ (Debian 10.2.1-6) 10.2.1 20210110
Copyright (C) 2020 Free Software Foundation, Inc.
This is free software; see the source for copying conditions. There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

***nvcc***
/usr/local/cuda/bin/nvcc
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2022 NVIDIA Corporation
Built on Wed_Sep_21_10:33:58_PDT_2022
Cuda compilation tools, release 11.8, V11.8.89
Build cuda_11.8.r11.8/compiler.31833905_0

***Python***
/home/janmeysandeepshukla/datasci/transaction_log_ner/.venv/bin/python
Python 3.10.13

***Environment Variables***
PATH : /home/janmeysandeepshukla/datasci/transaction_log_ner/.venv/bin:/home/janmeysandeepshukla/.vscode-server/bin/0ee08df0cf4527e40edc9aa28f4b5bd38bbff2b2/bin/remote-cli:/usr/local/cuda/bin:/opt/conda/bin:/usr/local/bin:/usr/bin:/bin:/usr/local/games:/usr/games
LD_LIBRARY_PATH : /usr/local/cuda/lib64:/usr/local/nccl2/lib:/usr/local/cuda/extras/CUPTI/lib64:/usr/local/cuda/lib64:/usr/local/nccl2/lib:/usr/local/cuda/extras/CUPTI/lib64
NUMBAPRO_NVVM :
NUMBAPRO_LIBDEVICE :
CONDA_PREFIX : /opt/conda
PYTHON_PATH :

***conda packages***
/opt/conda/bin/conda
# packages in environment at /opt/conda:
#
# Name Version Build Channel
_libgcc_mutex 0.1 conda_forge conda-forge
_openmp_mutex 4.5 2_gnu conda-forge
absl-py 2.0.0 pypi_0 pypi
aiofiles 22.1.0 pypi_0 pypi
aiohttp 3.9.1 pypi_0 pypi
aiohttp-cors 0.7.0 pypi_0 pypi
aiorwlock 1.3.0 pypi_0 pypi
aiosignal 1.3.1 pypi_0 pypi
aiosqlite 0.19.0 pypi_0 pypi
anyio 3.7.1 pypi_0 pypi
archspec 0.2.2 pyhd8ed1ab_0 conda-forge
argon2-cffi 23.1.0 pyhd8ed1ab_0 conda-forge
argon2-cffi-bindings 21.2.0 py310h2372a71_4 conda-forge
arrow 1.3.0 pyhd8ed1ab_0 conda-forge
asttokens 2.4.1 pyhd8ed1ab_0 conda-forge
async-lru 2.0.4 pyhd8ed1ab_0 conda-forge
async-timeout 4.0.3 pypi_0 pypi
attrs 23.1.0 pyh71513ae_1 conda-forge
babel 2.13.1 pyhd8ed1ab_0 conda-forge
backoff 2.2.1 pypi_0 pypi
beatrix-jupyterlab 2023.128.151533 pypi_0 pypi
beautifulsoup4 4.12.2 pyha770c72_0 conda-forge
bleach 6.1.0 pyhd8ed1ab_0 conda-forge
blessed 1.20.0 pypi_0 pypi
boltons 23.0.0 pyhd8ed1ab_0 conda-forge
brotli-python 1.1.0 py310hc6cd4ac_1 conda-forge
bzip2 1.0.8 hd590300_5 conda-forge
c-ares 1.23.0 hd590300_0 conda-forge
ca-certificates 2023.11.17 hbcca054_0 conda-forge
cached-property 1.5.2 hd8ed1ab_1 conda-forge
cached_property 1.5.2 pyha770c72_1 conda-forge
cachetools 5.3.2 pypi_0 pypi
certifi 2023.11.17 pyhd8ed1ab_0 conda-forge
cffi 1.16.0 py310h2fee648_0 conda-forge
charset-normalizer 3.3.2 pyhd8ed1ab_0 conda-forge
click 8.1.7 pypi_0 pypi
cloud-tpu-client 0.10 pypi_0 pypi
cloudpickle 3.0.0 pypi_0 pypi
colorama 0.4.6 pyhd8ed1ab_0 conda-forge
colorful 0.5.5 pypi_0 pypi
comm 0.2.0 pypi_0 pypi
conda 23.11.0 py310hff52083_1 conda-forge
conda-libmamba-solver 23.11.1 pyhd8ed1ab_0 conda-forge
conda-package-handling 2.2.0 pyh38be061_0 conda-forge
conda-package-streaming 0.9.0 pyhd8ed1ab_0 conda-forge
contourpy 1.2.0 pypi_0 pypi
cryptography 41.0.7 pypi_0 pypi
cycler 0.12.1 pypi_0 pypi
cython 3.0.6 pypi_0 pypi
dacite 1.8.1 pypi_0 pypi
dataproc-jupyter-plugin 0.1.59 pypi_0 pypi
db-dtypes 1.1.1 pypi_0 pypi
debugpy 1.8.0 py310hc6cd4ac_1 conda-forge
decorator 5.1.1 pyhd8ed1ab_0 conda-forge
defusedxml 0.7.1 pyhd8ed1ab_0 conda-forge
deprecated 1.2.14 pypi_0 pypi
distlib 0.3.7 pypi_0 pypi
distro 1.8.0 pyhd8ed1ab_0 conda-forge
dlenv-base 1.0.20231210 py310_0 file:///tmp/conda-pkgs
dm-tree 0.1.8 pypi_0 pypi
docker 7.0.0 pypi_0 pypi
docstring-parser 0.15 pypi_0 pypi
entrypoints 0.4 pyhd8ed1ab_0 conda-forge
exceptiongroup 1.2.0 pyhd8ed1ab_0 conda-forge
executing 2.0.1 pyhd8ed1ab_0 conda-forge
farama-notifications 0.0.4 pypi_0 pypi
fastapi 0.104.1 pypi_0 pypi
filelock 3.13.1 pypi_0 pypi
fmt 10.1.1 h00ab1b0_1 conda-forge
fonttools 4.46.0 pypi_0 pypi
fqdn 1.5.1 pyhd8ed1ab_0 conda-forge
frozenlist 1.4.0 pypi_0 pypi
fsspec 2023.12.1 pypi_0 pypi
gcsfs 2023.12.1 pypi_0 pypi
gitdb 4.0.11 pypi_0 pypi
gitpython 3.1.40 pypi_0 pypi
google-api-core 1.34.0 pypi_0 pypi
google-api-python-client 1.8.0 pypi_0 pypi
google-auth 2.25.2 pypi_0 pypi
google-auth-httplib2 0.1.1 pypi_0 pypi
google-auth-oauthlib 1.1.0 pypi_0 pypi
google-cloud-aiplatform 1.37.0 pypi_0 pypi
google-cloud-artifact-registry 1.10.0 pypi_0 pypi
google-cloud-bigquery 3.13.0 pypi_0 pypi
google-cloud-bigquery-storage 2.23.0 pypi_0 pypi
google-cloud-core 2.4.1 pypi_0 pypi
google-cloud-datastore 1.15.5 pypi_0 pypi
google-cloud-jupyter-config 0.0.5 pypi_0 pypi
google-cloud-language 2.12.0 pypi_0 pypi
google-cloud-monitoring 2.17.0 pypi_0 pypi
google-cloud-resource-manager 1.11.0 pypi_0 pypi
google-cloud-storage 2.13.0 pypi_0 pypi
google-crc32c 1.5.0 pypi_0 pypi
google-resumable-media 2.6.0 pypi_0 pypi
googleapis-common-protos 1.62.0 pypi_0 pypi
gpustat 1.0.0 pypi_0 pypi
greenlet 3.0.2 pypi_0 pypi
grpc-google-iam-v1 0.13.0 pypi_0 pypi
grpcio 1.60.0 pypi_0 pypi
grpcio-status 1.48.2 pypi_0 pypi
gymnasium 0.28.1 pypi_0 pypi
h11 0.14.0 pypi_0 pypi
htmlmin 0.1.12 pypi_0 pypi
httplib2 0.22.0 pypi_0 pypi
httptools 0.6.1 pypi_0 pypi
icu 73.2 h59595ed_0 conda-forge
idna 3.6 pyhd8ed1ab_0 conda-forge
imagehash 4.3.1 pypi_0 pypi
imageio 2.33.0 pypi_0 pypi
importlib-metadata 6.11.0 pypi_0 pypi
importlib_metadata 7.0.0 hd8ed1ab_0 conda-forge
importlib_resources 6.1.1 pyhd8ed1ab_0 conda-forge
ipykernel 6.27.1 pypi_0 pypi
ipython 8.18.1 pyh707e725_3 conda-forge
ipython-genutils 0.2.0 pypi_0 pypi
ipython-sql 0.5.0 pypi_0 pypi
ipywidgets 8.1.1 pypi_0 pypi
isoduration 20.11.0 pyhd8ed1ab_0 conda-forge
jaraco-classes 3.3.0 pypi_0 pypi
jax-jumpy 1.0.0 pypi_0 pypi
jedi 0.19.1 pyhd8ed1ab_0 conda-forge
jeepney 0.8.0 pypi_0 pypi
jinja2 3.1.2 pyhd8ed1ab_1 conda-forge
joblib 1.3.2 pypi_0 pypi
json5 0.9.14 pyhd8ed1ab_0 conda-forge
jsonpatch 1.33 pyhd8ed1ab_0 conda-forge
jsonpointer 2.4 py310hff52083_3 conda-forge
jsonschema 4.20.0 pyhd8ed1ab_0 conda-forge
jsonschema-specifications 2023.11.2 pyhd8ed1ab_0 conda-forge
jsonschema-with-format-nongpl 4.20.0 pyhd8ed1ab_0 conda-forge
jupyter-client 7.4.9 pypi_0 pypi
jupyter-http-over-ws 0.0.8 pypi_0 pypi
jupyter-lsp 2.2.1 pyhd8ed1ab_0 conda-forge
jupyter-server-fileid 0.9.0 pypi_0 pypi
jupyter-server-mathjax 0.2.6 pypi_0 pypi
jupyter-server-proxy 4.1.0 pypi_0 pypi
jupyter-server-ydoc 0.8.0 pypi_0 pypi
jupyter-ydoc 0.2.5 pypi_0 pypi
jupyter_client 8.6.0 pyhd8ed1ab_0 conda-forge
jupyter_core 5.5.0 py310hff52083_0 conda-forge
jupyter_events 0.9.0 pyhd8ed1ab_0 conda-forge
jupyter_server 2.12.1 pyhd8ed1ab_0 conda-forge
jupyter_server_terminals 0.4.4 pyhd8ed1ab_1 conda-forge
jupyterlab 3.6.6 pypi_0 pypi
jupyterlab-git 0.44.0 pypi_0 pypi
jupyterlab-widgets 3.0.9 pypi_0 pypi
jupyterlab_pygments 0.3.0 pyhd8ed1ab_0 conda-forge
jupyterlab_server 2.25.2 pyhd8ed1ab_0 conda-forge
jupytext 1.16.0 pypi_0 pypi
kernels-mixer 0.0.7 pypi_0 pypi
keyring 24.3.0 pypi_0 pypi
keyrings-google-artifactregistry-auth 1.1.2 pypi_0 pypi
keyutils 1.6.1 h166bdaf_0 conda-forge
kfp 2.4.0 pypi_0 pypi
kfp-pipeline-spec 0.2.2 pypi_0 pypi
kfp-server-api 2.0.5 pypi_0 pypi
kiwisolver 1.4.5 pypi_0 pypi
krb5 1.21.2 h659d440_0 conda-forge
kubernetes 26.1.0 pypi_0 pypi
lazy-loader 0.3 pypi_0 pypi
ld_impl_linux-64 2.40 h41732ed_0 conda-forge
libarchive 3.7.2 h2aa1ff5_1 conda-forge
libcurl 8.5.0 hca28451_0 conda-forge
libedit 3.1.20191231 he28a2e2_2 conda-forge
libev 4.33 hd590300_2 conda-forge
libffi 3.4.2 h7f98852_5 conda-forge
libgcc-ng 13.2.0 h807b86a_3 conda-forge
libgomp 13.2.0 h807b86a_3 conda-forge
libiconv 1.17 h166bdaf_0 conda-forge
libmamba 1.5.4 had39da4_0 conda-forge
libmambapy 1.5.4 py310h39ff949_0 conda-forge
libnghttp2 1.58.0 h47da74e_1 conda-forge
libnsl 2.0.1 hd590300_0 conda-forge
libsodium 1.0.18 h36c2ea0_1 conda-forge
libsolv 0.7.27 hfc55251_0 conda-forge
libsqlite 3.44.2 h2797004_0 conda-forge
libssh2 1.11.0 h0841786_0 conda-forge
libstdcxx-ng 13.2.0 h7e041cc_3 conda-forge
libuuid 2.38.1 h0b41bf4_0 conda-forge
libuv 1.46.0 hd590300_0 conda-forge
libxml2 2.12.2 h232c23b_0 conda-forge
libzlib 1.2.13 hd590300_5 conda-forge
llvmlite 0.41.1 pypi_0 pypi
lz4 4.3.2 pypi_0 pypi
lz4-c 1.9.4 hcb278e6_0 conda-forge
lzo 2.10 h516909a_1000 conda-forge
markdown-it-py 3.0.0 pypi_0 pypi
markupsafe 2.1.3 py310h2372a71_1 conda-forge
matplotlib 3.7.3 pypi_0 pypi
matplotlib-inline 0.1.6 pyhd8ed1ab_0 conda-forge
mdit-py-plugins 0.4.0 pypi_0 pypi
mdurl 0.1.2 pypi_0 pypi
menuinst 2.0.0 py310hff52083_1 conda-forge
mistune 3.0.2 pyhd8ed1ab_0 conda-forge
more-itertools 10.1.0 pypi_0 pypi
msgpack 1.0.7 pypi_0 pypi
multidict 6.0.4 pypi_0 pypi
multimethod 1.10 pypi_0 pypi
nb_conda 2.2.1 unix_6 conda-forge
nb_conda_kernels 2.3.1 pyhd8ed1ab_3 conda-forge
nbclassic 1.0.0 pypi_0 pypi
nbclient 0.9.0 pypi_0 pypi
nbconvert-core 7.12.0 pyhd8ed1ab_0 conda-forge
nbdime 3.2.0 pypi_0 pypi
nbformat 5.9.2 pyhd8ed1ab_0 conda-forge
ncurses 6.4 h59595ed_2 conda-forge
nest-asyncio 1.5.8 pyhd8ed1ab_0 conda-forge
networkx 3.2.1 pypi_0 pypi
nodejs 20.9.0 hb753e55_0 conda-forge
notebook 6.5.6 pypi_0 pypi
notebook-executor 0.2 pypi_0 pypi
notebook-shim 0.2.3 pyhd8ed1ab_0 conda-forge
numba 0.58.1 pypi_0 pypi
numpy 1.25.2 pypi_0 pypi
nvidia-ml-py 11.495.46 pypi_0 pypi
oauth2client 4.1.3 pypi_0 pypi
oauthlib 3.2.2 pypi_0 pypi
opencensus 0.11.3 pypi_0 pypi
opencensus-context 0.1.3 pypi_0 pypi
openssl 3.2.0 hd590300_1 conda-forge
opentelemetry-api 1.21.0 pypi_0 pypi
opentelemetry-exporter-otlp 1.21.0 pypi_0 pypi
opentelemetry-exporter-otlp-proto-common 1.21.0 pypi_0 pypi
opentelemetry-exporter-otlp-proto-grpc 1.21.0 pypi_0 pypi
opentelemetry-exporter-otlp-proto-http 1.21.0 pypi_0 pypi
opentelemetry-proto 1.21.0 pypi_0 pypi
opentelemetry-sdk 1.21.0 pypi_0 pypi
opentelemetry-semantic-conventions 0.42b0 pypi_0 pypi
overrides 7.4.0 pyhd8ed1ab_0 conda-forge
packaging 23.2 pyhd8ed1ab_0 conda-forge
pandas 2.0.3 pypi_0 pypi
pandas-profiling 3.6.6 pypi_0 pypi
pandocfilters 1.5.0 pyhd8ed1ab_0 conda-forge
papermill 2.5.0 pypi_0 pypi
parso 0.8.3 pyhd8ed1ab_0 conda-forge
patsy 0.5.4 pypi_0 pypi
pexpect 4.9.0 pypi_0 pypi
phik 0.12.3 pypi_0 pypi
pickleshare 0.7.5 py_1003 conda-forge
pillow 10.1.0 pypi_0 pypi
pip 23.3.1 pyhd8ed1ab_0 conda-forge
pkgutil-resolve-name 1.3.10 pyhd8ed1ab_1 conda-forge
platformdirs 3.11.0 pypi_0 pypi
plotly 5.18.0 pypi_0 pypi
pluggy 1.3.0 pyhd8ed1ab_0 conda-forge
prettytable 3.9.0 pypi_0 pypi
prometheus_client 0.19.0 pyhd8ed1ab_0 conda-forge
prompt-toolkit 3.0.41 pyha770c72_0 conda-forge
proto-plus 1.23.0 pypi_0 pypi
protobuf 3.20.3 pypi_0 pypi
psutil 5.9.3 pypi_0 pypi
ptyprocess 0.7.0 pyhd3deb0d_0 conda-forge
pure_eval 0.2.2 pyhd8ed1ab_0 conda-forge
py-spy 0.3.14 pypi_0 pypi
pyarrow 14.0.1 pypi_0 pypi
pyasn1 0.5.1 pypi_0 pypi
pyasn1-modules 0.3.0 pypi_0 pypi
pybind11-abi 4 hd8ed1ab_3 conda-forge
pycosat 0.6.6 py310h2372a71_0 conda-forge
pycparser 2.21 pyhd8ed1ab_0 conda-forge
pydantic 1.10.13 pypi_0 pypi
pygments 2.17.2 pyhd8ed1ab_0 conda-forge
pyjwt 2.8.0 pypi_0 pypi
pyparsing 3.1.1 pypi_0 pypi
pysocks 1.7.1 pyha2e5f31_6 conda-forge
python 3.10.13 hd12c33a_0_cpython conda-forge
python-dateutil 2.8.2 pyhd8ed1ab_0 conda-forge
python-dotenv 1.0.0 pypi_0 pypi
python-fastjsonschema 2.19.0 pyhd8ed1ab_0 conda-forge
python-json-logger 2.0.7 pyhd8ed1ab_0 conda-forge
python_abi 3.10 4_cp310 conda-forge
pytz 2023.3.post1 pyhd8ed1ab_0 conda-forge
pywavelets 1.5.0 pypi_0 pypi
pyyaml 6.0.1 py310h2372a71_1 conda-forge
pyzmq 24.0.1 pypi_0 pypi
ray 2.8.1 pypi_0 pypi
ray-cpp 2.8.1 pypi_0 pypi
readline 8.2 h8228510_1 conda-forge
referencing 0.32.0 pyhd8ed1ab_0 conda-forge
reproc 14.2.4.post0 hd590300_1 conda-forge
reproc-cpp 14.2.4.post0 h59595ed_1 conda-forge
requests 2.31.0 pyhd8ed1ab_0 conda-forge
requests-oauthlib 1.3.1 pypi_0 pypi
requests-toolbelt 0.10.1 pypi_0 pypi
retrying 1.3.4 pypi_0 pypi
rfc3339-validator 0.1.4 pyhd8ed1ab_0 conda-forge
rfc3986-validator 0.1.1 pyh9f0ad1d_0 conda-forge
rich 13.7.0 pypi_0 pypi
rpds-py 0.13.2 py310hcb5633a_0 conda-forge
ruamel.yaml 0.18.5 py310h2372a71_0 conda-forge
ruamel.yaml.clib 0.2.7 py310h2372a71_2 conda-forge
scikit-image 0.22.0 pypi_0 pypi
scikit-learn 1.3.2 pypi_0 pypi
scipy 1.11.4 pypi_0 pypi
seaborn 0.12.2 pypi_0 pypi
secretstorage 3.3.3 pypi_0 pypi
send2trash 1.8.2 pyh41d4057_0 conda-forge
setuptools 68.2.2 pyhd8ed1ab_0 conda-forge
shapely 2.0.2 pypi_0 pypi
simpervisor 1.0.0 pypi_0 pypi
six 1.16.0 pyh6c4a22f_0 conda-forge
smart-open 6.4.0 pypi_0 pypi
smmap 5.0.1 pypi_0 pypi
sniffio 1.3.0 pyhd8ed1ab_0 conda-forge
soupsieve 2.5 pyhd8ed1ab_1 conda-forge
sqlalchemy 2.0.23 pypi_0 pypi
sqlparse 0.4.4 pypi_0 pypi
stack-data 0.6.3 pypi_0 pypi
stack_data 0.6.2 pyhd8ed1ab_0 conda-forge
starlette 0.27.0 pypi_0 pypi
statsmodels 0.14.0 pypi_0 pypi
tabulate 0.9.0 pypi_0 pypi
tangled-up-in-unicode 0.2.0 pypi_0 pypi
tenacity 8.2.3 pypi_0 pypi
tensorboardx 2.6.2.2 pypi_0 pypi
terminado 0.18.0 pyh0d859eb_0 conda-forge
threadpoolctl 3.2.0 pypi_0 pypi
tifffile 2023.12.9 pypi_0 pypi
tinycss2 1.2.1 pyhd8ed1ab_0 conda-forge
tk 8.6.13 noxft_h4845f30_101 conda-forge
toml 0.10.2 pypi_0 pypi
tomli 2.0.1 pyhd8ed1ab_0 conda-forge
tornado 6.3.3 py310h2372a71_1 conda-forge
tqdm 4.66.1 pyhd8ed1ab_0 conda-forge
traitlets 5.14.0 pyhd8ed1ab_0 conda-forge
truststore 0.8.0 pyhd8ed1ab_0 conda-forge
typeguard 4.1.5 pypi_0 pypi
typer 0.9.0 pypi_0 pypi
types-python-dateutil 2.8.19.14 pyhd8ed1ab_0 conda-forge
typing-extensions 4.8.0 hd8ed1ab_0 conda-forge
typing_extensions 4.8.0 pyha770c72_0 conda-forge
typing_utils 0.1.0 pyhd8ed1ab_0 conda-forge
tzdata 2023.3 pypi_0 pypi
uri-template 1.3.0 pyhd8ed1ab_0 conda-forge
uritemplate 3.0.1 pypi_0 pypi
urllib3 1.26.18 pypi_0 pypi
uvicorn 0.24.0.post1 pypi_0 pypi
uvloop 0.19.0 pypi_0 pypi
virtualenv 20.21.0 pypi_0 pypi
visions 0.7.5 pypi_0 pypi
watchfiles 0.21.0 pypi_0 pypi
wcwidth 0.2.12 pyhd8ed1ab_0 conda-forge
webcolors 1.13 pyhd8ed1ab_0 conda-forge
webencodings 0.5.1 pyhd8ed1ab_2 conda-forge
websocket-client 1.7.0 pyhd8ed1ab_0 conda-forge
websockets 12.0 pypi_0 pypi
wheel 0.42.0 pyhd8ed1ab_0 conda-forge
widgetsnbextension 4.0.9 pypi_0 pypi
wordcloud 1.9.3 pypi_0 pypi
wrapt 1.16.0 pypi_0 pypi
xz 5.2.6 h166bdaf_0 conda-forge
y-py 0.6.2 pypi_0 pypi
yaml 0.2.5 h7f98852_2 conda-forge
yaml-cpp 0.8.0 h59595ed_0 conda-forge
yarl 1.9.4 pypi_0 pypi
ydata-profiling 4.6.0 pypi_0 pypi
ypy-websocket 0.8.4 pypi_0 pypi
zeromq 4.3.5 h59595ed_0 conda-forge
zipp 3.17.0 pyhd8ed1ab_0 conda-forge
zlib 1.2.13 hd590300_5 conda-forge
zstandard 0.22.0 py310h1275a96_0 conda-forge
zstd 1.5.5 hfc55251_0 conda-forge
```

Vortexx2 commented 6 months ago

This PR should fix the above issue.

vyasr commented 6 months ago

Thanks for the report and the fix!

rapidsai / cudf

[BUG] `str.character_ngrams` produces <NA> with strings < ngram length #14684