rapidsai / cuml

cuML - RAPIDS Machine Learning Library
https://docs.rapids.ai/api/cuml/stable/
Apache License 2.0
4.27k stars 535 forks source link

[BUG] HashingVectorizer docstring example is broken #5032

Closed dcolinmorgan closed 1 year ago

dcolinmorgan commented 2 years ago

Describe the bug feature_extraction HashingVectorizer demo does not work due to limited auto-coercions on L53. commenting out line 53 makes single column cudf and pandas dataframe accepted, not to mention native python df like in the example

Steps/Code to reproduce bug

from cuml.feature_extraction.text import HashingVectorizer
        corpus = [
            'This is the first document.',
            'This document is the second document.',
            'And this is the third one.',
            'Is this the first document?',
        ]
        vectorizer = HashingVectorizer(n_features=2**4)
        X = vectorizer.fit_transform(corpus)
        print(X.shape)

Expected behavior ValueError: cudf.Series([str]) expected ,got <class 'list'>

Environment details (please complete the following information):

beckernick commented 2 years ago

Thanks for flagging this documentation issue. We'll evaluate the feasibility of accepting lists. In the short term, we can update the documentation to fix this example.