Open farwashah6 opened 1 month ago
Thanks for surfacing this issue. Could you share a minimal reproducible example that includes the full error?
This could be due to the same underlying issue as https://github.com/rapidsai/cuml/issues/5160 @dantegd @quasiben
Here is some sample code:
```python
import textattack as ta
import cuml
import sklearn as sk
import sklearn.model_selection  # make sure sk.model_selection is loaded
import pandas as pd
from textattack.models.wrappers import ModelWrapper


def load_data():
    # Fake articles are labelled 0, true articles 1.
    df_fake = pd.read_csv('datasets/isot/isot_Fake.csv')
    df_fake['label'] = 0
    df_true = pd.read_csv('datasets/isot/isot_True.csv')
    df_true['label'] = 1
    df = pd.concat([df_true, df_fake], ignore_index=True)
    x = df['text'].copy()
    y = df['label']
    train_samples, test_samples, train_labels, test_labels = sk.model_selection.train_test_split(
        x, y, test_size=0.5, random_state=42)
    return train_samples, test_samples, train_labels, test_labels, df


def vectorization(x_train, x_test):
    # Fit the cuML vectorizer on the training text only, then reuse it for the test text.
    vectorizer = cuml.feature_extraction.text.CountVectorizer()
    train_vect = vectorizer.fit_transform(pd.Series(x_train))
    test_vect = vectorizer.transform(pd.Series(x_test))
    return train_vect, test_vect, vectorizer


def model(x_train_vect, x_test_vect, y_train, y_test):
    classifiers = cuml.neighbors.KNeighborsClassifier()
    classifiers.fit(x_train_vect, y_train)
    accuracy = classifiers.score(x_test_vect, y_test)
    print(f'Accuracy: {accuracy}')
    return classifiers


class CuMLKNNWrapper(ModelWrapper):
    def __init__(self, model, vectorizer):
        self.model = model
        self.vectorizer = vectorizer

    def __call__(self, text_input, batch=None):
        # Vectorize the raw strings, then return class probabilities.
        x_transform = self.vectorizer.transform(pd.Series(text_input)).astype(float)
        prediction = self.model.predict_proba(x_transform)
        return prediction


def attack(cuml_model, df, cuml_vectorizer):
    custom_model_wrapper = CuMLKNNWrapper(cuml_model, cuml_vectorizer)
    recipe = ta.attack_recipes.TextFoolerJin2019.build(model_wrapper=custom_model_wrapper)
    data = [(row['text'], row['label']) for _, row in df.iterrows()]
    attack_args = ta.attack_args.AttackArgs(
        num_examples=20, parallel=True, num_workers_per_device=2, disable_stdout=True)
    dataset = ta.datasets.Dataset(data, input_columns=['text'])
    attacker = ta.Attacker(recipe, dataset, attack_args)
    attacker.attack_dataset()


if __name__ == '__main__':
    train_examples, test_examples, y_train, y_test, data_samples = load_data()
    x_train_tokens, x_test_tokens, vectorizer = vectorization(x_train=train_examples, x_test=test_examples)
    classifier = model(x_train_tokens, x_test_tokens, y_train, y_test)
    attack(classifier, data_samples, vectorizer)
```
Error:

    Traceback (most recent call last):
      File "/home/farwa/vscode/venv/lib/python3.10/site-packages/textattack/attacker.py", line 591, in attack_from_queue
        result = attack.attack(example, ground_truth_output)
      File "/home/farwa/vscode/venv/lib/python3.10/site-packages/textattack/attack.py", line 444, in attack
        goal_function_result, _ = self.goal_function.init_attack_example(
      File "/home/farwa/vscode/venv/lib/python3.10/site-packages/textattack/goal_functions/goal_function.py", line 67, in init_attack_example
        result, _ = self.get_result(attacked_text, check_skip=True)
      File "/home/farwa/vscode/venv/lib/python3.10/site-packages/textattack/goal_functions/goal_function.py", line 78, in get_result
        results, search_over = self.get_results([attacked_text], **kwargs)
      File "/home/farwa/vscode/venv/lib/python3.10/site-packages/textattack/goal_functions/goal_function.py", line 95, in get_results
        model_outputs = self._call_model(attacked_text_list)
      File "/home/farwa/vscode/venv/lib/python3.10/site-packages/textattack/goal_functions/goal_function.py", line 218, in _call_model
        outputs = self._call_model_uncached(uncached_list)
      File "/home/farwa/vscode/venv/lib/python3.10/site-packages/textattack/goal_functions/goal_function.py", line 193, in _call_model_uncached
        return self._process_model_outputs(attacked_text_list, outputs)
      File "/home/farwa/vscode/venv/lib/python3.10/site-packages/textattack/goal_functions/classification/classification_goal_function.py", line 25, in _process_model_outputs
        scores = torch.tensor(scores)
      File "cupy/_core/core.pyx", line 1496, in cupy._core.core._ndarray_base.__len__
    TypeError: len() of unsized object
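The traceback ends in CuPy's `__len__` while TextAttack is calling `torch.tensor(scores)`, so a CuPy-backed object appears to be reaching that call. One thing that might be worth trying, as a rough sketch only, is copying the model output to host memory before the wrapper returns it. The helper name `to_python_list` and the stand-in class below are hypothetical, used only for illustration. The helper relies on duck typing (`.to_numpy()` on cuDF/pandas objects, `.get()` on CuPy arrays, `.tolist()` on NumPy arrays). It is not a confirmed fix and may not address the underlying problem tracked in #5160.

```python
def to_python_list(result):
    """Best-effort conversion of a possibly GPU-backed result to nested Python lists.

    Hypothetical helper, not part of cuML or TextAttack.
    """
    if hasattr(result, "to_numpy"):
        # cuDF / pandas objects: to_numpy() returns a host NumPy array.
        result = result.to_numpy()
    elif hasattr(result, "get"):
        # CuPy arrays: get() copies the data from the device to a NumPy array.
        result = result.get()
    if hasattr(result, "tolist"):
        # NumPy arrays: tolist() yields plain nested Python lists.
        result = result.tolist()
    return result


class StandInDeviceArray:
    """Stand-in for a device array so the sketch runs without a GPU (assumption)."""

    def __init__(self, rows):
        self._rows = rows

    def get(self):
        return self._rows


# Demonstration with the stand-in; a real CuPy array would take the same path.
print(to_python_list(StandInDeviceArray([[0.25, 0.75]])))
```

In the wrapper above, this would mean returning `to_python_list(self.model.predict_proba(x_transform))` from `__call__` instead of the raw prediction.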
I think @beckernick is correct, this is probably the same as #5160. We are planning to work on improvements and fixes for encoders and vectorizers very soon, including CountVectorizer, so we are aiming to have a solution for this in a nightly version in the next few weeks.
Thank you. Looking forward to the updates.
Hi. I am new to using GPUs. I am working on adversarial machine learning, and I previously used the TextAttack library in one of my projects with scikit-learn and Keras models. For that I wrote custom ModelWrappers for my models, and they worked fine.
My data is now different and much larger, so I want to run the same (scikit-learn) models on the GPU, which means using cuML instead. But when I pass the cuML model to the custom ModelWrapper I wrote earlier, I get the following error
len() of unsized object
and then execution stops.
Additional info: I am vectorizing my data with cuML's CountVectorizer, which is what triggers this error. When I use scikit-learn's CountVectorizer instead, the attack runs but does not use much of the GPU (of course). If anyone has run into the same thing, please help me with this.
I am attaching the important chunks of my code here.