A collection of RAPIDS examples for security analysts, data scientists, and engineers to quickly get started applying RAPIDS and GPU acceleration to real-world cybersecurity use cases.
[BUG] DGA_Detection.py throws empty.memory_format CUDA runtime error in CUDA 11.2 #439
Describe the bug
When running DGA detection on CUDA 11.2, it throws this CUDA runtime error:
RuntimeError: Could not run 'aten::empty.memory_format' with arguments from the 'CUDA' backend. 'aten::empty.memory_format' is only available for these backends: [CPU, MkldnnCPU, SparseCPU, BackendSelect, Named, AutogradOther, AutogradCPU, AutogradCUDA, AutogradXLA, AutogradPrivateUse1, AutogradPrivateUse2, AutogradPrivateUse3, Tracer, Autocast, Batched, VmapMode].
Steps/Code to reproduce bug
Run the notebook at https://github.com/rapidsai/clx/blob/branch-21.08/notebooks/dga_detection/DGA_Detection.ipynb. The failure occurs when you reach
dd.train_model(train_data, labels, batch_size=BATCH_SIZE, epochs=EPOCHS, train_size=0.7)
or use this minimal reproducer:
import os
import cudf
import clx
import torch
import numpy as np
from datetime import datetime
from sklearn.metrics import accuracy_score, average_precision_score
from clx.analytics.dga_detector import DGADetector
from clx.utils.data.dataloader import DataLoader
from clx.analytics.dga_dataset import DGADataset
from cuml import train_test_split
# download the DGA domains dataset
!wget https://github.com/chrmor/DGA_domains_dataset/raw/master/dga_domains_full.csv
INPUT_CSV2 = "dga_domains_full.csv"
gdf2 = cudf.read_csv(INPUT_CSV2, header=None)
gdf2.columns = ['type', 'bot', 'domain']
print(gdf2)
# map labels ('dga' -> 0, 'legit' -> 1) and convert from string to int64
gdf2["type"] = gdf2["type"].replace(to_replace='dga', value='0')
gdf2["type"] = gdf2["type"].replace(to_replace='legit', value='1')
gdf2["type"] = gdf2["type"].astype("int64")
# split out training inputs (domains) and labels (type)
train_data = gdf2['domain']
labels = gdf2['type']
## Set params
LR = 0.001
N_LAYERS = 3
CHAR_VOCAB = 128
HIDDEN_SIZE = 100
N_DOMAIN_TYPE = 2
EPOCHS = 2 #for a fast test
TRAIN_SIZE = 0.7
BATCH_SIZE = 10000
MODELS_DIR = 'models'
## Start training. The train_model call below is where the error is raised
dd = DGADetector(lr=LR)
dd.init_model(n_layers=N_LAYERS, char_vocab=CHAR_VOCAB, hidden_size=HIDDEN_SIZE, n_domain_type=N_DOMAIN_TYPE)
dd.train_model(train_data, labels, batch_size=BATCH_SIZE, epochs=EPOCHS, train_size=TRAIN_SIZE)
The output error is:
Epoch: 0%| | 0/2 [00:00<?, ?it/s]
---------------------------------------------------------------------------
RuntimeError Traceback (most recent call last)
<ipython-input-9-59f318100bd3> in <module>
2 dd = DGADetector(lr=LR)
3 dd.init_model(n_layers=N_LAYERS, char_vocab=CHAR_VOCAB, hidden_size=HIDDEN_SIZE, n_domain_type=N_DOMAIN_TYPE)
----> 4 dd.train_model(train_data, labels, batch_size=BATCH_SIZE, epochs=EPOCHS, train_size=0.7)
/opt/conda/envs/rapids/lib/python3.8/site-packages/clx/analytics/dga_detector.py in train_model(self, train_data, labels, batch_size, epochs, train_size)
105 types_tensor = self._create_types_tensor(df["type"])
106 df = df.drop(["type", "domain"], axis=1)
--> 107 input, seq_lengths = self._create_variables(df)
108 model_result = self.model(input, seq_lengths)
109 loss = self._get_loss(model_result, types_tensor)
/opt/conda/envs/rapids/lib/python3.8/site-packages/clx/analytics/dga_detector.py in _create_variables(self, df)
172 df = df.drop("len", axis=1)
173 seq_len_tensor = torch.LongTensor(seq_len_arr)
--> 174 seq_tensor = self._df2tensor(df)
175 # Return variables
176 # DataParallel requires everything to be a Variable
/opt/conda/envs/rapids/lib/python3.8/site-packages/clx/analytics/dga_detector.py in _df2tensor(self, ascii_df)
185 """
186 dlpack_ascii_tensor = ascii_df.to_dlpack()
--> 187 seq_tensor = from_dlpack(dlpack_ascii_tensor).long()
188 return seq_tensor
189
RuntimeError: Could not run 'aten::empty.memory_format' with arguments from the 'CUDA' backend. 'aten::empty.memory_format' is only available for these backends: [CPU, MkldnnCPU, SparseCPU, BackendSelect, Named, AutogradOther, AutogradCPU, AutogradCUDA, AutogradXLA, AutogradPrivateUse1, AutogradPrivateUse2, AutogradPrivateUse3, Tracer, Autocast, Batched, VmapMode].
CPU: registered at /tmp/pip-req-build-ye6jlr3g/build/aten/src/ATen/CPUType.cpp:2127 [kernel]
MkldnnCPU: registered at /tmp/pip-req-build-ye6jlr3g/build/aten/src/ATen/MkldnnCPUType.cpp:144 [kernel]
SparseCPU: registered at /tmp/pip-req-build-ye6jlr3g/build/aten/src/ATen/SparseCPUType.cpp:239 [kernel]
BackendSelect: registered at /tmp/pip-req-build-ye6jlr3g/build/aten/src/ATen/BackendSelectRegister.cpp:761 [kernel]
Named: registered at /tmp/pip-req-build-ye6jlr3g/aten/src/ATen/core/NamedRegistrations.cpp:7 [backend fallback]
AutogradOther: registered at /tmp/pip-req-build-ye6jlr3g/torch/csrc/autograd/generated/VariableType_4.cpp:7586 [autograd kernel]
AutogradCPU: registered at /tmp/pip-req-build-ye6jlr3g/torch/csrc/autograd/generated/VariableType_4.cpp:7586 [autograd kernel]
AutogradCUDA: registered at /tmp/pip-req-build-ye6jlr3g/torch/csrc/autograd/generated/VariableType_4.cpp:7586 [autograd kernel]
AutogradXLA: registered at /tmp/pip-req-build-ye6jlr3g/torch/csrc/autograd/generated/VariableType_4.cpp:7586 [autograd kernel]
AutogradPrivateUse1: registered at /tmp/pip-req-build-ye6jlr3g/torch/csrc/autograd/generated/VariableType_4.cpp:7586 [autograd kernel]
AutogradPrivateUse2: registered at /tmp/pip-req-build-ye6jlr3g/torch/csrc/autograd/generated/VariableType_4.cpp:7586 [autograd kernel]
AutogradPrivateUse3: registered at /tmp/pip-req-build-ye6jlr3g/torch/csrc/autograd/generated/VariableType_4.cpp:7586 [autograd kernel]
Tracer: registered at /tmp/pip-req-build-ye6jlr3g/torch/csrc/autograd/generated/TraceType_4.cpp:9291 [kernel]
Autocast: fallthrough registered at /tmp/pip-req-build-ye6jlr3g/aten/src/ATen/autocast_mode.cpp:254 [backend fallback]
Batched: registered at /tmp/pip-req-build-ye6jlr3g/aten/src/ATen/BatchingRegistrations.cpp:511 [backend fallback]
VmapMode: fallthrough registered at /tmp/pip-req-build-ye6jlr3g/aten/src/ATen/VmapModeRegistrations.cpp:33 [backend fallback]
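The failing call in the traceback is the DLPack hand-off in clx's _df2tensor (from_dlpack(...).long()). As a triage aid, here is a minimal, CLX-independent sketch of the same cudf -> DLPack -> PyTorch path; the DataFrame values are made up for illustration, and it assumes the from_dlpack used by clx is torch.utils.dlpack.from_dlpack.

import cudf
import torch
from torch.utils.dlpack import from_dlpack

# Small integer DataFrame on the GPU (made-up values, stand-in for the ASCII-encoded domains).
gdf = cudf.DataFrame({"c0": [104, 101, 108], "c1": [108, 111, 0]})

# Same hand-off path as clx's _df2tensor: cudf -> DLPack capsule -> torch tensor.
capsule = gdf.to_dlpack()
seq_tensor = from_dlpack(capsule).long()  # the .long() cast is where the reported error surfaces

print(seq_tensor.device, seq_tensor.dtype, seq_tensor.shape)

If this standalone conversion raises the same aten::empty.memory_format error on CUDA 11.2, the problem lies in the PyTorch/cuDF DLPack path rather than in the notebook itself.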
Expected behavior
On CUDA 11.0, the same run finishes successfully.
Environment overview (please complete the following information)
Runtime container with CLX 21.06 and PyTorch 1.7.1 installed; Ubuntu 18.04, Python 3.7.
Environment details
Please run and paste the output of the /rapids/cudf/print_env.sh script here to gather any other relevant environment details. The script is located in the Docker container.
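Since the print_env.sh output isn't pasted, a quick version check run inside the same container can stand in for part of it. This is only a convenience sketch using standard torch/cudf attributes, not a replacement for the script's full output.

import torch
import cudf

# Minimal runtime/version report to accompany (not replace) print_env.sh output.
print("torch:", torch.__version__)
print("torch built against CUDA:", torch.version.cuda)
print("CUDA available to torch:", torch.cuda.is_available())
if torch.cuda.is_available():
    print("GPU:", torch.cuda.get_device_name(0))
print("cudf:", cudf.__version__)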
Additional context
First brought to our attention by Taylor Perkins in RAPIDS GoAI. Thanks Taylor! @BartleyR