danielbellhv closed this 3 years ago
How did you solve this? I am getting the same error
@danielbellhv I get the same error too, and none of the answers I could find on the web helped. Can you explain how you solved it? At the very least, I would be interested to know what this <U14 means.
@B-Gendron Hey, sorry I closed it but never solved the error. I've since moved on to other projects, so I wouldn't be able to work out a solution
Maybe the error occurs because everything needs to be the same type: not a mix of arrays and tensors, but all arrays or all tensors.
The problem is that I have the same issue even if all my data has the same type. Here is the code of my dataset class:
class SentenceEmotionDatasetBERT(Dataset):

    def __init__(self, data, args):
        self.args = args
        self.data = data

    def __len__(self):
        if len(self.data) < 2000:
            return 1000
        else:
            return len(self.data)

    def __getitem__(self, idx):
        item = {
            "label": np.array(self.data[idx]["label"]),
            "dialog": np.array(self.data[idx]["dialog"]),
            "embedding": np.array(self.data[idx]["embedding"]),
            "encoding": np.array(self.data[idx]["encoding"])
        }
        return item
So everything is a numpy array, but I still get the error:
TypeError: default_collate: batch must contain tensors, numpy arrays, numbers, dicts or lists; found <U64
Note that the last two digits change each time I execute the code. It is very strange, and I don't know what this means.
I think U64 or U14 etc. refers to numpy's Unicode string of length 64 or 14, so our data would be of type numpy.str_ or numpy.unicode_, which PyTorch's DataLoader() can't handle.
There's a concept called word embeddings, which converts text into numeric values for DNNs. Idk your level of proficiency, but I would defo look into that further, as it relates to this.
Sorry @ara7 for not seeing your comment. Did you ever resolve this for yourself? I don't normally leave posts on forums unanswered.
Actually I work in NLP so I'm completely aware of Embeddings. The fact is that I wanted to keep my original dialog for further use in a qualitative analysis. But you are entirely right: the problem is located at the string variables. If I ignore my dialog key, it works perfectly! Thanks a lot!
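For anyone who wants to keep the raw text in each batch rather than drop it, one option is a custom collate function. This is only a sketch, not something from this thread: the function name collate_keep_strings and its string_keys parameter are hypothetical, and the sample items are invented to mirror the shape of the __getitem__ output above.

```python
import numpy as np

def collate_keep_strings(batch, string_keys=("dialog",)):
    """Stack numeric fields into arrays; keep string fields as plain lists."""
    out = {}
    for key in batch[0]:
        values = [item[key] for item in batch]
        if key in string_keys:
            # Turn numpy strings back into Python str so nothing
            # downstream has to deal with a "<U..." dtype.
            out[key] = [v.tolist() if isinstance(v, np.ndarray) else v
                        for v in values]
        else:
            out[key] = np.stack(values)
    return out

# Made-up items shaped like the dataset's __getitem__ output.
batch = [
    {"label": np.array(0), "dialog": np.array("hi"), "embedding": np.zeros(4)},
    {"label": np.array(1), "dialog": np.array("how are you?"), "embedding": np.ones(4)},
]
collated = collate_keep_strings(batch)
```

Passed as DataLoader(dataset, collate_fn=collate_keep_strings), this would take the place of default_collate. The numeric fields would then arrive as NumPy arrays, so they would still need something like torch.as_tensor(...) before going into a model.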
Amazing, so quickly dealt with! I was far too in the trenches with NLP, but have since moved on to Computer Vision.
The real issue with these PyTorch posts is that almost anything can solve the exact same error, and it is so difficult to reproduce.
Glad I could help. Let me know if you have any other issues.
I am new to using PyTorch, which comes with a lot of new errors that I haven't seen before.
The error occurs on trainer.fit(model, dm). A # HERE comment marks where my function is called and where the PyTorch error occurs.
Note: the for loop is there to record the training time 10 times over and take an average. I want to document training and prediction times across different datasets.
Invocation:
run_training():
Error Traceback: