shrgupta1 commented 2 years ago

Hi,

I'm using NAML with my own data instead of the MIND dataset. I'm getting the following error:

candidate_news_vector = torch.stack(
RuntimeError: stack expects a non-empty TensorList

yusanshi commented 2 years ago

Does the MIND dataset run well? Please check your data format. Make sure the format of your data is the same as MIND.

shrgupta1 commented 2 years ago

I didn't try the MIND dataset. I tried to keep the data format the same as MIND, but I had to remove the subcategories, title entities and abstract entities because I didn't have that data.

shrgupta1 commented 2 years ago

Could the error be because I don't have a GPU set up on my system? Upon googling the issue, I found the following solution: https://github.com/janvainer/speedyspeech/issues/18

but it wasn't very helpful since the use case is different.

Thanks for helping! :)

yusanshi commented 2 years ago

The message `stack expects a non-empty TensorList` means the input to `torch.stack` is empty, so you should check the data pipeline.

Have you modified this? https://github.com/yusanshi/news-recommendation/blob/master/src/config.py#L50
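
One way to start checking the pipeline is to look for samples whose candidate list is empty before it ever reaches `torch.stack`. A minimal sketch, assuming each parsed behavior row is available as a dict with a space-separated `candidate_news` field; the field name, the helper name and the sample rows are assumptions for illustration, not taken from the repository:

```python
def find_empty_candidates(rows, field="candidate_news"):
    """Return the indices of rows whose candidate list is missing or empty."""
    empty = []
    for index, row in enumerate(rows):
        # Treat a missing key, None, and whitespace-only strings as empty.
        value = row.get(field) or ""
        if not str(value).split():
            empty.append(index)
    return empty


# Hypothetical rows for illustration only.
rows = [
    {"user": "U1", "candidate_news": "N1 N2 N3"},
    {"user": "U2", "candidate_news": ""},
    {"user": "U3"},
]
empty_rows = find_empty_candidates(rows)
print(empty_rows)
```

Any index reported here points at a sample that would hand an empty list to the stacking step.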

shrgupta1 commented 2 years ago

Yes, I did. Here's the edit:

`class NAMLConfig(BaseConfig):
    dataset_attributes = {
        "news": ['category', 'title', 'abstract'],
        "record": []
    }

    # For CNN
    num_filters = 300
    window_size = 3`

yusanshi commented 2 years ago

I think:

Quick-and-dirty solution: make sure the data format is the same as the MIND dataset by adding dummy data (0s) for subcategories, entities, etc.
Or, carefully check the dataset pipeline. You should do it yourself since I don't know the details of your data.
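
The quick-and-dirty option could look roughly like this. A minimal sketch, assuming the news rows are loaded as dicts keyed by the column names of MIND's `news.tsv`; the helper name and the placeholder values are assumptions, so use whatever the parsing step accepts for your data:

```python
# Column order of MIND's news.tsv.
MIND_NEWS_COLUMNS = [
    "id", "category", "subcategory", "title", "abstract",
    "url", "title_entities", "abstract_entities",
]

# Assumed placeholders: an empty JSON list for entity columns, "0" elsewhere.
DEFAULT_FILL = {"title_entities": "[]", "abstract_entities": "[]"}


def add_dummy_columns(row, columns=MIND_NEWS_COLUMNS, fill="0"):
    """Return a copy of row with every MIND column present, in MIND order."""
    return {
        column: row.get(column, DEFAULT_FILL.get(column, fill))
        for column in columns
    }


# Hypothetical row that lacks the subcategory, url and entity columns.
row = {"id": "N1", "category": "sports", "title": "A title", "abstract": "An abstract"}
padded = add_dummy_columns(row)
print("\t".join(padded.values()))
```

Writing the padded rows back out as tab-separated lines keeps the file layout the same as MIND while the real columns stay untouched.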

shrgupta1 commented 2 years ago

I checked, and the error was coming from the 'id' column: it didn't have an int datatype. I removed the 'id' column and that error is gone.

shrgupta1 commented 2 years ago

After removing the 'id' column, when I run train.py with Python 3.9, I get the following error:

RuntimeError: Caught RuntimeError in DataLoader worker process 0.
Original Traceback (most recent call last):
  File "/data/python3.9/site-packages/torch/utils/data/_utils/worker.py", line 287, in _worker_loop
    data = fetcher.fetch(index)
  File "/data/python3.9/site-packages/torch/utils/data/_utils/fetch.py", line 52, in fetch
    return self.collate_fn(data)
  File "/data/python3.9/site-packages/torch/utils/data/_utils/collate.py", line 74, in default_collate
    return {key: default_collate([d[key] for d in batch]) for key in elem}
  File "/data/python3.9/site-packages/torch/utils/data/_utils/collate.py", line 74, in <dictcomp>
    return {key: default_collate([d[key] for d in batch]) for key in elem}
  File "/data/python3.9/site-packages/torch/utils/data/_utils/collate.py", line 82, in default_collate
    raise RuntimeError('each element in list of batch should be of equal size')
RuntimeError: each element in list of batch should be of equal size
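
This message is raised by `default_collate` when the same field holds sequences of different lengths across the samples of one batch, so a likely place to look is any list-valued field (for example tokenized titles or abstracts) that is not padded to a fixed length. A minimal sketch of padding or truncating before batching, assuming integer token ids with 0 as the padding id; the function name and the lengths are hypothetical:

```python
def pad_or_truncate(token_ids, length, pad_id=0):
    """Return a list of exactly `length` ids, truncated or right-padded."""
    clipped = list(token_ids[:length])
    return clipped + [pad_id] * (length - len(clipped))


# Hypothetical tokenized titles of unequal length.
titles = [[5, 8, 2], [7], [4, 9, 1, 3, 6]]
fixed = [pad_or_truncate(title, length=4) for title in titles]
print(fixed)
```

Once every sample has the same length per field, the default collate function can combine them without this error.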

yusanshi commented 2 years ago

Please try the MIND dataset first.

yusanshi commented 2 years ago

Closing this inactive issue. Reopen it if needed.

yusanshi / news-recommendation

Runtime error #42
