I already have these files in the models directory:
FastAttentionXML-Amazon-670k-Tree-0-Level-0
FastAttentionXML-Amazon-670k-Tree-0-Level-1
FastAttentionXML-Amazon-670k-Tree-0-cluster-Level-0.npy
FastAttentionXML-Amazon-670k-Tree-0-cluster-Level-1.npy
FastAttentionXML-Amazon-670k-Tree-0-cluster-Level-2.npy
FastAttentionXML-Amazon-670k-Tree-1-Level-0
FastAttentionXML-Amazon-670k-Tree-1-Level-1
FastAttentionXML-Amazon-670k-Tree-1-cluster-Level-0.npy
FastAttentionXML-Amazon-670k-Tree-1-cluster-Level-1.npy
FastAttentionXML-Amazon-670k-Tree-1-cluster-Level-2.npy
FastAttentionXML-Amazon-670k-Tree-2-Level-0
FastAttentionXML-Amazon-670k-Tree-2-Level-1
FastAttentionXML-Amazon-670k-Tree-2-cluster-Level-0.npy
FastAttentionXML-Amazon-670k-Tree-2-cluster-Level-1.npy
FastAttentionXML-Amazon-670k-Tree-2-cluster-Level-2.npy
Could you give me some directions?
It seems the cause is the line `device_ids = list(range(1, torch.cuda.device_count()))`. Since I have only one RTX 3090, it assigns an empty list to `device_ids` (because the range starts at 1). Is that the correct diagnosis? If so, does this experiment only run on a machine with two or more GPUs?
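For reference, a minimal standalone snippet reproducing the diagnosis (only the `device_ids` line is from the repository; the rest is illustrative):

```python
import torch

# On a single-GPU machine, torch.cuda.device_count() returns 1,
# so range(1, 1) yields nothing and device_ids becomes [].
device_ids = list(range(1, torch.cuda.device_count()))
print(device_ids)  # [] with one GPU, [1, 2, ...] with multiple GPUs
```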
Yes, the code is written for two or more GPUs. If you only have one GPU, you can change `parallel_attn = labels_num <= most_labels_parallel_attn` to `parallel_attn = True` in models.py.
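In other words, the edit in models.py would look roughly like this (a sketch; only the two quoted lines come from the repository, and the comments are my reading of what they presumably do):

```python
# models.py -- sketch of the suggested single-GPU workaround

# Original line: presumably enables parallel attention only when the
# label set is small enough to replicate across GPUs.
# parallel_attn = labels_num <= most_labels_parallel_attn

# Single-GPU workaround: always keep attention on one device.
parallel_attn = True
```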
Thank you.
Hello,
After hours of training on Amazon-670k, I am getting the following error: