Closed: pnjha closed this issue 4 years ago.
It seems like something is wrong with your PLT building and the training never actually started; you can see that no process is using your GPUs. I just cloned the code and ran it on a 4-GPU machine like yours, and it works well, as shown below:
I debugged and realized that the code gets stuck in `deepxml/tree.py` at:

```python
if level == 0:
    while not os.path.exists('{}-Level-{}.npy'.format(self.groups_path, level)):
        time.sleep(30)
```

probably because the file `models/FastAttentionXML-Amazon-670K-Tree-0-cluster` is never created. Can you think of any reason for that?
This file is created by deepxml/cluster.py. I checked and ran the code from GitHub more than once, and it works well. Maybe you could check the code and data on your machine, debug your PLT building in deepxml/cluster.py, and run it again.
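In case it helps with debugging, here is a minimal, hypothetical check (the path prefix below is taken from this thread's Amazon-670K run and is an assumption; adjust it to your own configuration) that reports whether the cluster files produced by deepxml/cluster.py have appeared yet:

```python
import glob
import os

# Assumed prefix from this thread's Amazon-670K run; change it to match your config.
groups_path = 'models/FastAttentionXML-Amazon-670K-Tree-0-cluster'

# tree.py polls for '<groups_path>-Level-0.npy', so check whether it exists yet.
level0_file = '{}-Level-{}.npy'.format(groups_path, 0)
print('Level-0 cluster file present:', os.path.exists(level0_file))

# List any cluster files that have been written so far.
for path in sorted(glob.glob('{}-Level-*.npy'.format(groups_path))):
    print('found:', path, '({} bytes)'.format(os.path.getsize(path)))
```

If the Level-0 file never shows up, the clustering step in deepxml/cluster.py is the place to look.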
The entire problem was related to the CPU resources allocated. It runs as expected once I increased the allocation to 16 CPU cores. Thanks for your help.
I ran the code on the Amazon-670K dataset. I have not made any changes to the code or configuration files, but it has been training for more than 10 hours and the training is still not complete.
My GPU details are given below.
Can you confirm whether this amount of training time is expected?