Yes, it will never be set to 1 in our settings. If `max_leaf=1`, the tree is not a full tree and the leaves are not all at the same level, and our code cannot handle that situation. The assert statement means that the groups at this level include all labels.
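For intuition, here is a toy sketch of that invariant (hypothetical variable names, not the actual cluster.py code): at every level of the label tree, the groups must together cover every label, so their sizes sum to the total number of labels. If some leaves stop splitting earlier than others, a later level no longer covers all labels and the check fails.

```python
# Toy illustration of the invariant the assert enforces:
# at each level, the label groups together cover every label exactly once.
total_labels = 8
levels = [
    [list(range(8))],                  # level 0: a single root group
    [[0, 1, 2, 3], [4, 5, 6, 7]],      # level 1: two groups
    [[0, 1], [2, 3], [4, 5], [6, 7]],  # level 2: four groups
]

for labels_list in levels:
    # mirrors: assert sum(len(labels) for labels in labels_list) == labels_f.shape[0]
    assert sum(len(labels) for labels in labels_list) == total_labels
```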
So is it possible to extract the attention scores corresponding to each data point in the FastAttentionXML model, with dimension #labels x 500? For example, if I run the Amazon-670K dataset, how can I extract an attention matrix of dimension 670K x 500 for each data point?
For the FastAttentionXML model, I noticed that in the last layer there is no attention layer of dimension `#labels x hidden_size2` like the one in the AttentionXML model (`attention.attention.weight`). Is there any reason for that, given that the levels above do have a `Network.attention.attention.weight` layer?
In the code there is a flag `parallel_attn`. Can you shed some light on how it is used?
AttentionXML with PLT (FastAttentionXML) only calculates scores for a small subset of the labels, so if you want the full attention matrix you will have to modify the code yourself. It will be much slower because of the large scale of the label set.
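As a rough illustration of the kind of modification that would require (a minimal sketch with toy sizes and hypothetical tensor names, not the repository's actual code): the label-wise attention is a softmax over the product of the per-label weight matrix and the encoder outputs, and FastAttentionXML only evaluates the rows for the candidate labels proposed by the PLT, whereas the full #labels x 500 matrix would need every row.

```python
import torch
import torch.nn.functional as F

# Toy sizes; for Amazon-670K this would be ~670K labels and a sequence length of 500.
num_labels, hidden_size, seq_len = 1000, 64, 500

W = torch.randn(num_labels, hidden_size)   # per-label attention weights (assumed layout)
H = torch.randn(seq_len, hidden_size)      # encoder outputs for one data point

# FastAttentionXML only scores the candidate labels retrieved by the PLT:
candidates = torch.tensor([3, 17, 42, 99])                # hypothetical candidate label ids
attn_candidates = F.softmax(W[candidates] @ H.T, dim=-1)  # (len(candidates), seq_len)

# The full matrix asked about above would score every label:
attn_full = F.softmax(W @ H.T, dim=-1)                    # (num_labels, seq_len)
# At 670K x 500 in float32 that is roughly 1.3 GB per data point, which is why
# it is only computed for a small subset of labels.
```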
It's either `attn_weights` from `AttentionWeights.emb` (split across multiple GPUs) or still `attention.attention.weight` (not split across multiple GPUs).
It controls whether the attention matrix is split across multiple GPUs.
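For intuition, here is a minimal sketch of that kind of split (hypothetical shard layout and helper name, assuming at least two CUDA devices; not the repository's `AttentionWeights` implementation). The #labels x hidden matrix is stored as per-GPU embedding shards instead of one `attention.attention.weight` tensor, and fetching a label's attention vector means indexing into the shard that holds it.

```python
import torch
import torch.nn as nn

# Assumes >= 2 CUDA devices; shard the label-wise attention matrix across them.
num_labels, hidden_size, num_gpus = 1000, 64, 2
shard_size = (num_labels + num_gpus - 1) // num_gpus

shards = [
    nn.Embedding(shard_size, hidden_size).to(f'cuda:{i}')  # one block of label rows per GPU
    for i in range(num_gpus)
]

def label_attention_vector(label_id: int) -> torch.Tensor:
    """Fetch the attention weight row for one label from the shard that stores it."""
    gpu, offset = divmod(label_id, shard_size)
    idx = torch.tensor([offset], device=f'cuda:{gpu}')
    return shards[gpu](idx).squeeze(0)

vec = label_attention_vector(742)   # with this layout, row 742 lives on cuda:1
```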
Is there an assumption that the `max_leaf` hyperparameter in the configuration file will never be set to 1? When I try to run with `max_leaf = 1`, the following assert inside the `build_tree_by_level` method in cluster.py fails:

`assert sum(len(labels) for labels in labels_list) == labels_f.shape[0]`

Can you also explain this assert statement and why it is necessary?