yao8839836 / text_gcn

Graph Convolutional Networks for Text Classification. AAAI 2019

Choice of LR vs linear SVC #25

Open Alekos92 opened 5 years ago

Alekos92 commented 5 years ago

I have read the paper and browsed the code, and I commend you for your work. Your graph-based approach is very interesting.

I have a question about your choice of model to combine with the bag-of-words approach. You decided to use Logistic Regression to learn from the TFIDF vectors. However, when dealing with sparse, high-dimensional data from a text dataset, my first instinct is always to try a linear support vector machine. Indeed, a simple change from LR to LinearSVC seems to improve performance for the bag-of-words model across the board. More specifically, we have:

20NG: 0.8319 to 0.8513
R8: 0.9374 to 0.9735
R52: 0.8695 to 0.9497
Ohsumed: 0.5466 to 0.6839
MR: 0.7459 to 0.7619

This means that simply using SVC makes the BOW model the best choice on R8, R52, and Ohsumed, ahead of all the more sophisticated approaches, including Text GCN. The line for SVC even exists in bow.py, but it is commented out in favor of LR.

Am I missing something here? Can you explain this seemingly odd choice of classifier?
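For reference, the one-line swap being discussed can be sketched with scikit-learn on a toy corpus (a minimal sketch; the repo's bow.py uses the paper's actual datasets and preprocessing, not this hypothetical data):

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.svm import LinearSVC

# Toy two-class corpus standing in for the real datasets (20NG, R8, ...).
train_docs = [
    "cheap meds online now",
    "meeting agenda attached",
    "win a free prize today",
    "quarterly report draft",
]
train_labels = [1, 0, 1, 0]

vectorizer = TfidfVectorizer()
X_train = vectorizer.fit_transform(train_docs)  # sparse, high-dimensional

# TF-IDF + LR: the baseline reported in the paper.
lr = LogisticRegression().fit(X_train, train_labels)

# TF-IDF + LinearSVC (LIBLINEAR): the swap proposed above.
svc = LinearSVC().fit(X_train, train_labels)

print(lr.predict(X_train), svc.predict(X_train))
```

Both models consume the same TfidfVectorizer output, so in bow.py the change really is just which classifier line is commented out.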

yao8839836 commented 5 years ago

@Alekos92

Hi, thanks for your comments and questions.

I also observed these results in our experiments. LinearSVC (LIBLINEAR) is indeed a very powerful classifier for high-dimensional sparse features like BOW. I chose LR because it is more similar to the softmax cross-entropy objective used in Text GCN: both LR and softmax cross entropy use the logistic/softmax function to compute label probabilities from feature vectors. The loss of LR can also be written as a softmax cross-entropy loss:

https://peterroelants.github.io/posts/cross-entropy-logistic/
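A quick numeric check of that equivalence (a sketch using NumPy and scikit-learn's `log_loss`, not code from this repo): the binary logistic loss computed from sigmoid probabilities is exactly the cross-entropy loss.

```python
import numpy as np
from sklearn.metrics import log_loss

# Scores (logits) from a linear model, and the true binary labels.
scores = np.array([2.0, -1.0, 0.5, -3.0])
labels = np.array([1, 0, 1, 0])

# The logistic (sigmoid) function turns scores into label probabilities.
probs = 1.0 / (1.0 + np.exp(-scores))

# Cross-entropy written out: -[y log p + (1 - y) log(1 - p)], averaged.
manual = -np.mean(labels * np.log(probs) + (1 - labels) * np.log(1 - probs))

# The same quantity via scikit-learn's logistic-regression loss.
print(np.isclose(manual, log_loss(labels, probs)))
```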

Another reason is that some other baseline methods (e.g., PTE, PV-DBOW and PV-DM) used LR as the classifier. To compare different features/embeddings fairly, we report TFIDF + LR results.

Alekos92 commented 5 years ago

Thank you for your speedy and informative response.

I understand now why you would choose LR, but I still think BOW + LinearSVC should be included as a baseline alongside BOW + LR, given that it is the best model for some datasets in terms of both space/time efficiency and accuracy.

yao8839836 commented 5 years ago

@Alekos92

Thanks, we will include it in our future work.