scikit-learn-contrib / hiclass

A Python library for hierarchical classification compatible with scikit-learn
BSD 3-Clause "New" or "Revised" License

Question: BERT implementation #67

Closed friederikelbauer closed 1 year ago

friederikelbauer commented 2 years ago

Hey everyone!

I am using HiClass in my master's thesis, where I am comparing different classifiers (flat vs. hierarchical). I am trying to use BERT (pre-trained, uncased) via the transformers library (following this tutorial: https://towardsdatascience.com/text-classification-with-bert-in-pytorch-887965e5820f). Is there a way to implement HiClass with BERT?

As it has a different structure than classifiers from sklearn (for example, no inherent sklearn-style fit/predict interface), I am unsure how to combine the two packages. Any leads would be helpful! I would also be interested in other variations of BERT (RoBERTa etc.).

Thankful for any ideas!

mirand863 commented 1 year ago

Hello @friederikelbauer,

This is a really interesting use case!

I actually discussed something similar with another PhD student during a talk I gave about HiClass a couple of months ago. I believe it is feasible, and I found a library that could possibly help you: https://github.com/charles9n/bert-sklearn. Please let me know if this helps in your use case, and feel free to contact me at fabio.malchermiranda@hpi.de if you want to discuss further.
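For reference, bert-sklearn exposes BERT fine-tuning behind the usual sklearn interface, roughly like this (an untested sketch based on its README; constructor parameters may differ across versions):

```python
# Untested sketch of bert-sklearn's sklearn-style interface:
# BERT fine-tuning behind the usual fit/predict methods.
from bert_sklearn import BertClassifier

# Toy data: raw text strings in, one label per document.
X_train = ["the stock market rallied today", "the team won the final"]
y_train = ["finance", "sports"]

model = BertClassifier(bert_model="bert-base-uncased", epochs=3)
model.fit(X_train, y_train)
print(model.predict(["shares fell sharply"]))
```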

mirand863 commented 1 year ago

Hi @friederikelbauer,

Just to keep you updated, I made some progress and there is now support for BERT in all 3 local hierarchical classifiers, i.e., the local classifier per node, per parent node and per level. Also, you can now install the beta version without cloning the repository with `pip install hiclass==4.2.8b2`.
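For anyone following along, combining the two packages looks roughly like this (a minimal sketch, assuming bert-sklearn is installed from its repository; the documentation example linked below is the authoritative version):

```python
# Minimal sketch: plug a bert-sklearn estimator into HiClass as the
# local classifier. Toy data for illustration only.
from bert_sklearn import BertClassifier
from hiclass import LocalClassifierPerParentNode

# In HiClass, each label is a path from the root of the hierarchy.
X_train = ["fast-paced action movie", "slow romantic drama"]
y_train = [["Action", "Thriller"], ["Drama", "Romance"]]

bert = BertClassifier(bert_model="bert-base-uncased", epochs=1)
classifier = LocalClassifierPerParentNode(local_classifier=bert)
classifier.fit(X_train, y_train)
print(classifier.predict(["another thriller with car chases"]))
```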

However, the documentation example was not possible yet, since bert-sklearn is not available on the Python Package Index. I opened a new issue in their repository and will try to add it myself next week if I get no response from the developers.

I managed to build the documentation using their repository as source :) You can check the example here: https://hiclass--74.org.readthedocs.build/en/74/auto_examples/plot_bert.html.

I also tried the local classifiers per node and per level, and it seems bert-sklearn does not have the attribute classes_, so it is not possible to use these classifiers with BERT at the moment. Perhaps the developers of bert-sklearn would be willing to add this if you ask them.
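In case someone wants to experiment before upstream support lands, one possible workaround (purely hypothetical and untested) is a thin subclass that derives classes_ from the training labels, following the sklearn convention:

```python
# Hypothetical shim, not part of bert-sklearn: expose the classes_
# attribute that the local classifiers per node and per level expect.
import numpy as np
from bert_sklearn import BertClassifier

class BertClassifierWithClasses(BertClassifier):
    def fit(self, X, y):
        super().fit(X, y)
        # sklearn convention: sorted unique labels seen during fit.
        self.classes_ = np.unique(y)
        return self
```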

Please, let me know if you run into any issues with the beta version. Otherwise, please also let me know if it works for you so I can merge the PR and close the issue.

sciencecw commented 1 year ago

Hi,

I wonder how this adaptation deals with memory usage. It seems that a deep copy is made for every layer/node, and BERT barely fits in most computing instances. It seems this would crash any kernel pretty quickly, not to mention the cost of training each copy. That being said, I have not gotten around to testing this yet.

mirand863 commented 1 year ago

Hi @sciencecw,

You are absolutely right, it does make a deep copy of the model. However, we need to fine-tune the BERT model for each node/level/parent node individually; otherwise it would become a flat classifier. Friederike tried this and I believe she ran it successfully, though it did not yield great results for her data.

sciencecw commented 1 year ago

> we need to fine-tune the BERT model for each node/level/parent node individually; otherwise it would become a flat classifier.

Is it possible to have multiple training heads sharing the same body? We don't necessarily need to freeze the body weights; we could just train all classifiers simultaneously. Or maybe that is hard to implement with the current architecture, since you rely on bert-sklearn.
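To make the idea concrete, a shared-body/multi-head setup would look roughly like this in plain PyTorch with HuggingFace transformers (a hypothetical sketch, outside of what HiClass or bert-sklearn currently do):

```python
# Hypothetical sketch: one shared BERT encoder with a separate
# classification head per hierarchy node, trained jointly.
import torch
from torch import nn
from transformers import BertModel

class MultiHeadBert(nn.Module):
    def __init__(self, heads_spec):
        # heads_spec: dict mapping node name -> number of classes.
        super().__init__()
        self.body = BertModel.from_pretrained("bert-base-uncased")
        hidden = self.body.config.hidden_size
        self.heads = nn.ModuleDict(
            {name: nn.Linear(hidden, n) for name, n in heads_spec.items()}
        )

    def forward(self, input_ids, attention_mask):
        pooled = self.body(
            input_ids=input_ids, attention_mask=attention_mask
        ).pooler_output
        # One logit vector per local classifier, all sharing the encoder.
        return {name: head(pooled) for name, head in self.heads.items()}
```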