Closed: friederikelbauer closed this issue 1 year ago
Hello @friederikelbauer,
This is a really interesting use case!
I actually discussed something similar with another PhD student during a talk I gave about HiClass a couple of months ago. I believe it is feasible, and I found a library that could possibly help you: https://github.com/charles9n/bert-sklearn. Please, let me know if this helps in your use case and feel free to contact me at fabio.malchermiranda@hpi.de if you want to discuss further.
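For illustration, plugging bert-sklearn into HiClass could look roughly like the sketch below. This is untested; the BertClassifier parameters and the toy data are just assumptions, and bert-sklearn has to be installed from its GitHub repository.

```python
# Rough sketch, assuming bert-sklearn is installed from its GitHub repository
# (it is not on PyPI) and that HiClass accepts it like any sklearn estimator.
from bert_sklearn import BertClassifier
from hiclass import LocalClassifierPerParentNode

# Toy data: each label is the full path from the root of the hierarchy.
X_train = [
    "a purring pet that chases mice",
    "a loyal pet that loves to fetch",
    "a large bird that cannot fly",
    "a small bird that sings at dawn",
]
Y_train = [
    ["Mammal", "Cat"],
    ["Mammal", "Dog"],
    ["Bird", "Ostrich"],
    ["Bird", "Songbird"],
]

# The BERT wrapper is passed as the local classifier; the parameter values
# here are arbitrary placeholders.
bert = BertClassifier(max_seq_length=64, train_batch_size=8, epochs=1)
classifier = LocalClassifierPerParentNode(local_classifier=bert)

classifier.fit(X_train, Y_train)
print(classifier.predict(["a bird that sings in the morning"]))
```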
Hi @friederikelbauer,
Just to keep you updated, I made some progress and now there is support for BERT available for all 3 local hierarchical classifiers, i.e., the local classifier per node, per parent node and per level. Also, now you can install the beta version without cloning the repository with pip install hiclass==4.2.8b1
pip install hiclass==4.2.8b2
However, the documentation example was not possible yet, since bert-sklearn is not available on the Python Package Index. I opened a new issue in their repository and will try to add it myself next week if I get no response from the developers.
I managed to build the documentation using their repository as source :) You can check the example here: https://hiclass--74.org.readthedocs.build/en/74/auto_examples/plot_bert.html.
I also tried the local classifiers per node and per level, and it seems bert-sklearn does not have the attribute classes_, so it is not possible to use these classifiers with BERT at the moment. Perhaps the developers of bert-sklearn would be willing to add this if you ask them.
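If anyone wants to experiment, one hypothetical workaround would be a thin subclass that records classes_ during fit. This is completely untested with bert-sklearn, and the class name is made up:

```python
# Hypothetical, untested workaround: record the unique training labels as
# classes_, which the local classifiers per node and per level expect.
import numpy as np
from bert_sklearn import BertClassifier


class BertClassifierWithClasses(BertClassifier):
    def fit(self, X, y, *args, **kwargs):
        # Mirror scikit-learn's convention of exposing the fitted labels.
        self.classes_ = np.unique(y)
        return super().fit(X, y, *args, **kwargs)
```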
Please, let me know if you run into any issues with the beta version. Otherwise, please also let me know if it works for you so I can merge the PR and close the issue.
Hi,
I wonder how this adaptation deals with memory usage. It seems that a deep copy is done for every layer/node. BERT barely fits in most computing instances, so it seems this would crash any kernel pretty quickly, not to mention the need to train the model. That being said, I have not gotten around to testing this out yet.
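To put rough numbers on that concern (back-of-the-envelope only; the node count below is made up): BERT-base has roughly 110 million parameters, so each float32 copy is on the order of 0.4 GB before gradients and optimizer state.

```python
# Back-of-the-envelope estimate of the memory cost of deep-copying BERT-base
# once per node; the node count is a made-up example.
params = 110_000_000      # approximate parameter count of BERT-base
bytes_per_param = 4       # float32 weights
nodes = 50                # hypothetical number of nodes in the hierarchy
print(f"{params * bytes_per_param * nodes / 1e9:.1f} GB for the weights alone")
```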
Hi @sciencecw,
You are absolutely right, it does make a deep copy of the model. However, we need to fine-tune the BERT model for each node/level/parent node individually, otherwise it would become a flat classifier. Friederike tried this and I think she ran it successfully, though it did not yield great results for her data.
we need to fine-tune the BERT model for each node/level/parent node individually, otherwise it would become a flat classifier.
Is it possible to have multiple training heads sharing the same body? We don't necessarily need to freeze the body weights, just train all classifiers simultaneously. Or maybe that is hard to implement given the architecture, since you rely on bert-sklearn.
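For illustration, the shared-body idea could look roughly like the following PyTorch sketch. This is conceptual only, not something HiClass or bert-sklearn currently provide, and all names are made up:

```python
# Conceptual sketch: one shared BERT body with a separate classification head
# per hierarchy level, trained jointly. Not part of HiClass.
import torch.nn as nn
from transformers import BertModel


class SharedBodyHierarchicalClassifier(nn.Module):
    def __init__(self, num_classes_per_level):
        super().__init__()
        self.bert = BertModel.from_pretrained("bert-base-uncased")
        hidden = self.bert.config.hidden_size
        # One linear head per level of the hierarchy.
        self.heads = nn.ModuleList(
            [nn.Linear(hidden, n) for n in num_classes_per_level]
        )

    def forward(self, input_ids, attention_mask):
        pooled = self.bert(
            input_ids=input_ids, attention_mask=attention_mask
        ).pooler_output
        # Every head sees the same pooled representation; the body is shared.
        return [head(pooled) for head in self.heads]


def hierarchical_loss(logits_per_level, targets_per_level):
    # Sum the per-level cross-entropies so one backward pass updates the
    # shared body and all heads simultaneously.
    ce = nn.CrossEntropyLoss()
    return sum(ce(l, t) for l, t in zip(logits_per_level, targets_per_level))
```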
Hey everyone!
I am using HiClass in my master's thesis and I am comparing different classifiers (flat vs. hierarchical). I am trying to use BERT (pre-trained, uncased) from the transformers library (following this tutorial: https://towardsdatascience.com/text-classification-with-bert-in-pytorch-887965e5820f). Is there a way to implement HiClass with BERT?
As it has a different structure than classifiers from sklearn (for example, no inherent scikit-learn interface), I am unsure how to combine the two packages. Any leads would be helpful! Also, would other variations of BERT (RoBERTa etc.) work?
Thankful for any ideas!
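One possible way to combine the two packages without fine-tuning is to use a pre-trained BERT from transformers as a frozen feature extractor and pass the embeddings to HiClass with an ordinary scikit-learn classifier. A rough, untested sketch of that idea (the model choice and toy data are just assumptions, and this is not HiClass's official recipe):

```python
# Rough sketch: frozen BERT embeddings as features, plugged into HiClass with
# an ordinary scikit-learn classifier. This avoids fine-tuning entirely.
import torch
from sklearn.linear_model import LogisticRegression
from transformers import AutoModel, AutoTokenizer
from hiclass import LocalClassifierPerParentNode

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
encoder = AutoModel.from_pretrained("bert-base-uncased")


def embed(texts):
    # Mean-pool the last hidden state of the frozen encoder.
    with torch.no_grad():
        batch = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")
        hidden = encoder(**batch).last_hidden_state
        mask = batch["attention_mask"].unsqueeze(-1).float()
        return ((hidden * mask).sum(1) / mask.sum(1)).numpy()


# Toy data: each label is the full path from the root of the hierarchy.
X_train = [
    "problem with my mortgage payment",
    "struggling to repay my student loan",
    "late fee charged on my credit card",
    "credit card interest rate increased",
]
Y_train = [
    ["Loan", "Mortgage"],
    ["Loan", "Student loan"],
    ["Credit card", "Fees"],
    ["Credit card", "Interest rate"],
]

classifier = LocalClassifierPerParentNode(local_classifier=LogisticRegression())
classifier.fit(embed(X_train), Y_train)
print(classifier.predict(embed(["mortgage payment issue"])))
```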