Current Scenario
Hashformers currently uses two backends: lm-scorer for GPT-2 and mlm-scoring for BERT. However, neither package has been updated in the last three years, which causes compatibility issues and performance limitations. For instance, our reranker based on mlm-scoring is now incompatible with Google Colab.
Proposed Solution
I propose we switch both backends to minicons. This change offers several benefits for the functionality and performance of Hashformers:
Updated Software: minicons is actively developed, with its most recent commit made less than a week ago.
Model Flexibility: By transitioning to minicons, we are no longer limited to GPT-2 and BERT; we can use any Transformer model we prefer. This flexibility could yield a new SOTA for hashtag segmentation, as we would have access to more powerful models.
Reduced Compatibility Issues: As minicons is built on the latest version of the transformers library, we can expect fewer compatibility problems.
Improved Algorithm: The existing algorithm behind mlm-scoring is possibly outdated, as indicated by the development of better-mlm-scoring. This improved scoring method is expected to be integrated into minicons soon through a PR.
In light of these advantages, I believe that transitioning to minicons will significantly enhance the efficiency and effectiveness of our library.