mukhal / fairseq-tagging

a Fairseq fork for sequence tagging/labeling tasks
MIT License

What are the major changes that were made? #1

Open zeeshansayyed opened 4 years ago

zeeshansayyed commented 4 years ago

Hi @mohammadKhalifa

First of all, I want to thank you for this awesome project. I have been meaning to get fairseq to do sequence tagging for the past few days, and I was just trying to understand how I could do that when I came across your project. I would like to contribute to the project in any way I can.

At a more general level, would you be able to explain what major changes you made? I am trying to understand whether it would be possible to incorporate upstream changes from fairseq, and how hard that would be.

Specifically, can we implement simple models like BiLSTM-CRF using this? I saw that on the TO-DO list in the README, which means you have it in the pipeline. I could possibly help you with that. Also, I was wondering whether we could make it so that we specify a generic encoder (transformer or LSTM) and, separately, whether or not to put a CRF layer on top of it to perform sequence tagging.
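To make the "generic encoder plus optional CRF" idea concrete, here is a minimal sketch in plain Python. It is not code from this repo or from fairseq: `Tagger`, `greedy_decode` and `toy_encoder` are hypothetical names, and the encoder is a stand-in that only produces per-token tag scores. The point is just that the encoder and the decoding step can be swapped independently.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Sequence

Emissions = List[List[float]]  # one row of per-tag scores for each token
Encoder = Callable[[Sequence[str]], Emissions]
Decoder = Callable[[Emissions], List[int]]


def greedy_decode(emissions: Emissions) -> List[int]:
    """Pick the highest-scoring tag independently at every position."""
    return [max(range(len(row)), key=row.__getitem__) for row in emissions]


@dataclass
class Tagger:
    """Hypothetical wrapper: any encoder plus an optional structured decoder."""
    encoder: Encoder
    tags: List[str]
    decoder: Optional[Decoder] = None  # e.g. CRF Viterbi decoding; None means greedy

    def predict(self, tokens: Sequence[str]) -> List[str]:
        emissions = self.encoder(tokens)
        decode = self.decoder or greedy_decode
        return [self.tags[i] for i in decode(emissions)]


def toy_encoder(tokens: Sequence[str]) -> Emissions:
    """Stand-in for a transformer or LSTM: scores capitalised tokens as entities."""
    # Column order matches tags = ["O", "ENT"].
    return [[0.0, 1.0] if tok[:1].isupper() else [1.0, 0.0] for tok in tokens]


tagger = Tagger(encoder=toy_encoder, tags=["O", "ENT"])
print(tagger.predict(["visit", "Paris", "soon"]))  # ['O', 'ENT', 'O']
```

With this shape, turning the CRF on or off would amount to passing a different `decoder`, while the encoder choice stays a separate argument.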

My personal aim is to also modify it to do multi-task learning. I am hoping that the experience I gain doing this will help me with that as well.

Thanks

mukhal commented 4 years ago

@zeeshansayyed Thank you for your interest in contributing. What I did here is simply take the masked language modeling architectures (encoders) and tweak them a bit for sequence tagging in the sequence_tagger module.

I believe you can start with a vanilla BiLSTM (a CRF would probably be more work, but is still possible), which would be a significant contribution to the repo. As I see it, you'll need to implement an LSTM encoder-only architecture with an interface similar to SequenceTagger. You could create something like LSTMSequenceTagger with a fairseq.models.LSTMEncoder inside.

Good luck!

zeeshansayyed commented 4 years ago

Hi @mohammadKhalifa, I opened an issue in the original fairseq repo asking about implementing the CRF tagger here. Do you have any thoughts on it? What I am mostly concerned about is having different implementations of forward for training and evaluation. If we do this, how would we have to change the generator code? Thanks
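For reference, the train/eval split in question can be sketched as follows. This is plain Python, not fairseq or this repo's code: `LinearChainCRF` and its methods are hypothetical names, and start/end transitions and batching are left out. During training, `forward` returns the negative log-likelihood of the gold tags (via the forward algorithm); during evaluation, it returns the Viterbi-best tag path. The `training` flag is only meant to mirror the `torch.nn.Module` convention.

```python
import math
from typing import List, Optional, Sequence, Union


def logsumexp(xs: Sequence[float]) -> float:
    """Numerically stable log(sum(exp(x)))."""
    m = max(xs)
    return m + math.log(sum(math.exp(x - m) for x in xs))


class LinearChainCRF:
    """Minimal linear-chain CRF over one sequence (no start/end scores)."""

    def __init__(self, transitions: List[List[float]]):
        self.transitions = transitions  # transitions[i][j]: score of tag i -> tag j
        self.num_tags = len(transitions)
        self.training = True  # mirrors the torch.nn.Module flag

    def _path_score(self, emissions, tags) -> float:
        score = emissions[0][tags[0]]
        for t in range(1, len(tags)):
            score += self.transitions[tags[t - 1]][tags[t]] + emissions[t][tags[t]]
        return score

    def _log_partition(self, emissions) -> float:
        # Forward algorithm: alpha[j] = log-sum of scores of all paths ending in tag j.
        k = self.num_tags
        alpha = list(emissions[0])
        for row in emissions[1:]:
            alpha = [
                logsumexp([alpha[i] + self.transitions[i][j] for i in range(k)]) + row[j]
                for j in range(k)
            ]
        return logsumexp(alpha)

    def neg_log_likelihood(self, emissions, tags) -> float:
        return self._log_partition(emissions) - self._path_score(emissions, tags)

    def decode(self, emissions) -> List[int]:
        # Viterbi: keep the best score per tag plus back-pointers to recover the path.
        k = self.num_tags
        score = list(emissions[0])
        backptrs = []
        for row in emissions[1:]:
            new_score, ptrs = [], []
            for j in range(k):
                best_i = max(range(k), key=lambda i: score[i] + self.transitions[i][j])
                ptrs.append(best_i)
                new_score.append(score[best_i] + self.transitions[best_i][j] + row[j])
            score = new_score
            backptrs.append(ptrs)
        best = max(range(k), key=score.__getitem__)
        path = [best]
        for ptrs in reversed(backptrs):
            best = ptrs[best]
            path.append(best)
        path.reverse()
        return path

    def forward(self, emissions, tags: Optional[List[int]] = None) -> Union[float, List[int]]:
        if self.training:
            return self.neg_log_likelihood(emissions, tags)  # loss for the optimizer
        return self.decode(emissions)  # predicted tag ids


# Transitions that penalise switching tags; emissions weakly prefer tag 1 in the middle.
crf = LinearChainCRF([[0.0, -2.0], [-2.0, 0.0]])
emissions = [[2.0, 0.0], [0.0, 0.5], [2.0, 0.0]]
loss = crf.forward(emissions, tags=[0, 0, 0])  # training mode: a scalar loss
crf.training = False
print(crf.forward(emissions))  # [0, 0, 0], whereas per-token argmax would give [0, 1, 0]
```

In this sketch the decoding logic lives inside the model, so whatever drives inference only has to call the model in eval mode and read tag ids back. Whether fairseq's generator can be used that way is exactly the open question above.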