jscuds / rf-bert

MIT License

More settings for retrofitting and fine-tuning #19

Closed jxmorris12 closed 2 years ago

jxmorris12 commented 2 years ago
[Screenshot: Table 2]

This is Table 2 from Retrofitting Contextualized Word Embeddings with Paraphrases.

We're missing a couple of the row dimensions (new settings for retrofitting), and most of the column dimensions (the various fine-tuning tasks).

We certainly don't need to support every scenario, but more would be nice. They claim that retrofitting with Sampled Quora gives a ~4% accuracy boost on MPQA, so that could be a good place to start.

jscuds commented 2 years ago

First Steps

  1. I'll start with MPQA. I'm going to use Version 3.0 (released in 2015, so it's reasonable to assume this is the version Shi et al. used). The dataset's site is here [link], and I confirmed it's not on HF datasets via this GH Issue (which I saw you commented on 18 months ago, haha).
  2. I think MRPC in datasets is structured similarly to quora and ('glue', 'qqp'). The ParaphraseDatasetElmo framework should only require a few additional lines to support MRPC in retrofitting.
  3. CR is the "Customer Review Datasets." I found the authors' website; under the heading "Data Sets" they offer a 5-product and a 9-product version. Since Shi et al. don't specify which they used, I'll go with the 5-product version.
  4. As I come across additional datasets that aren't included in HF's library, I'll put the raw zip files on my Google Drive in a datasets directory.
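On point 2: the main wrinkle is that the paraphrase-pair datasets name their text columns differently (QQP uses `question1`/`question2`, MRPC uses `sentence1`/`sentence2`). A minimal sketch of normalizing that, assuming a hypothetical `normalize_pair` helper (the actual plumbing lives in `ParaphraseDatasetElmo` in the repo; names here are illustrative):

```python
# Hypothetical sketch: map each supported paraphrase dataset's column names
# onto a uniform (text1, text2, label) schema before retrofitting.
PAIR_COLUMNS = {
    ("glue", "qqp"): ("question1", "question2"),    # Quora Question Pairs
    ("glue", "mrpc"): ("sentence1", "sentence2"),   # Microsoft Research Paraphrase Corpus
}

def normalize_pair(example: dict, dataset_key: tuple) -> dict:
    """Return a uniform {'text1', 'text2', 'label'} dict for one raw row."""
    col1, col2 = PAIR_COLUMNS[dataset_key]
    return {"text1": example[col1], "text2": example[col2], "label": example["label"]}
```

With a table like this, adding MRPC really should just be one more dictionary entry plus whatever tokenization the existing ELMo pipeline already does.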
jxmorris12 commented 2 years ago

I think the things we added are sufficient at this point.