rowanz / grover

Code for Defending Against Neural Fake News, https://rowanzellers.com/grover/
Apache License 2.0

label in run_discrimination.py #5

Open elacsoft opened 5 years ago

elacsoft commented 5 years ago

I am finally able to run run_discrimination.py, but during prediction (inference) it asks for a 'label' key in each item of the input data. How am I supposed to provide the label (human or machine) myself? Please provide a README on how to use run_discrimination.py for predictions.

elacsoft commented 5 years ago

Also, are the base and large model checkpoints meant for the discriminator or not? I ask because I now get a variable error when using estimator.predict.

rowanz commented 5 years ago

Sorry about that, the discrimination side isn't well documented right now. You can fetch the dataset I used for discrimination (built from Grover-Mega generations) here: https://github.com/rowanz/grover/tree/master/generation_examples

I haven't yet provided base and large checkpoints for discrimination, because studying discrimination is very adversary-specific. Essentially, for any discrimination task you need to generate a lot of articles from a specific adversary, split them into training/validation/test sets, train the discriminator on the training set, and then evaluate it on the validation and test sets.
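As a rough illustration of the labeling and splitting step, here is a minimal Python sketch. The field names `article`, `label`, and `split`, the label values `human` and `machine`, the split fractions, and both function names are assumptions made for this example; check them against the example dataset linked above and against what run_discrimination.py actually reads before relying on them.

```python
import json
import random


def split_for_discrimination(human_articles, machine_articles,
                             val_frac=0.1, test_frac=0.1, seed=0):
    """Label articles, shuffle them, and assign train/val/test splits.

    Hypothetical helper: the key names and label strings are assumptions,
    not a confirmed description of the Grover input format.
    """
    items = [{"article": a, "label": "human"} for a in human_articles]
    items += [{"article": a, "label": "machine"} for a in machine_articles]

    # Shuffle with a fixed seed so the split is reproducible.
    rng = random.Random(seed)
    rng.shuffle(items)

    n_val = int(len(items) * val_frac)
    n_test = int(len(items) * test_frac)
    for i, item in enumerate(items):
        if i < n_test:
            item["split"] = "test"
        elif i < n_test + n_val:
            item["split"] = "val"
        else:
            item["split"] = "train"
    return items


def write_jsonl(items, path):
    """Write one JSON object per line."""
    with open(path, "w", encoding="utf-8") as f:
        for item in items:
            f.write(json.dumps(item) + "\n")
```

The machine-written articles should all come from the one adversary you want to detect, since a discriminator trained this way is specific to that adversary.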

I've found that Grover can do remarkably well in a zero-shot setting (it even gets >96% on GPT2-generated news articles zero-shot). It also improves with weak supervision (see our paper). However, my concern was that people would misinterpret the results if I just uploaded checkpoints for discrimination, which is why the documentation here is a bit sparse, sorry!

ghost commented 5 years ago

Hi @rowanz: So the checkpoints available at https://storage.googleapis.com/grover-models/ are not for the discriminator? Are they meant to be used as base checkpoints, with our own custom training run on top of them?

rowanz commented 5 years ago

Hi @SandeepBhutani,

Yep, you'll need to run discriminative training on top of those checkpoints.