pitrack / incremental-coref

Code for "Moving on from OntoNotes: Coreference Resolution Model Transfer" and "Incremental Neural Coreference Resolution in Constant Memory"
Apache License 2.0

Clarification on the pretrained spanbert models #3

Closed. weiweigu1998 closed this issue 3 years ago.

weiweigu1998 commented 3 years ago

Hello Patrick,

(tagging @wgantt, who's also interested) We need some clarification on how the SpanBERT model should be processed, because we got abnormal scores when testing your best model.

We first ran the download script from https://github.com/mandarjoshi90/coref, and then ran https://github.com/pitrack/incremental-coref/blob/emnlp2020/conversion_scripts/convert_tf_to_pt.sh to generate torch_scorer_vars.bin.

We downloaded the checkpoint from https://nlp.jhu.edu/incremental-coref/models/checkpoint.bin and placed it under $log_dir/spanbert_large/spb_on_512/, following your instructions.

However, when we ran python inference.py spb_on_512, an error popped up saying that pytorch_model.bin was missing. We then downloaded the file of the same name from https://huggingface.co/SpanBERT/spanbert-large-cased/blob/main/pytorch_model.bin and placed it under the pretrained encoder's directory. With that, we were able to run your models, but the scores were abnormally low and made no sense.

Please let me know if any of our steps is wrong and causing the issue. Thanks!

pitrack commented 3 years ago

I tried to follow your steps:

I downloaded and generated torch_scorer_vars.bin with convert_tf_to_pt.sh. Running convert_tf_to_pt.sh is supposed to generate both pytorch_model.bin and torch_scorer_vars.bin, so you shouldn't need to download anything from the HuggingFace or Facebook repos.

However, it doesn't seem to do that (anymore? I think the script used to dump whatever was converted into pytorch_model.bin, but now it errors and nothing gets saved). So I also downloaded the version from the Facebook repository and used that.
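
(For context, the conversion conceptually just reads the TF checkpoint variables and re-saves the non-BERT scorer variables as torch tensors. Below is a rough sketch, not the actual conversion script; the checkpoint filename and the bert/ prefix filter are assumptions:)

import tensorflow as tf
import torch

# Rough sketch of a TF -> PyTorch weight dump (illustrative only, not the
# repo's convert_tf_to_pt.sh): keep the non-BERT (scorer) variables and save
# them as a torch state dict.
reader = tf.train.load_checkpoint("model.max.ckpt")  # hypothetical TF checkpoint prefix
scorer_vars = {}
for name in reader.get_variable_to_shape_map():
    if not name.startswith("bert/"):  # assumption: encoder variables live under bert/
        scorer_vars[name] = torch.from_numpy(reader.get_tensor(name))
torch.save(scorer_vars, "torch_scorer_vars.bin")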

Since both torch_scorer_vars.bin and pytorch_model.bin are only used to initialize the model, this shouldn't have an effect on downstream performance. In every case (with the two newly generated files, with the original two I used in the experiments, or mixing and matching them), I was able to get 79.7 on the dev set using the checkpoint from the nlp.jhu.edu site.
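
(A self-contained illustration of why the two init files can't change the eval numbers once a trained checkpoint is loaded; this is a toy model, not this repo's code:)

import torch

# Two models with different random inits give identical outputs once the same
# trained state_dict is loaded, so the init weights only need the right shapes.
model_a = torch.nn.Linear(4, 2)  # stands in for a model built from one set of init files
model_b = torch.nn.Linear(4, 2)  # same architecture, different random init
trained_state = {"weight": torch.ones(2, 4), "bias": torch.zeros(2)}  # stands in for checkpoint["model"]
model_a.load_state_dict(trained_state)
model_b.load_state_dict(trained_state)
x = torch.randn(1, 4)
print(torch.allclose(model_a(x), model_b(x)))  # True: the init no longer matters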

There are a few changes that should be made to the repo/documentation, e.g. len(sys.argv) > 1 should be > 2, and you also need config.json and vocab.txt next to torch_scorer_vars.bin and pytorch_model.bin. It looks like you already figured that out too. I'd be happy to push an update with all these changes after helping you reproduce the results.
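
(A quick way to rule out path issues before running inference; the two directory names below are placeholders for your local layout, not necessarily the paths the code expects:)

import os

# Check that everything mentioned above is sitting where you think it is.
encoder_dir = "encoder/spanbert_large"  # hypothetical path
log_dir = "logs/spb_on_512"             # hypothetical path
needed = [os.path.join(encoder_dir, f)
          for f in ("config.json", "vocab.txt", "pytorch_model.bin", "torch_scorer_vars.bin")]
needed.append(os.path.join(log_dir, "checkpoint.bin"))
for path in needed:
    print(("OK      " if os.path.exists(path) else "MISSING ") + path)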

I did not re-run the steps to preprocess the text itself, but if you're getting abnormally low scores, I doubt it's related to preprocessing. Instead, I suspect the code simply isn't loading the model (due to path issues or something else). What's in your logs after the dev set is loaded? For my run, I got:

2021-08-29 23:34:26 [INFO] Loaded 343 examples.
2021-08-29 23:34:26 [INFO] Putting Encoder, GenreEmbedder, SpanScorer, and ScoringModule all on cuda
Some weights of BertModel were not initialized from the model checkpoint at /srv/local1/paxia/emnlp2020_icoref_check/encoder/spanbert_large and are newly initialized: ['bert.pooler.dense.weight', 'bert.pooler.dense.bias']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
2021-08-29 23:34:46 [INFO] Found old model at /srv/local1/paxia/emnlp2020_icoref_check/logs/spb_on_512/checkpoint.bin, loading instead
2021-08-29 23:34:46 [INFO] Old model not found or failed to load: Error(s) in loading state_dict for Incremental:
        Missing key(s) in state_dict: "encoder.model.embeddings.position_ids".
/srv/local1/paxia/anaconda3/envs/icoref/lib/python3.8/site-packages/torch/cuda/memory.py:231: FutureWarning: torch.cuda.reset_max_memory_allocated now calls torch.cuda.reset_peak_memory_stats, which resets /all/ peak memory stats.
  warnings.warn(

To be honest, I'm not sure how this is loading the correct thing, since it should be erroring (based on the "Old model not found" line). Could you try changing this line (inference.py#L117) to this?

missing, unexpected = incremental_model.load_state_dict(checkpoint["model"], strict=False)
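
(For reference, a self-contained illustration of what strict=False does with a buffer like position_ids that exists in the model but not in an older checkpoint; this is a toy module, not the repo's Incremental class:)

import torch

class Toy(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.linear = torch.nn.Linear(4, 2)
        # Buffer that newer transformers versions register but an older
        # checkpoint may not contain, mirroring the position_ids error above.
        self.register_buffer("position_ids", torch.arange(8))

# Simulate an old checkpoint that lacks the buffer.
saved = {k: v for k, v in Toy().state_dict().items() if k != "position_ids"}

# strict=True raises on the missing key; strict=False just reports it and loads the rest.
missing, unexpected = Toy().load_state_dict(saved, strict=False)
print(missing, unexpected)  # ['position_ids'] [] -- analogous to the log line below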

When I made this change, the log looks like this (the score is the same):

2021-08-29 23:42:05 [INFO] Loaded 343 examples.
2021-08-29 23:42:05 [INFO] Putting Encoder, GenreEmbedder, SpanScorer, and ScoringModule all on cuda
Some weights of BertModel were not initialized from the model checkpoint at /srv/local1/paxia/emnlp2020_icoref_check/encoder/spanbert_large and are newly initialized: ['bert.pooler.dense.weight', 'bert.pooler.dense.bias']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
2021-08-29 23:42:23 [INFO] Found old model at /srv/local1/paxia/emnlp2020_icoref_check/logs/spb_on_512/checkpoint.bin, loading instead
['encoder.model.embeddings.position_ids'] []
/srv/local1/paxia/anaconda3/envs/icoref/lib/python3.8/site-packages/torch/cuda/memory.py:231: FutureWarning: torch.cuda.reset_max_memory_allocated now calls torch.cuda.reset_peak_memory_stats, which resets /all/ peak memory stats.
  warnings.warn(
2021-08-29 23:42:24 [INFO] loss count 0 and sampled loss count 0
46.0053: 343it [03:57,  1.45it/s]
2021-08-29 23:46:21 [INFO] cn: 58270 and wn: 1216
2021-08-29 23:46:21 [INFO] ca: 11765 and wa: 2510
2021-08-29 23:46:21 [INFO] Evaluation on 343 documents [96.857] took 237.0 seconds
2021-08-29 23:46:21 [INFO] __@ALL: 0.805, 0.789, 0.797, (343 docs)
pitrack commented 3 years ago

As a side note, this codebase was heavily modified and developed over the last year, leading to this EMNLP 2021 paper. Over the next few weeks, I will be cleaning up that code and making it public. It should contain a superset of the features in this repository, although there are minor incompatibilities. For example, it doesn't depend on torch_scorer_vars.bin or convert_tf_to_pt.sh, so it is more flexible for running on new datasets or with different encoders (as long as they're available on HuggingFace). If you want to use that repository instead and want it sooner than "in a few weeks," please send me an email.

(If your goal is to do additional modeling, I would recommend looking into the new codebase. If you just want to run inference with the pretrained model, then hopefully the instructions in the previous comment unblock you.)

weiweigu1998 commented 3 years ago

Thank you for your explanation. It turned out that the pretrained model simply wasn't being loaded earlier. I am now able to reproduce the results from the paper following your instructions.