secsilm opened this issue 11 months ago
Thanks for the detailed writeup. Are you able to make the model available? It will be easier to debug with a known example exhibiting the behavior.
Alternatively, if you can share your NER conversion code, that would also go a long way towards working it out. I take it you were using this Roberta model?
cahya/roberta-base-indonesian-1.5G
This issue claims those warnings should not be relevant:
https://github.com/huggingface/transformers/issues/6193
I will have to dig into it a bit more after my morning siesta.
I wanted to tell you that you're crazy, but apparently not
>>> import torch
>>> from stanza.models.common.bert_embedding import load_bert
>>> m, t = load_bert("cahya/roberta-base-indonesian-1.5G")
>>> m2, t2 = load_bert("cahya/roberta-base-indonesian-1.5G")
>>> for n, p in m.named_parameters():
... p2 = m2.get_parameter(n)
... if not torch.allclose(p, p2):
... print(n)
...
pooler.dense.weight
But then I'm not sure it has a noticeable effect:
>>> from stanza.models.common.bert_embedding import extract_bert_embeddings
>>> model_name = "cahya/roberta-base-indonesian-1.5G"
>>> r = extract_bert_embeddings(model_name, t, m, [["TPA", "Suwung", "Badung"]], m.device, True)[0]
>>> r2 = extract_bert_embeddings(model_name, t2, m2, [["TPA", "Suwung", "Badung"]], m.device, True)[0]
>>> torch.allclose(r, r2)
True
So I think that ultimately I need more to reproduce this: specifically, either the model itself or a prescription for generating the dataset. (The latter would be quite useful, actually, as it would let us add Indonesian NER to Stanza.)
This is crazy: different weights but the same embeddings? Is this due to floating-point precision issues, or does extract_bert_embeddings not use the pooler layer?
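One way to see why identical embeddings are consistent with differing pooler weights: if `extract_bert_embeddings` only reads the token-level hidden states, the pooler head never enters the computation. This is a minimal PyTorch sketch with a toy module, not Stanza's actual code; the module and its layer names are hypothetical stand-ins:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Toy stand-in for a transformer: an "encoder" that produces token-level
# hidden states, plus a separate pooler head applied only to the first token.
class TinyModel(nn.Module):
    def __init__(self, dim=8):
        super().__init__()
        self.encoder = nn.Linear(dim, dim)
        self.pooler = nn.Linear(dim, dim)  # analogous to pooler.dense

    def forward(self, x):
        hidden = self.encoder(x)                      # "last_hidden_state"
        pooled = torch.tanh(self.pooler(hidden[:, 0]))  # pooler output
        return hidden, pooled

m = TinyModel()
x = torch.randn(1, 4, 8)
hidden_before, _ = m(x)

# Randomly re-initialize only the pooler, as happens when a checkpoint
# is missing those weights.
nn.init.normal_(m.pooler.weight)
hidden_after, _ = m(x)

# Token-level hidden states are untouched by the pooler weights.
print(torch.allclose(hidden_before, hidden_after))  # True
```

So if the embedding extraction reads only the hidden states, two model instances with different pooler weights will still produce identical embeddings.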
As for the model, I have emailed it to you.
Alternatively, if you can share your NER conversion code, that would also go a long way towards working it out. I take it you were using this Roberta model? cahya/roberta-base-indonesian-1.5G
I didn't specify this, and I don't know where it came from.
Good question. If that is being used as the default, perhaps you have an older version of Stanza or something. The current version of Stanza uses this as the default:
indolem/indobert-base-uncased
and you can specify that with
--bert_model indolem/indobert-base-uncased
You might have better luck with that transformer. I checked the parameters in that one, and none of them differ between instances of the model.
Did it ever work to try with a different transformer?
Well, I never heard back about using a different transformer, but I think it should help results compared to having the pooler layers randomly initialized. I filed an issue on HF to see if they'll be able to update the model with fixed layers. In the meantime, there are several other Indonesian transformer models available, and I suggest using one of those instead. (Alternatively, you could always fine-tune this transformer, and then at least the random initialization will be fine-tuned a bit.)
https://huggingface.co/cahya/roberta-base-indonesian-1.5G/discussions/2
Is this addressed?
The next version will include finetuning code and peft finetuning for several different annotators, so if you want to use the Cahya transformer instead of IndoBert, it will work as long as you do that finetuning.
Perhaps there could be some automatic finetuning / saving of these unpopulated tensors when training, even if finetuning is off, but that is a larger project and probably a rather thankless one considering there are multiple other solutions.
**Describe the bug**
We recently trained an NER model using charlm, following the instructions here.
Then I noticed that every time I reload the model for prediction, the results are always different for the same input. Here is the code:
The different results between two runs:
(1, ['TPA Suwung'])
and
(2, ['TPA Suwung', 'Badung'])
Here is the log:
I have done some research, and here are my findings:
If you add the following lines at the beginning of the program, then the results will be the same every time:
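The snippet itself was not preserved in this copy of the issue, so the following is only a guess at what such seeding lines typically look like, using the standard Python, NumPy, and PyTorch seeding calls:

```python
import random

import numpy as np
import torch

# Hypothetical seed value; any fixed integer makes the runs repeatable.
SEED = 42
random.seed(SEED)
np.random.seed(SEED)
torch.manual_seed(SEED)  # also seeds CUDA on current PyTorch versions
```

With the seeds fixed, the randomly initialized pooler weights come out identical on every load, which is consistent with the behavior described here.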
Once the model is loaded into memory, the predictions will be the same regardless of how many times you make predictions.
I saved the `nlp_id.processors['ner'].trainer.model.state_dict()` twice and found that the differing keys are `word_emb.weight` and `bert_model.pooler.dense.weight`. The code responsible for the warning message "Some weights of RobertaModel were not initialized from the model checkpoint at cahya/roberta-base-indonesian-1.5G and are newly initialized" can be found at: https://github.com/stanfordnlp/stanza/blob/c65b66969469fd29b02ba972830087e4007c6b54/stanza/models/common/bert_embedding.py#L52.
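Comparing two saved state dicts to find the differing keys can be wrapped in a small helper. This is a sketch with toy modules standing in for the real Stanza model; `differing_keys` is a hypothetical name, not part of any library:

```python
import copy

import torch
import torch.nn as nn

def differing_keys(sd1, sd2):
    """Names of parameters whose values differ between two state dicts."""
    return [k for k in sd1 if not torch.allclose(sd1[k], sd2[k])]

# Demo: clone a toy module, then perturb one tensor in the copy.
torch.manual_seed(0)
a = nn.Linear(4, 4)
b = copy.deepcopy(a)
with torch.no_grad():
    b.bias.add_(1.0)

print(differing_keys(a.state_dict(), b.state_dict()))  # ['bias']
```

Running the same comparison on two loads of the NER model would surface exactly the keys reported above.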
In conclusion, I speculate that during model loading the weights of the pooler are not being loaded, causing them to be randomly initialized each time the model is loaded. Setting the random seed manually ensures consistent results. However, I am unable to investigate further and fix this issue myself. I need your help. Thanks!
**To Reproduce**
Run the code above multiple times. You may need to substitute the model with your own.
**Expected behavior**
Consistent results between runs.
**Environment (please complete the following information):**
**Additional context**
Here are the `state_dict` keys for the three models:
`id_sample_nertagger.pt`:
`id_test_forward_charlm.pt` and `id_test_backward_charlm.pt`: