huawei-noah / Pretrained-Language-Model

Pretrained language model and its related optimization techniques developed by Huawei Noah's Ark Lab.

Embedding vectors come back as NaN when using the nezha_base_www model #227


yixiu00001 commented 1 year ago

Loading the NeZha model:

    from transformers import BertConfig, NezhaModel, NezhaConfig

    # config_path and bert_dir both point at the nezha_base_www checkpoint
    self.config = BertConfig.from_pretrained(config_path)
    self.bert_module = NezhaModel.from_pretrained(bert_dir, config=self.config)

    bert_outputs = self.bert_module(
        input_ids=x,
        attention_mask=mask,
        token_type_ids=segs,
        output_hidden_states=True,
    )

In bert_outputs, the tensors from several layers are all NaN, and I can't tell why:

    BaseModelOutputWithPoolingAndCrossAttentions(
        last_hidden_state=tensor([[[nan, nan, nan,  ..., nan, nan, nan],
             [nan, nan, nan,  ..., nan, nan, nan],
             [nan, nan, nan,  ..., nan, nan, nan],
             ...,
             [nan, nan, nan,  ..., nan, nan, nan]]], device='cuda:0'),
        hidden_states=(tensor([[[ 0.5742, -0.2564,  0.4186,  ...,  0.8307, -1.6965,  0.6848],
             [-0.6152,  0.1826, -1.1161,  ...,  0.6985, -3.4405,  1.4675],
             [-0.2423,  0.8284,  0.5155,  ...,  1.0843, -1.4233,  0.5122],
             ...,
             [-0.2828, -0.2603, -0.6676,  ...,  0.5609, -2.0621,  0.5314],
             [ 0.5203,  0.3228, -0.4273,  ..., -0.2345, -0.1468, -0.2845],
             [ 0.5203,  0.3228, -0.4273,  ..., -0.2345, -0.1468, -0.2845],
             [ 0.5203,  0.3228, -0.4273,  ..., -0.2345, -0.1468, -0.2845]]],
           device='cuda:0'), tensor([[[nan, nan, nan,  ..., nan, nan, nan],
             [nan, nan, nan,  ..., nan, nan, nan],
             [nan, nan, nan,  ..., nan, nan, nan],
             ...,
             [nan, nan, nan,  ..., nan, nan, nan],
             [nan, nan, nan,  ..., nan, nan, nan]]], device='cuda:0'),),
        past_key_values=None, attentions=None, cross_attentions=None)
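
Not part of the original report, but a minimal debugging sketch for narrowing this down, assuming a local nezha_base_www checkpoint (the path below is a placeholder for config_path / bert_dir above) and a transformers release recent enough to ship NezhaModel and NezhaConfig. With output_hidden_states=True, hidden_states[0] is the embedding output and hidden_states[i] is the output of encoder layer i, so scanning the tuple in order shows where NaNs first appear:

    import torch
    from transformers import NezhaModel, NezhaConfig

    # "path/to/nezha_base_www" is a placeholder for the checkpoint directory.
    config = NezhaConfig.from_pretrained("path/to/nezha_base_www")
    model = NezhaModel.from_pretrained("path/to/nezha_base_www", config=config).eval()

    # Dummy single-sentence batch standing in for x / mask / segs in the report.
    input_ids = torch.tensor([[101, 2769, 4263, 872, 102]])
    attention_mask = torch.ones_like(input_ids)
    token_type_ids = torch.zeros_like(input_ids)

    with torch.no_grad():
        outputs = model(
            input_ids=input_ids,
            attention_mask=attention_mask,
            token_type_ids=token_type_ids,
            output_hidden_states=True,
        )

    # hidden_states[0] = embedding output, hidden_states[i] = output of layer i.
    for i, h in enumerate(outputs.hidden_states):
        if torch.isnan(h).any():
            print(f"first NaN in hidden_states[{i}]")
            break
    else:
        print("no NaN in any hidden state")

In the dump above, the embedding output (the first tensor in hidden_states) still holds real values while a later tensor is already all NaN, so a scan like this would point at the encoder layer where the values blow up. Running the same snippet on CPU in full fp32 is another quick sanity check, since half-precision overflow is a common source of NaNs like these.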