zihangdai / xlnet

XLNet: Generalized Autoregressive Pretraining for Language Understanding

Shape of get_sequence_output() is wrong #126

Open astariul opened 5 years ago

astariul commented 5 years ago

I'm running this code (only the relevant part):

import xlnet  # model-building module from this repo

xlnet_config = xlnet.XLNetConfig(json_path=FLAGS.model_config_path)
run_config = xlnet.create_run_config(is_training, True, FLAGS)

# Debug: inspect the input tensors before building the model.
print(inp_k)
print(seg_id)
print(inp_mask)
print(mems)
print(perm_mask)
print(target_mapping)
print(inp_q)

xlnet_model = xlnet.XLNetModel(
    xlnet_config=xlnet_config,
    run_config=run_config,
    input_ids=inp_k,
    seg_ids=seg_id,
    input_mask=inp_mask,
    mems=mems,
    perm_mask=perm_mask,
    target_mapping=target_mapping,
    inp_q=inp_q)

output = xlnet_model.get_sequence_output()
print("THIS IS MEGADEBUG : {}".format(output))

and the output of this code (on TPU) is:

Tensor("transpose:0", shape=(640, 1), dtype=int32) # inp_k Tensor("transpose_2:0", shape=(640, 1), dtype=int32) # seg_id Tensor("transpose_3:0", shape=(640, 1), dtype=float32) # inp_mask None # mem Tensor("transpose_4:0", shape=(640, 640, 1), dtype=float32) # perm_mask Tensor("transpose_5:0", shape=(126, 640, 1), dtype=float32) # target_mapping Tensor("transpose_1:0", shape=(640, 1), dtype=float32) # inp_q

THIS IS MEGADEBUG : Tensor("model/transformer/dropout_3/dropout/mul_1:0", shape=(126, 1, 1024), dtype=float32)

According to the comments of the XLNetModel class, my input shapes are correct. But why does my output have this shape? According to the comments of get_sequence_output(), the output shape should be [640, 1, 1024], not [126, 1, 1024].

Any guidance is welcome :)

lukemelas commented 5 years ago

This is happening because your target_mapping has num_predict = 126 as its first dimension.

Looking at the docs for transformer_xl, we can see that target_mapping has the following description:

    ...
    target_mapping: float32 Tensor in shape [num_predict, len, bsz].
      If target_mapping[i, j, k] = 1, the i-th predict in batch k is
      on the j-th token.
      Only used during pretraining for partial prediction.
      Set to None during finetuning.
    ...

Since target_mapping has shape (126, 640, 1), i.e. [num_predict, len, bsz] with num_predict = 126, the output has shape (126, 1, 1024), i.e. [num_predict, bsz, d_model].
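To make the shape arithmetic concrete, here is a minimal NumPy sketch (not code from this repo; all names here are illustrative) of how a one-hot target_mapping contracts the length axis down to num_predict:

import numpy as np

seq_len, bsz, d_model, num_predict = 640, 1, 1024, 126

# Full-sequence hidden states, [len, bsz, d_model].
h = np.random.randn(seq_len, bsz, d_model).astype(np.float32)

# One-hot target_mapping, [num_predict, len, bsz]:
# prediction i sits on token position pos[i] (single batch element here).
pos = np.random.choice(seq_len, num_predict, replace=False)
target_mapping = np.zeros((num_predict, seq_len, bsz), dtype=np.float32)
target_mapping[np.arange(num_predict), pos, 0] = 1.0

# [num_predict, len, bsz] x [len, bsz, d_model] -> [num_predict, bsz, d_model]
g = np.einsum('mlb,lbd->mbd', target_mapping, h)
print(g.shape)  # (126, 1, 1024)

(If I read modeling.py correctly, the mapping is actually applied inside the two-stream attention, to the query stream, rather than as a single gather at the end, but the effect on the leading axis is the same.)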

astariul commented 5 years ago

Thanks for your input @lukemelas !

However, I still don't understand. The comment of get_sequence_output() reads:

    Returns:
      float32 Tensor in shape [len, bsz, d_model]. The last layer hidden
      representation of XLNet.

Is the comment of get_sequence_output() wrong? Should the real output shape be [num_predict, bsz, d_model] rather than [len, bsz, d_model]?
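In other words, the behaviour I'm observing looks like this (a hypothetical helper, not part of the repo, just to pin down the shapes):

def expected_seq_output_shape(seq_len, bsz, d_model, num_predict=None):
    """Shape of get_sequence_output() as observed in this thread.

    With a target_mapping of shape [num_predict, len, bsz] (pretraining,
    partial prediction), the first axis is num_predict; with
    target_mapping=None (finetuning), it is the sequence length.
    """
    first_axis = seq_len if num_predict is None else num_predict
    return (first_axis, bsz, d_model)

assert expected_seq_output_shape(640, 1, 1024, num_predict=126) == (126, 1, 1024)
assert expected_seq_output_shape(640, 1, 1024) == (640, 1, 1024)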