Closed · nashid closed this issue 2 years ago
@AlanSwift @hugochan can anyone please help me with this query?
📖 Documentation

@AlanSwift @hugochan can anyone please help me with this query?

I am confused with the `EmbeddingConstruction` documentation:

`embedding_style: single_token_item: false, emb_strategy: "w2v_bilstm"`

If I understand correctly, when there are multiple tokens in the node attribute, i.e.:

`graph_data.node_attributes[index]["token"] = "I am multiple tokens"`

then I should set `single_token_item` to `false`. Is this understanding correct?

Secondly, why can't I set `seq_info_encode_strategy` to `true` for multi-token items?
@nashid Thank you for your attention to the library!

1) Yes, you are correct about the first statement.
2) `seq_info_encode_strategy` specifies strategies for encoding sequential information in raw text (i.e., when each token in the raw text is a graph node). If a graph node contains multiple tokens, we think it generally doesn't make sense to encode the raw sequential text to help initialize node embeddings. Can you give concrete cases where you think it makes sense to turn on `seq_info_encode_strategy` for multi-token nodes?
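To make the distinction concrete, here is a minimal plain-Python sketch (illustrative only, not the graph4nlp API) contrasting the two views: with `single_token_item: true` every raw-text token is its own node, so node order carries the sequence that `seq_info_encode_strategy` can encode; with `single_token_item: false` each node holds a multi-token string, and there is no single raw-text order across nodes:

```python
def token_level_nodes(text):
    """single_token_item=True view: each raw-text token is one node,
    so the node order mirrors the token order of the raw text."""
    return [{"token": t} for t in text.split()]

def multi_token_nodes(segments):
    """single_token_item=False view: each node holds a multi-token string,
    so there is no global token order across nodes to encode."""
    return [{"token": s} for s in segments]

text = "the model encodes raw text"
print(len(token_level_nodes(text)))    # 5 nodes, one per token
print(len(multi_token_nodes([text])))  # 1 node holding all tokens
```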
@hugochan I am applying the Graph2Seq model to source code (Java).

- A single line of source code is a single node.
- Different lines are then connected with a `next-line` edge.

I am wondering: why not leave this to the user of the library? The users of the library could set `seq_info_encode_strategy` according to their use case.
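The setup described above can be sketched in plain Python (an illustrative stand-in, not the graph4nlp `GraphData` API): one node per source line, with consecutive lines linked by a `next-line` edge:

```python
def build_line_graph(source: str):
    """Build a toy graph: one node per source line, plus next-line edges.

    Illustrative stand-in only; NOT the graph4nlp GraphData API.
    Each node attribute holds a multi-token string (a whole code line).
    """
    lines = source.splitlines()
    nodes = [{"token": line.strip()} for line in lines]
    edges = [(i, i + 1, "next-line") for i in range(len(lines) - 1)]
    return nodes, edges

java_snippet = """int x = 0;
x += 1;
System.out.println(x);"""

nodes, edges = build_line_graph(java_snippet)
print(len(nodes))  # 3 nodes, one per line
print(edges[0])    # (0, 1, 'next-line')
```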
@nashid That's a good question! We want to keep a good balance between giving users enough flexibility and providing off-the-shelf solutions that are proven to be effective in the existing literature. For multi-token nodes, we think it's common practice to run a sequence encoder on each node to initialize the node embeddings in many scenarios (e.g., IE graph, knowledge graph). Users are encouraged to build their own customized embedding initialization strategy if the built-in options cannot suit their needs.
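The per-node initialization mentioned above can be sketched with stdlib Python only (assumptions: deterministic hashed vectors stand in for word2vec, and mean pooling stands in for the BiLSTM that the `w2v_bilstm` strategy runs over each node's tokens):

```python
import hashlib
import random

DIM = 8

def token_vector(token: str, dim: int = DIM):
    """Deterministic pseudo-embedding for a token (a stand-in for a
    word2vec lookup; real pre-trained vectors would go here)."""
    seed = int(hashlib.md5(token.encode()).hexdigest(), 16)
    rng = random.Random(seed)
    return [rng.uniform(-1.0, 1.0) for _ in range(dim)]

def init_node_embedding(node_text: str):
    """Initialize a multi-token node embedding by mean-pooling the
    per-token vectors. Mean pooling is a simplified stand-in for the
    BiLSTM encoder used by the "w2v_bilstm" strategy."""
    vecs = [token_vector(t) for t in node_text.split()]
    return [sum(col) / len(vecs) for col in zip(*vecs)]

emb = init_node_embedding("I am multiple tokens")
print(len(emb))  # 8
```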
@hugochan If I use a pre-trained embedding (let's say word2vec or GloVe), I presume I can just feed in the pre-trained embedding. For multi-token nodes, would that work? Can you please point me to an example, if you have one?
@nashid Yes, you can refer to this text summarization example, which constructs an IE graph (containing multi-token nodes).
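For the pre-trained route, a small stdlib-Python sketch (assumptions: a GloVe-format text file, one token per line followed by its float values; simple averaging as the pooling choice, which is one option rather than the library's exact strategy):

```python
import io

def load_glove(handle):
    """Parse GloVe-format text: each line is 'token v1 v2 ... vd'."""
    vectors = {}
    for line in handle:
        parts = line.rstrip().split(" ")
        vectors[parts[0]] = [float(x) for x in parts[1:]]
    return vectors

def node_embedding(node_text, vectors, dim):
    """Average pre-trained vectors over a multi-token node's tokens.

    Unknown tokens fall back to a zero vector; averaging is one simple
    pooling choice (graph4nlp's built-in strategies may differ).
    """
    zero = [0.0] * dim
    vecs = [vectors.get(t.lower(), zero) for t in node_text.split()]
    return [sum(col) / len(vecs) for col in zip(*vecs)]

# Tiny in-memory file in GloVe format (made-up 3-d vectors).
fake_glove = io.StringIO("i 0.1 0.2 0.3\nam 0.4 0.5 0.6\ntokens 0.7 0.8 0.9\n")
vectors = load_glove(fake_glove)
emb = node_embedding("I am tokens", vectors, dim=3)
print([round(v, 6) for v in emb])  # [0.4, 0.5, 0.6]
```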