Closed: SamsonYuBaiJian closed this issue 3 years ago.
Hi there @OceanskySun, I've read your recent paper PullNet, and I'm very curious about how training is done for subgraphs where the answer entities are <T distance away from the question entities on the shortest path(s) (T refers to the number of iterations for the subgraph expansion).
For some QA subgraphs, the maximum distance between the question and answer entities on the shortest path(s) might be <T; if so, is training terminated early?
And what about inference? Does it always go to T before termination?
Thank you!
Also, for the similarity defined as the dot-product of the last-state LSTM representation for the query with the embedding for the relation, it is mentioned that the relation embeddings are looked up from an embedding table.
Are these embeddings fixed and randomly initialised, then perhaps saved? If not, how are they initialised? Thank you!
Hi Samson,
PullNet will run retrieval for T steps. GRAFT-Net is then executed on the retrieved graph. GRAFT-Net always runs T steps of convolution; it should figure out by itself which path to take.
Relation embeddings are randomly initialized and trained with the model.
Thanks, Haitian
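To make this concrete, the scoring step being discussed can be sketched in plain Python as follows. The embedding size, relation names, and query state are placeholders invented for illustration; in the real model the table and the LSTM are trained jointly by backpropagation.

```python
import math
import random

random.seed(0)

DIM = 4  # toy embedding size, not the paper's value

# Relation embeddings are looked up from a table that is randomly
# initialised and then updated together with the rest of the model;
# the relation names here are illustrative only.
relation_table = {
    "people.person.place_of_birth": [random.uniform(-1, 1) for _ in range(DIM)],
    "location.location.containedby": [random.uniform(-1, 1) for _ in range(DIM)],
}

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def similarity(query_state, relation_name):
    """Dot-product of the last-state LSTM representation for q with the
    embedding for r, squashed into [0, 1] by a sigmoid."""
    r = relation_table[relation_name]
    dot = sum(a * b for a, b in zip(query_state, r))
    return sigmoid(dot)

# A zero query state gives a dot product of 0, so the score is exactly 0.5.
print(similarity([0.0] * DIM, "people.person.place_of_birth"))  # -> 0.5
```

This makes it easy to see why the score changes during training: any update to either the query state (via the LSTM) or the table entry moves the dot product.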
Thank you for getting back to me!
"PullNet will run retrieval for T steps." What happens when the maximum distance between the question and answer entities in an ideal question subgraph is <T during training? How is the classify_pullnodes classifier trained for iterations beyond the maximum distance in the subgraph, where there are no positive examples?
"Relation embeddings is randomly initialized and trained with the model." Are these relation embeddings the same as those used for GRAFT-Net during the classify functions, or are they separate?
Thanks, Samson
Hi @OceanskySun , to follow up from my previous response, I would like to ask:
1) "PullNet will run retrieval for T steps." What happens when the maximum distance between the question and answer entities in the ground truth question subgraph is <T during training? How is the classify_pullnodes classifier trained for iterations beyond the maximum distance in the subgraph, where there are no positive examples? Will the ground truth labels just be all 0?
2) "Relation embeddings is randomly initialized and trained with the model." Are the relation embeddings used for the LSTM the same as those used for GRAFT-Net during the classify functions, or are they separate?
3) How is the LSTM trained and what loss function is used? Do I first constrain the relations to relevant facts, then input these relations one by one into the dot product, then do BCE loss for each? Or do I input all relations, then do BCE loss for all?
4) Finally, when I get the relevant relations/facts to rank, let's say a certain relation is the top-ranked one and there are 10 occurrences of that relation, but my limit is N_f=5, meaning I only retrieve 5 facts. How do I choose among the 10? Is it random?
Thank you for your response!!
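Question 4 can be made concrete with a small sketch. Note that the tie-breaking policy shown here (keep the first n_f facts after a stable sort) is only one plausible choice, since the thread does not settle it; the fact tuples and scores are made up.

```python
N_F = 5  # per-iteration fact budget, a tunable hyper-parameter

# Ten candidate facts sharing the same top-ranked relation; entity names
# are invented for illustration.
facts = [("q_entity", "rel_a", f"object_{i}") for i in range(10)]
relation_score = {"rel_a": 0.9}  # similarity score for the relation

def pull_facts(facts, relation_score, n_f):
    """Rank facts by their relation's similarity score and keep the top n_f.
    Facts with the same relation tie, so which 5 of the 10 survive depends
    on the (arbitrary but stable) sort order."""
    ranked = sorted(facts, key=lambda f: relation_score[f[1]], reverse=True)
    return ranked[:n_f]

kept = pull_facts(facts, relation_score, N_F)
print(len(kept))  # -> 5
```

Python's sort is stable, so with equal scores the first five facts in input order survive; a random shuffle before sorting would give the randomised behaviour the question asks about.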
Hi there,
Thanks for your question.
Please let me know if you have more questions.
Thanks, Haitian
Thank you for your excellent work and help!
Hi @OceanskySun , I have a few final questions for you, and would appreciate if you could help with these:
Thank you!
Hi there. Thanks for your questions. Please see my answers below.
"Are the word embedding layers also the same for GRAFT-Net and the LSTM for the similarity function?" Yes. You can also train different word embeddings separately; it shouldn't matter too much.
"In this case, since the max number of words/texts/entities to retrieve is not fixed by the size of the training dataset, how do you set them? Is it set by the user?" You can treat them as hyper-parameters and tune them on the dev set.
"Do you prioritise the ground-truth entities you retrieve for teacher forcing during training, e.g. add them first to the list of retrieved entities, or is it randomised?" Yes, we always add the ground-truth entities to the graph at training time.
"How often do you train/backpropagate the loss for the two models? Is it for every t in T?" Yes, for each iteration.
"For answer selection, it seems like only one answer entity node (the node with the greatest probability) is retrieved, but what about the cases in the dataset where there are multiple answers?" We simply measure the Hits@1 of the dataset, so we only take the most confident entity as the answer. You may consider taking the top k instead. Another possible solution is to treat each candidate separately: for example, run a sigmoid on the logits of the candidates and compute a binary cross-entropy loss. As a side note, from our experiments we find that softmax is usually easier to optimize.
Thanks, Haitian
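The two answer-selection options mentioned above can be sketched side by side. The candidate entities, logits, and labels below are invented for illustration.

```python
import math

def softmax(logits):
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

candidates = ["paris", "london", "berlin"]
logits = [2.0, 0.5, -1.0]

# Option 1 (used for evaluation): softmax over all candidate nodes, then
# take the single most confident entity as the answer.
probs = softmax(logits)
answer = candidates[probs.index(max(probs))]
print(answer)  # -> paris

# Option 2 (handles multiple gold answers): score each candidate
# independently with a sigmoid and train with per-candidate binary
# cross-entropy.
labels = [1.0, 1.0, 0.0]  # more than one answer can be correct
bce = -sum(y * math.log(sigmoid(z)) + (1.0 - y) * math.log(1.0 - sigmoid(z))
           for y, z in zip(labels, logits)) / len(logits)
print(round(bce, 3))
```

Option 1 forces the probability mass to compete across candidates (hence "easier to optimize"), while option 2 lets several candidates score near 1 simultaneously.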
Hi @OceanskySun , may I check how the softmax classification is done in your case, when you take the top-1 hits/precision, since there may be multiple answers? Thank you.
Hits@1 counts if any of the correct answers is predicted.
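Assuming the metric is the standard Hits@1 (my reading of the masked text above), per-question scoring looks like this; the entities are made up.

```python
def hits_at_1(top_prediction, gold_answers):
    """Hits@1 for one question: 1 if the single most confident prediction
    is among the gold answers, else 0."""
    return 1 if top_prediction in gold_answers else 0

# Averaged over questions to get the dataset-level number.
predictions = ["paris", "bonn"]
golds = [{"paris", "ile-de-france"}, {"berlin"}]
score = sum(hits_at_1(p, g) for p, g in zip(predictions, golds)) / len(predictions)
print(score)  # -> 0.5
```

This is why a softmax top-1 answer is enough for evaluation even when a question has several gold answers: any one of them counts as a hit.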
Hi @OceanskySun, thank you for all your help so far. I realise that there are quite a few important decisions that you have made for training.
Thanks @OceanskySun for all the help so far, you have answered so many of my questions...
Hi,
Thanks for your question.
Thanks, Haitian
On Apr 6, 2021, at 6:16 AM, Samson Yu Bai Jian @.***> wrote:
Hi @OceanskySun, thank you for all your help so far, I realise that there are quite a few important decisions that you have made for training:
1) What is graph recall? Is it answer entities, answer + intermediate entities, or answer + intermediate entities + facts in the ideal subgraph?
2) How did you improve graph recall for testing, especially for documents-only runs? Which hyperparameters had the most influence?
3) Did you include the titles of documents for PyLucene when calculating document similarity, and how did you do so (e.g. a different field)?
4) What are the most impactful hyperparameters in your opinion?
5) Is there a classification/recall trade-off? For example, if your max local entities is set too high, does classification performance start decreasing?
Thanks @OceanskySun for all the help so far, you have answered so many of my questions...
The relation embeddings for r and the LSTM states are trained.
Just a note here that the LSTM states and relation embeddings are also used in other places and will be trained there as well.
On May 6, 2021, at 9:06 AM, dinani65 @.***> wrote:
I am missing some information about the similarity function as a classifier in the step of building subgraphs. The paper says: "Similarity is defined as the dot-product of the last-state LSTM representation for q with the embedding for r. This dot-product is then passed through a sigmoid function to bring it into a range of [0,1]: as we explain below, we will train this similarity function as a classifier which predicts which retrieved facts are relevant to the question q." Based on this, the similarity score is the dot product of two tensors followed by a sigmoid. I cannot understand which part needs to be trained here; it seems to be a purely mathematical operation. Which part of the function needs to be trained?
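To see why there is something to train: the gradient of the classification loss flows into both the relation embedding and, through the query state, the LSTM parameters. A one-dimensional toy example (all numbers invented):

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

# Toy 1-D case: q is the last LSTM state for the question, r the relation
# embedding looked up from the table.
q, r = 0.8, -0.5
score = sigmoid(q * r)  # the "mathematical operation" itself

# With sigmoid + binary cross-entropy, dLoss/d(q*r) = score - label, so the
# gradient w.r.t. the embedding r is (score - label) * q: nonzero whenever
# q is. One SGD step therefore moves r (and, symmetrically, the LSTM
# parameters that produce q).
label = 1.0                      # this fact is relevant to the question
grad_r = (score - label) * q
r_updated = r - 0.1 * grad_r     # gradient-descent step, learning rate 0.1
print(sigmoid(q * r_updated) > score)  # -> True: the score improved
```

So the dot product and sigmoid have no parameters of their own, but the tensors they consume (the embedding table entries and the LSTM weights behind q) are exactly what gets trained.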
Hi Haitian Sun,
Thanks for the great work! I have some questions regarding the subgraph construction in your work PullNet (also the topic of this issue).
Thanks for your patience! I eagerly look forward to hearing from you.
Hi, thanks for your interest. The implementation of PullNet uses some google internal tools, so it is a bit hard to open source. Please let me know if you have any questions. I’m happy to help.
Originally posted by @OceanskySun in https://github.com/OceanskySun/GraftNet/issues/14#issuecomment-689216248