tebesu / NeuralCitationNetwork

Neural Citation Network for Context-Aware Citation Recommendation (SIGIR 2017)

Some questions about NeuralCitationNetwork #2

Closed Hearrywang closed 5 years ago

Hearrywang commented 6 years ago

Hello @tebesu, I have read your paper "Neural Citation Network for Context-Aware Citation Recommendation (SIGIR 2017)" and I am very interested in your work. I am a student at Shanghai University in China. There are several questions I'd like to ask you.

  1. Can you provide the test code for your model?

  2. If a context of paper A cites paper B (A->B), then cluster_authors is the authors of paper B, right? I'd like to make sure because I see the code `example['cluster_authors'], # Encoder Context: where authors of cited paper`.

  3. I see the code `df = cPickle.load(open('title_context_df.pkl'))`, but I couldn't find "title_context_df.pkl".

Thank you!

tebesu commented 6 years ago
  1. Can you provide the test code for your model?

I used trec_eval for the evaluation: https://github.com/usnistgov/trec_eval

Use the loss function/likelihood function to score the candidate papers. The candidate papers are obtained with BM25; I used Solr to retrieve them.

  2. If a context of paper A cites paper B (A->B), then cluster_authors is the authors of paper B, right? I'd like to make sure because I see the code `example['cluster_authors'], # Encoder Context: where authors of cited paper`.

We assume the encoder authors have a previous history or are represented by an unknown-author token. So it is still the authors of the encoder's paper.

  3. I see the code `df = cPickle.load(open('title_context_df.pkl'))`, but I couldn't find "title_context_df.pkl".

Thank you for pointing that out; here is the file: https://drive.google.com/file/d/1dVXMXlFUJ11KwMWDdI4Tel1j8U6Vrpc7/view?usp=sharing

Hope this helps. Thanks for your interest.

Hearrywang commented 6 years ago

Thank you for your answer. With your NCN model I will receive the title sequence, but how do I get more than one recommended article title?

Hearrywang commented 6 years ago

I have trained this model. Can you show me how to use it? I'm not familiar with TensorFlow r0.11. Thanks a lot.

tebesu commented 6 years ago

The model returns the likelihood, or ranking score, of the paper to recommend. Feed in the test data and run `sess.run(model.score, feed)`, which will return a score for each paper. Then I wrote the scores to a file for trec_eval.
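The step of writing scores out for trec_eval can be sketched as below. trec_eval's run-file format is `<qid> Q0 <docid> <rank> <score> <tag>`, ranked by descending score; the helper name and the way paper IDs are paired with scores are illustrative, not the repository's actual code.

```python
def to_trec_run(query_id, paper_ids, scores, tag="ncn"):
    """Format one citation context's candidate scores as trec_eval run-file lines.

    trec_eval expects: <qid> Q0 <docid> <rank> <score> <tag>,
    with candidates ordered by descending score.
    """
    ranked = sorted(zip(paper_ids, scores), key=lambda p: -p[1])
    return ["%s Q0 %s %d %f %s" % (query_id, pid, rank, score, tag)
            for rank, (pid, score) in enumerate(ranked, start=1)]

# Hypothetical usage inside the evaluation loop (TensorFlow r0.11 style):
# scores = sess.run(model.score, feed)   # one score per candidate in the batch
# lines = to_trec_run(context_id, batch_paper_ids, scores)
```

The resulting lines can be concatenated over all contexts and passed to `trec_eval` together with a qrels file.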

The code for testing is messy right now and requires refactoring to get it to run on a different environment.

Hearrywang commented 6 years ago

`model.score` returns a 64-dimensional list, not a score for each paper.

tebesu commented 6 years ago

64 should be the batch size; each score corresponds to one paper in the batch.
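Accumulating per-paper scores across fixed-size batches can be sketched as follows; `batch_score_fn` is a hypothetical stand-in for one `sess.run(model.score, feed)` call on a single batch.

```python
def score_all(papers, batch_score_fn, batch_size=64):
    """Score every candidate paper by running the model batch by batch.

    `batch_score_fn(batch)` stands in for sess.run(model.score, feed):
    it returns one score per paper in the batch, so the batched outputs
    are concatenated into one score list covering all candidates.
    """
    scores = []
    for i in range(0, len(papers), batch_size):
        scores.extend(batch_score_fn(papers[i:i + batch_size]))
    return scores
```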

janguck commented 5 years ago

The inputs to this model are the citation context, the citing authors, the cited authors, and the cited paper's title, and the output is the title of the cited paper. I understood that with those results I would use the BM25 algorithm to get a list of recommended papers. Is that right?

tebesu commented 5 years ago

@janguck While the model itself is generative, we use it to score the likelihood that a given citation context should cite a given paper, represented by its title and authors. Since it would take too long to score every context-paper pair with NCN, we use BM25 to retrieve a subset of papers (2048) and then rerank them with NCN.
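The retrieve-then-rerank pipeline described above can be sketched as follows; `score_fn` is a hypothetical stand-in for the NCN likelihood, and the BM25 retrieval step (Solr in the author's setup) is assumed to have already produced `candidates`.

```python
def rerank(context, candidates, score_fn, top_k=10):
    """Rerank BM25-retrieved candidates by an NCN-style likelihood score.

    `score_fn(context, paper)` stands in for the model's likelihood that
    `context` cites `paper`; higher means a better match. Only the top_k
    highest-scoring papers are kept as recommendations.
    """
    scored = [(paper, score_fn(context, paper)) for paper in candidates]
    scored.sort(key=lambda p: -p[1])
    return [paper for paper, _ in scored[:top_k]]
```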

Hope this helps.

janguck commented 5 years ago

Thank you for answer.

I understand that scoring takes a long time, but I thought the NCN model generates a title, then retrieves papers with BM25, and measures performance with MRR and Recall.

I don't understand what you mean by retrieving and reranking 2048 papers. I also wonder why the number is 2048.

tebesu commented 5 years ago

@janguck

Using the generated title and then retrieving using BM25 is an interesting idea. We never explored the generative aspect of the model.

The number 2048 is somewhat arbitrary. When scoring, I used a batch size of 512 (4 batches = 2048) to score the citation-context/paper pairs.

janguck commented 5 years ago

Thank you for answer.

So how did you know if the input was a citation?  

janguck commented 5 years ago

Is the above `sess.run(model.score, feed)` the loss used when generating the title with seq2seq?

tebesu commented 5 years ago

@janguck `model.score` computes the likelihood that a given paper should be cited, given the citation context, the authors, and the paper's title.

janguck commented 5 years ago

I still don't understand. In your paper, the decoder is a seq2seq model. The loss and score each have dimension 64, which is just the batch size, isn't it? Also, isn't the logits dimension of 20002 just the vocabulary size? Where does the recommendation come from? Is the generated title something strange?

tebesu commented 5 years ago

Please carefully formulate your questions and study the code/paper if you want an appropriate answer.