paperswithcode / galai

Model API for GALACTICA
Apache License 2.0
2.67k stars · 275 forks

Academic paper summary example #39

Closed · IlyaTyagin closed this 1 year ago

IlyaTyagin commented 1 year ago

Could you please share an example of how to use this model to summarize academic papers given their full text?

gboeer commented 1 year ago

Hi @IlyaTyagin, I was curious about this too. However, I am not sure whether the model could give an appropriate summary of a paper it didn't see in training, either directly or in the form of a review written about said paper.

I made the model generate a summary for a paper it most probably had seen before (Xia, Rongkai, et al. "Supervised hashing for image retrieval via image representation learning."). Here is the result from the first try.

# Introduction 

 The main idea of the paper "Supervised hashing for image retrieval via image representation learning" is to learn compact binary codes for image representation in the task of image retrieval. 
 The compact binary codes can be efficiently stored and compared using Hamming distance. 
 The main advantage of hashing methods is that they can learn compact binary codes in an end-to-end manner. 
 The authors of the paper "Supervised hashing for image Retrieval via image representation learning" propose a novel deep supervised hashing method called Simultaneous Deep Hashing (SDH). 
 The SDH method consists of two networks: a deep feature learning network and a hashing network. 
 The feature learning network is a deep convolutional neural network (CNN) that is trained on the large-scale image dataset to learn discriminative image representations. 
 The hashing network is a multilayer perceptron (MLP) that learns compact binary codes from the image representations. 
 The loss function of the SDH method is a linear combination of two terms: a classification loss and a pairwise similarity loss. 
 The classification loss is used to learn discriminative image representations and the pairwise similarity loss is used to learn compact binary codes. 
 The authors of the paper "Supervised Hashing for Image Retrieval via Image Representation Learning" compare the performance of the SDH method with the state-of-the-art methods on the image retrieval task. 
 The authors also provide a detailed description of the SDH method and the experimental setup. 
 The paper is well written and the experiments are well explained. I recommend this paper for publication in the Journal of Machine Learning Research. 

### Minor comments

- Line 14: "A deep learning architecture is first pre-trained to learn discriminative image representations."

- Line 14: 

"The image representations are then fed into a multilayer perceptron (MLP) that learns binary codes."

- Line 20: "The loss function of the SDH method is a combination of a classification loss and a pairwise similarity loss."

- Line 21: "The pairwise similarity loss is used to learn compact binary code."

- Line 22: "The SDH method learns compact binary code by minimizing the pairwise similarity loss."

- Line23: "The SDH method learns compact image representations and compact binary code by minimizing the classification loss and the pairwise similarity loss, respectively."

- Line 26: "The classification loss is used to learn discriminative image representations."

### Major comments

- Line 1: "The paper 'Supervised hashing for image retrieval via image representation learning' by Yang et al."

- Line 2: "This paper focuses on the image retrieval task."

- Line 3: "In this paper, we propose a deep supervised hashing method called Simultaneous Deep Hashing."

- Line 4: "The SDH method consists of two networks: a feature learning network and a hashing network."

- Line 4: Please add the "and" in the sentence "The SDH method consists of two..."

- Line 4: Please remove the "and" in the sentence : "The SDH method consists of a feature learning network and a hashing network. "

- Line 5: "The feature learning network is a deep CNN that is trained on the large-scale image datasets to learn discriminative image representations."

 - Line 5: Please add the "and" in sentence "The feature learning network is a..."

- Line 6: "In this paper, we propose the SDH method."

- Line 6: Please add the "SDH" in the sentence "The SDH"

- Line 7: "The SDH method learns compact hash codes from the image representations by minimizing the pairwise similarity loss."

 - Line 7: Please add the "SDH"

- Line 8: "The SDH method consists of the feature learning network and the hashing network."

 - Line 8: Please add the "and" in "The SDH method consists of..."

- Line 9: "The loss function of the SDM method is a combination of a classification loss and the pairwise similarity loss."

 - Line 9: Please add the "SDH" and the "loss function" in the sentence "The loss function of the SDH"

- Line 10: "The SDH method consists of... "

 - Line 10: Please add the "and" in

- Line 11: "The SDH method learns compact bit codes by minimizing the pairwise similarity loss."

 - Line 11: Please add the "SDH"</s>

Now, without having read this specific paper in detail, I think the summary at the beginning does include some very relevant information (e.g. that binary codes are used, that they are compared by Hamming distance, that CNNs are used as feature extractors), all of which is part of said publication.

What is also funny, of course, are those comments on the paper, which seem to come from some kind of open review. Definitely some interesting results for a first shot.

Here is the complete code I've used to generate this:

```python
from transformers import AutoTokenizer, OPTForCausalLM
import torch

tokenizer = AutoTokenizer.from_pretrained("facebook/galactica-6.7b")
tokenizer.pad_token_id = 1
tokenizer.padding_side = 'left'
tokenizer.model_max_length = 200

model = OPTForCausalLM.from_pretrained("facebook/galactica-6.7b",
                                       device_map="auto",
                                       torch_dtype=torch.float16)

# Prompt the model as if it were writing the introduction of a summary/review of the paper.
input_text = '# Introduction \n\n The main idea of the paper "Supervised hashing for image retrieval via image representation learning" is'

# Keep the attention mask so the left-padding tokens are ignored during generation.
inputs = tokenizer(input_text, padding='max_length', return_tensors="pt").to("cuda")

outputs = model.generate(inputs.input_ids,
                         attention_mask=inputs.attention_mask,
                         max_new_tokens=2000,
                         do_sample=True,
                         temperature=0.7,
                         top_k=25,
                         top_p=0.9,
                         no_repeat_ngram_size=10)

# Strip the left-padding tokens and write the generation to disk.
with open('galactica_2000.md', 'w') as result_file:
    result_file.write(tokenizer.decode(outputs[0]).replace('<pad>', ''))
```
cvinker commented 1 year ago
```python
input_text = '''#Review Article.
#Title: A review of the paper: 
Darby, R., & Cozijn, J. (2013). The British Royal Family’s Circumcision Tradition: Genesis and Evolution of a Contemporary Legend.
# Introduction \n\n The main idea of the paper "The British Royal Family's Circumcision Tradition" is'''
```

Maybe this could also work? I'll try it and report back.
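For reference, a minimal sketch (untested) of feeding this prompt through the same tokenizer and model loaded in the snippet above; the generation settings mirror that snippet, and `max_new_tokens` is an arbitrary choice:

```python
# Sketch: run the review-style prompt above through the setup from the previous comment.
# Assumes `tokenizer` and `model` are already loaded as shown there.
input_ids = tokenizer(input_text, return_tensors="pt").input_ids.to("cuda")
outputs = model.generate(input_ids, max_new_tokens=500,
                         do_sample=True, temperature=0.7, top_k=25, top_p=0.9)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```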

RJT1990 commented 1 year ago

Hey, we included prompts such as "TLDR" and "Summarize the text above", so your best bet is to enter some text and then use these prompts. Alternatively, enter the full text as context and ask a question about it.
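As an illustration, here is a minimal sketch (untested) of using these prompts with the same transformers setup as in the snippet above. The exact prompt formatting and generation settings are assumptions, and `paper_text` is a placeholder for the paper's full text:

```python
# Sketch: summarize pasted paper text via the "TLDR" prompt.
# Assumes `tokenizer` and `model` are loaded as in the earlier snippet.
paper_text = "..."  # placeholder: paste the full text (or a section) of the paper here

prompt = paper_text + "\n\nTLDR:"  # or: paper_text + "\n\nSummarize the text above.\n\n"

input_ids = tokenizer(prompt, return_tensors="pt").input_ids.to("cuda")
outputs = model.generate(input_ids, max_new_tokens=200, do_sample=False)

# Decode only the newly generated tokens, i.e. the summary that follows the prompt.
print(tokenizer.decode(outputs[0][input_ids.shape[1]:], skip_special_tokens=True))
```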
