jyueling / 601Project


Automatic Paper Summary


Product Statement

Text summarization is the task of creating a short, accurate, and fluent summary of a longer text document. Here is an introduction to text summarization.

Basic approach

Text summarization can broadly be divided into two categories: extractive summarization and abstractive summarization.

Extractive Summarization

TextRank

TextRank is based on the PageRank algorithm used by the Google search engine. The core idea of PageRank is that a page that is linked to is good, and more so when the links come from many pages. In TextRank, the article is divided into basic text units, i.e., words or phrases. Each text unit is treated like a web page in PageRank: it maps to a vertex in a graph, and an edge between two vertices represents the link between the corresponding text units. The classic PageRank algorithm workflow is shown in the PageRank figure.
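The graph-and-ranking idea above can be sketched in a few lines of Python. This is a minimal illustration, not the code of the TextRank repository listed under Reference. It assumes sentences as the text units (the paragraph above mentions words or phrases), word-overlap (Jaccard) similarity as the edge weight, and a damping factor of 0.85. The function names `split_sentences`, `similarity`, `pagerank`, and `summarize` are hypothetical.

```python
import re
from itertools import combinations


def split_sentences(text):
    """Split text into sentences on '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def similarity(a, b):
    """Edge weight (assumption): Jaccard overlap of the two sentences' words."""
    words_a = set(re.findall(r"\w+", a.lower()))
    words_b = set(re.findall(r"\w+", b.lower()))
    if not words_a or not words_b:
        return 0.0
    return len(words_a & words_b) / len(words_a | words_b)


def pagerank(weights, damping=0.85, iterations=50):
    """Weighted PageRank by power iteration over a square weight matrix."""
    n = len(weights)
    if n == 0:
        return []
    scores = [1.0 / n] * n
    out_sum = [sum(row) for row in weights]  # total outgoing weight per vertex
    for _ in range(iterations):
        new_scores = []
        for i in range(n):
            # Each neighbour j passes on its score in proportion to the
            # weight of the edge j -> i; isolated vertices pass on nothing.
            rank = sum(
                weights[j][i] / out_sum[j] * scores[j]
                for j in range(n)
                if out_sum[j] > 0
            )
            new_scores.append((1 - damping) / n + damping * rank)
        scores = new_scores
    return scores


def summarize(text, top_k=2):
    """Return the top_k highest-ranked sentences in their original order."""
    sentences = split_sentences(text)
    n = len(sentences)
    weights = [[0.0] * n for _ in range(n)]
    for i, j in combinations(range(n), 2):
        weights[i][j] = weights[j][i] = similarity(sentences[i], sentences[j])
    scores = pagerank(weights)
    best = sorted(range(n), key=lambda i: scores[i], reverse=True)[:top_k]
    return [sentences[i] for i in sorted(best)]


if __name__ == "__main__":
    print(summarize("Cats chase mice. Cats chase birds. Dogs sleep."))
```

Sentences that share words with many other sentences collect more score, so they are the ones extracted for the summary. A sentence with no shared words keeps only the baseline score `(1 - damping) / n`.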

Usage

Abstractive Summarization

Sequence-to-Sequence with Attention Model for Text Summarization

To build our model, we use a two-layer bidirectional RNN with LSTM cells on the input data for the encoder, and two layers, each an LSTM with attention, on the target data for the decoder. This model is based on Xin Pan's and Peter Liu's model (GitHub). Here is a good article on this model that explains parts of the code in detail.
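To make the attention step concrete, here is a minimal pure-Python sketch of what a decoder does at one time step: it scores every encoder output against the current decoder state, normalizes the scores with a softmax, and takes the weighted sum as the context vector. The dot-product scoring is an assumption chosen for brevity; the scoring function of the referenced model is not stated here and may differ. The names `softmax` and `attend` are hypothetical.

```python
import math


def softmax(values):
    """Numerically stable softmax over a list of floats."""
    peak = max(values)
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def attend(decoder_state, encoder_outputs):
    """Return (attention weights, context vector) for one decoder step.

    decoder_state: list of floats, the current decoder hidden state.
    encoder_outputs: list of vectors, one per input token, each with the
    same length as decoder_state.
    """
    # Dot-product score between the decoder state and each encoder output
    # (assumption: simplified scoring, not necessarily the model's own).
    scores = [
        sum(d * e for d, e in zip(decoder_state, enc))
        for enc in encoder_outputs
    ]
    weights = softmax(scores)
    # Context vector: attention-weighted sum of the encoder outputs.
    context = [
        sum(w * enc[k] for w, enc in zip(weights, encoder_outputs))
        for k in range(len(decoder_state))
    ]
    return weights, context


if __name__ == "__main__":
    weights, context = attend([2.0, 0.0], [[1.0, 0.0], [0.0, 1.0]])
    print(weights, context)
```

In a full model the context vector is combined with the decoder LSTM output before predicting the next summary word, so the decoder can focus on different input tokens at each step.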

Technology Selection

our project technology diagram

Abstractive Summary_1

Dataset

Usage

Abstractive_Summary_2

Dataset

Usage

Stanford NLP

Stanford Open Information Extraction

Problems we have met

Reference

Recent Trends in Deep Learning Based Natural Language Processing: https://arxiv.org/pdf/1708.02709.pdf

Extractive Summarization: https://github.com/davidadamojr/TextRank

Abstractive Summarization 1: https://github.com/Currie32/Text-Summarization-with-Amazon-Reviews

Stanford CoreNLP: https://stanfordnlp.github.io/CoreNLP/, https://nlp.stanford.edu/software/openie.html#About
