tensorflow / tensor2tensor

Library of deep learning models and datasets designed to make deep learning more accessible and accelerate ML research.
Apache License 2.0

[Question] Using attention visualization to automatically summarize document? #1448

Open Hellisotherpeople opened 5 years ago

Hellisotherpeople commented 5 years ago

I've been looking for a tool which can give me some type of token-based extractive summarization to solve an especially interesting problem in the Competitive Debate community. I think that this tool will help me solve it.

I've wanted to create a neural network which summarizes texts with a "highlighter": that is, it builds the summary out of the words used in the original document (but NOT whole sentences). I can't find a neural-network-based method that does exactly this, but the attention mechanism (and its visualizations) highlights the most important parts of a source document for producing document B. This seems to be what I want.

Actually, just as I typed out the previous paragraph, I got the idea to do something like this: take a news article and an abstractively generated short "summary" of it, then take the most attended-to tokens in the transformation between the article and the summary and use those as the summary itself. Can I use tensor2tensor to do what I'm describing via, say, BERT, and if not, what are my best options?

My main issue so far is that I want the total attention weights, not just the ones for each layer.
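One way to get a single "total" score per token is to pool the attention matrices over layers and heads, then average the attention each source token receives across target positions. This is a minimal NumPy sketch, not tensor2tensor's actual API: the `attentions` input (one array of shape `(num_heads, tgt_len, src_len)` per layer) is a stand-in for whatever attention tensors your model exposes.

```python
import numpy as np

def total_attention(attentions):
    """Collapse per-layer, per-head attention into one score per source token.

    attentions: list of arrays, one per layer, each of shape
                (num_heads, tgt_len, src_len), rows summing to 1.
    Returns an array of shape (src_len,): the average attention mass
    each source token receives, pooled over layers, heads, and targets.
    """
    stacked = np.stack(attentions)       # (layers, heads, tgt_len, src_len)
    pooled = stacked.mean(axis=(0, 1))   # average over layers and heads
    return pooled.mean(axis=0)           # average over target positions

# Toy example: 2 layers, 2 heads, 3 target tokens, 4 source tokens.
rng = np.random.default_rng(0)
fake = [rng.random((2, 3, 4)) for _ in range(2)]
fake = [a / a.sum(axis=-1, keepdims=True) for a in fake]  # normalize rows
scores = total_attention(fake)
print(scores.shape)  # (4,)
```

Because each attention row is a probability distribution, the pooled scores still sum to 1, so they read directly as "share of total attention per source token".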

Hellisotherpeople commented 5 years ago

So ideally, it's something like this:

Sentence 1: "Trump's popularity is high and going higher"
Sentence 2: the entire news article (800 words)

And then I take, say, the top 10 or 20% most attended-to tokens from Sentence 2 and use those as my extractive summary.
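Given one aggregate score per token, the "highlighter" step is just a top-p% cut that keeps the surviving tokens in document order. A sketch, assuming `scores` is whatever per-token aggregate you computed (the tokens and scores below are made up):

```python
import numpy as np

def highlight(tokens, scores, keep_frac=0.2):
    """Keep the top keep_frac most-attended tokens, in original order."""
    k = max(1, int(round(len(tokens) * keep_frac)))
    top = np.argsort(scores)[-k:]   # indices of the k highest scores
    top = np.sort(top)              # restore document order
    return [tokens[i] for i in top]

tokens = ["Trump", "popularity", "is", "rising", "in", "recent", "polls", "today"]
scores = np.array([0.30, 0.05, 0.02, 0.25, 0.01, 0.07, 0.20, 0.10])
print(highlight(tokens, scores, keep_frac=0.25))  # → ['Trump', 'rising']
```

Re-sorting the selected indices is what makes this an extractive summary rather than a bag of salient words: the kept tokens still read left to right as they did in the article.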

I think this would give me summaries that are contextually grounded, in the sense that Sentence 2 is summarized so as to argue the point made in Sentence 1.

How would I easily verify or implement this? And how can I visualize the total attention score for each token (not just divided up by layer)?
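For a quick look at the pooled scores without any per-layer plots, printing one text bar per token is often enough. A sketch (the max-normalization is an arbitrary choice, just to make the largest bar fill the width):

```python
import numpy as np

def show_attention(tokens, scores, width=20):
    """Render one '#' bar per token, scaled so the max score spans `width`."""
    scaled = np.asarray(scores, dtype=float) / max(np.max(scores), 1e-12)
    lines = []
    for tok, s in zip(tokens, scaled):
        lines.append(f"{tok:>12} | {'#' * int(round(s * width))}")
    return "\n".join(lines)

tokens = ["popularity", "is", "high"]
scores = [0.6, 0.1, 0.3]
print(show_attention(tokens, scores))
```

This is only a sanity check; for real inspection you would plot the same pooled vector as a heatmap over the article text.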