tensorflow / tensor2tensor

Library of deep learning models and datasets designed to make deep learning more accessible and accelerate ML research.
Apache License 2.0

[Question] Using attention visualization to automatically summarize document? #1448

Open Hellisotherpeople opened 5 years ago

Hellisotherpeople commented 5 years ago

I've been looking for a tool which can give me some type of token-based extractive summarization to solve an especially interesting problem in the Competitive Debate community. I think that this tool will help me solve it.

I've wanted to create a neural network which summarizes texts with a "highlighter": that is, it builds the summary out of the words used in the original document (but NOT whole sentences). I can't find a neural-network-based method that does exactly this, but the attention mechanism (and its visualizations) highlights the most important parts of a source document for producing document B. This seems to be what I want.

Actually, just as I typed out the previous paragraph, I got the idea to do something like this: take a news article and an abstractively generated short "summary" of it, then take the most attended-to tokens in the transformation between the article and the summary and use those as the summary itself. Can I use tensor2tensor to do what I'm describing via, say, BERT, and if not, what are my best options?

My main issue so far is that I want the total attention weights, not just the ones for each layer.
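One way to get a single "total" score per token is to pool the attention matrices over layers and heads, then average the attention each source token receives across target positions. This is a minimal NumPy sketch, not tensor2tensor's actual API: the `attentions` input (one array of shape `(num_heads, tgt_len, src_len)` per layer) is a stand-in for whatever attention tensors your model exposes.

```python
import numpy as np

def total_attention(attentions):
    """Collapse per-layer, per-head attention into one score per source token.

    attentions: list of arrays, one per layer, each of shape
                (num_heads, tgt_len, src_len), rows summing to 1.
    Returns an array of shape (src_len,): the average attention mass
    each source token receives, pooled over layers, heads, and targets.
    """
    stacked = np.stack(attentions)       # (layers, heads, tgt_len, src_len)
    pooled = stacked.mean(axis=(0, 1))   # average over layers and heads
    return pooled.mean(axis=0)           # average over target positions

# Toy example: 2 layers, 2 heads, 3 target tokens, 4 source tokens.
rng = np.random.default_rng(0)
fake = [rng.random((2, 3, 4)) for _ in range(2)]
fake = [a / a.sum(axis=-1, keepdims=True) for a in fake]  # normalize rows
scores = total_attention(fake)
print(scores.shape)  # (4,)
```

Because each attention row is a probability distribution, the pooled scores still sum to 1, so they read directly as "share of total attention per source token".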

Hellisotherpeople commented 5 years ago

So ideally, it's something like this:

Sentence 1: "Trump's popularity is high and going higher"
Sentence 2: the entire news article (800 words)

And then I take, say, the top 10 or 20% most attended-to tokens from Sentence 2 and use those as my extractive summary.
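Given one aggregate score per token, the "highlighter" step is just a top-p% cut that keeps the surviving tokens in document order. A sketch, assuming `scores` is whatever per-token aggregate you computed (the tokens and scores below are made up):

```python
import numpy as np

def highlight(tokens, scores, keep_frac=0.2):
    """Keep the top keep_frac most-attended tokens, in original order."""
    k = max(1, int(round(len(tokens) * keep_frac)))
    top = np.argsort(scores)[-k:]   # indices of the k highest scores
    top = np.sort(top)              # restore document order
    return [tokens[i] for i in top]

tokens = ["Trump", "popularity", "is", "rising", "in", "recent", "polls", "today"]
scores = np.array([0.30, 0.05, 0.02, 0.25, 0.01, 0.07, 0.20, 0.10])
print(highlight(tokens, scores, keep_frac=0.25))  # → ['Trump', 'rising']
```

Re-sorting the selected indices is what makes this an extractive summary rather than a bag of salient words: the kept tokens still read left to right as they did in the article.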

I think this would give me summaries that are contextually grounded, in the sense that Sentence 2 is summarized so as to argue the point made in Sentence 1.

How would I easily verify or implement this? And how can I visualize the total attention score for each token (not just divided up by layer)?
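For a quick look at the pooled scores without any per-layer plots, printing one text bar per token is often enough. A sketch (the max-normalization is an arbitrary choice, just to make the largest bar fill the width):

```python
import numpy as np

def show_attention(tokens, scores, width=20):
    """Render one '#' bar per token, scaled so the max score spans `width`."""
    scaled = np.asarray(scores, dtype=float) / max(np.max(scores), 1e-12)
    lines = []
    for tok, s in zip(tokens, scaled):
        lines.append(f"{tok:>12} | {'#' * int(round(s * width))}")
    return "\n".join(lines)

tokens = ["popularity", "is", "high"]
scores = [0.6, 0.1, 0.3]
print(show_attention(tokens, scores))
```

This is only a sanity check; for real inspection you would plot the same pooled vector as a heatmap over the article text.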