Hi @patil-suraj, as you make a README/summary of the codebase, I'd like to provide my understanding of the code to perhaps give some insight into how a "new pair of eyes" is understanding your codebase.
At a high level, it seems like longbart is doing a few things architecturally (rough sketch below):

- Reusing BART's high-level encoder-decoder architecture via `BartForConditionalGeneration`
- Replacing BART's encoder self-attention layers with `LongformerSelfAttentionForBart`
- Increasing the `attention_window` to 1024
- Increasing `max_pos` (positional embeddings) to 4096
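To check my understanding, here is a minimal sketch of what I think that conversion looks like. This is not the repo's actual code: the import path, the `LongformerSelfAttentionForBart` constructor signature, and the positional-embedding copying details are all assumptions on my part.

```python
import torch
from transformers import BartForConditionalGeneration

# Import path and constructor signature below are guesses on my part.
from longbart import LongformerSelfAttentionForBart


def convert_bart_to_longbart(base_model="facebook/bart-large",
                             attention_window=1024,
                             max_pos=4096):
    """Sketch of the BART -> longbart conversion as I currently understand it."""
    model = BartForConditionalGeneration.from_pretrained(base_model)
    config = model.config

    # 1) Grow the encoder's learned positional embeddings from 1024 to max_pos,
    #    initializing the longer table by tiling the pretrained 1024 positions
    #    (the same trick the Longformer conversion uses for RoBERTa).
    embed_positions = model.model.encoder.embed_positions
    old_weight = embed_positions.weight.data                      # (1024 + offset, d_model)
    offset = old_weight.size(0) - config.max_position_embeddings  # BART uses an offset of 2
    new_weight = old_weight.new_empty(max_pos + offset, old_weight.size(1))
    new_weight[:offset] = old_weight[:offset]
    pos = offset
    while pos < new_weight.size(0):
        n = min(old_weight.size(0) - offset, new_weight.size(0) - pos)
        new_weight[pos:pos + n] = old_weight[offset:offset + n]
        pos += n
    embed_positions.weight.data = new_weight
    embed_positions.num_embeddings = new_weight.size(0)
    config.max_position_embeddings = max_pos

    # 2) Replace each encoder layer's full self-attention with Longformer
    #    sliding-window attention (list-vs-int for attention_window is a guess).
    config.attention_window = [attention_window] * config.encoder_layers
    for i, layer in enumerate(model.model.encoder.layers):
        layer.self_attn = LongformerSelfAttentionForBart(config, layer_id=i)

    return model
```

If that's roughly right, then the decoder (and its cross-attention) is left untouched and only the encoder side changes.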
From there, in order to use longbart for new long-form, abstractive text summarization tasks, one would need to pre-train/fine-tune longbart on a new dataset (is this accurate?).
Suggested examples of new datasets are PubMed and BigPatent, from here: https://github.com/allenai/longformer/issues/28#issuecomment-638541231
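For reference, this is how I would expect to load those two suggested datasets with the HF `datasets` library; the dataset ids and configs below are my best guess, not something from this repo.

```python
from datasets import load_dataset

# Dataset ids/configs are assumptions; depending on your `datasets` version,
# these script-based datasets may also require trust_remote_code=True.
pubmed = load_dataset("scientific_papers", "pubmed")   # fields: article, abstract
big_patent = load_dataset("big_patent", "all")         # fields: description, abstract

print(pubmed["train"][0]["abstract"][:300])
```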
Is there anything I'm missing? The "replacing BART's encoder attention layers" step feels like the core implementation update.