Hi @patil-suraj, as you make a README/summary of the codebase, I'd like to provide my understanding of the code to perhaps give some insight into how a "new pair of eyes" is understanding your codebase.
At a high level, it seems like longbart is doing a few things architecturally (rough sketch below):

- Reusing BART's high-level encoder-decoder architecture via `BartForConditionalGeneration`
- Replacing BART's encoder self-attention layers with `LongformerSelfAttentionForBart`
- Increasing the `attention_window` to 1024
- Increasing `max_pos` (positional embeddings) to 4096
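To check my understanding, here is a minimal sketch of what I think that conversion looks like. This is not the repo's actual code: the import path, the `LongformerSelfAttentionForBart` constructor signature, and the positional-embedding copying details are all assumptions on my part.

```python
import torch
from transformers import BartForConditionalGeneration

# Import path and constructor signature below are guesses on my part.
from longbart import LongformerSelfAttentionForBart


def convert_bart_to_longbart(base_model="facebook/bart-large",
                             attention_window=1024,
                             max_pos=4096):
    """Sketch of the BART -> longbart conversion as I currently understand it."""
    model = BartForConditionalGeneration.from_pretrained(base_model)
    config = model.config

    # 1) Grow the encoder's learned positional embeddings from 1024 to max_pos,
    #    initializing the longer table by tiling the pretrained 1024 positions
    #    (the same trick the Longformer conversion uses for RoBERTa).
    embed_positions = model.model.encoder.embed_positions
    old_weight = embed_positions.weight.data                      # (1024 + offset, d_model)
    offset = old_weight.size(0) - config.max_position_embeddings  # BART uses an offset of 2
    new_weight = old_weight.new_empty(max_pos + offset, old_weight.size(1))
    new_weight[:offset] = old_weight[:offset]
    pos = offset
    while pos < new_weight.size(0):
        n = min(old_weight.size(0) - offset, new_weight.size(0) - pos)
        new_weight[pos:pos + n] = old_weight[offset:offset + n]
        pos += n
    embed_positions.weight.data = new_weight
    embed_positions.num_embeddings = new_weight.size(0)
    config.max_position_embeddings = max_pos

    # 2) Replace each encoder layer's full self-attention with Longformer
    #    sliding-window attention (list-vs-int for attention_window is a guess).
    config.attention_window = [attention_window] * config.encoder_layers
    for i, layer in enumerate(model.model.encoder.layers):
        layer.self_attn = LongformerSelfAttentionForBart(config, layer_id=i)

    return model
```

If that's roughly right, then the decoder (and its cross-attention) is left untouched and only the encoder side changes.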
From there, in order to use longbart for new long-form, abstractive text summarization tasks, one would need to pre-train/fine-tune longbart on a new dataset (is this accurate?).
Suggested examples of new datasets are PubMed and BigPatent, from here: https://github.com/allenai/longformer/issues/28#issuecomment-638541231
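For reference, this is how I would expect to load those two suggested datasets with the HF `datasets` library; the dataset ids and configs below are my best guess, not something from this repo.

```python
from datasets import load_dataset

# Dataset ids/configs are assumptions; depending on your `datasets` version,
# these script-based datasets may also require trust_remote_code=True.
pubmed = load_dataset("scientific_papers", "pubmed")   # fields: article, abstract
big_patent = load_dataset("big_patent", "all")         # fields: description, abstract

print(pubmed["train"][0]["abstract"][:300])
```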
Is there anything I'm missing? The "replacing BART's encoder attention layers" step feels like the core implementation update.