patil-suraj / longbart

A long-input version of the BART model, based on the Longformer model

High-level understanding of code #1

Open virattt opened 4 years ago

virattt commented 4 years ago

Hi @patil-suraj, while you put together a README/summary of the codebase, I'd like to share my understanding of the code, to give some insight into how a "new pair of eyes" reads it.

From a high level, it seems like longbart is doing a few things architecturally (a sketch of the conversion follows this list):

- Replacing BART's encoder self-attention layers with Longformer's sliding-window self-attention, so the encoder scales to much longer inputs
- Extending the encoder's learned positional embeddings beyond BART's 1,024-token limit
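To make that concrete, here is a minimal sketch of how I picture the conversion, using Hugging Face's `BartForConditionalGeneration` and `LongformerSelfAttention`. The target length (4096), window size (512), and the config-attribute mapping are my assumptions, not taken from this repo, and a working conversion would also need a thin wrapper around the swapped attention (as allenai/longformer does), since the two attention classes have different forward signatures:

```python
from torch import nn
from transformers import BartForConditionalGeneration
from transformers.models.longformer.modeling_longformer import LongformerSelfAttention

MAX_POS = 4096          # assumed target input length
ATTENTION_WINDOW = 512  # assumed sliding-window size per layer

model = BartForConditionalGeneration.from_pretrained("facebook/bart-base")
encoder = model.model.encoder
config = model.config

# 1) Extend the encoder's learned positional embeddings from 1024 to MAX_POS
#    by tiling the pretrained table (BART reserves 2 offset rows at the top).
old = encoder.embed_positions.weight.data
d_model = old.size(1)
new = old.new_empty(MAX_POS + 2, d_model)
new[:2] = old[:2]
k = 2
while k < MAX_POS + 2:
    n = min(old.size(0) - 2, MAX_POS + 2 - k)
    new[k : k + n] = old[2 : 2 + n]
    k += n
encoder.embed_positions.weight = nn.Parameter(new)
config.max_position_embeddings = MAX_POS

# 2) Mirror the config names LongformerSelfAttention expects onto BART's
#    equivalents (this mapping is my assumption, not from the issue itself).
config.attention_probs_dropout_prob = config.attention_dropout
config.attention_window = [ATTENTION_WINDOW] * config.encoder_layers

# 3) Swap each encoder layer's self-attention for Longformer's sliding-window
#    attention. In a working conversion this swap goes through a small wrapper
#    module, because the two attention classes take different forward arguments.
for i, layer in enumerate(encoder.layers):
    layer.self_attn = LongformerSelfAttention(config, layer_id=i)
```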

From there, in order to use longbart for new long-form, abstractive text summarization, one would need to pre-train (or fine-tune) longbart on a new dataset (is this accurate?).
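For that step, I picture a standard seq2seq fine-tuning loop over long documents, roughly like the sketch below. The tokenizer settings, the `article`/`abstract` column names, and the dataset choice are assumptions for illustration, and `model` is assumed to be the converted long-input BART from the sketch above (fully wired up, wrapper included):

```python
import torch
from torch.utils.data import DataLoader
from datasets import load_dataset
from transformers import BartTokenizer

tokenizer = BartTokenizer.from_pretrained("facebook/bart-base")
train_dataset = load_dataset("scientific_papers", "pubmed", split="train")

MAX_SOURCE_LEN = 4096  # matches the extended encoder above
MAX_TARGET_LEN = 256

def collate(batch):
    # Tokenize long source documents and short target summaries.
    inputs = tokenizer(
        [ex["article"] for ex in batch],
        max_length=MAX_SOURCE_LEN, truncation=True,
        padding=True, return_tensors="pt",
    )
    labels = tokenizer(
        [ex["abstract"] for ex in batch],
        max_length=MAX_TARGET_LEN, truncation=True,
        padding=True, return_tensors="pt",
    ).input_ids
    labels[labels == tokenizer.pad_token_id] = -100  # ignore padding in the loss
    inputs["labels"] = labels
    return inputs

# `model` is assumed to be the converted long-input BART from above.
optimizer = torch.optim.AdamW(model.parameters(), lr=3e-5)
model.train()
for batch in DataLoader(train_dataset, batch_size=1, collate_fn=collate):
    loss = model(**batch).loss
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()
```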

Suggested example datasets are PubMed and BigPatent, from here: https://github.com/allenai/longformer/issues/28#issuecomment-638541231
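Both corpora seem to be loadable through the `datasets` library; the hub names and configs below are my assumptions about where they live:

```python
from datasets import load_dataset

# PubMed long-document summarization (columns: article, abstract)
pubmed = load_dataset("scientific_papers", "pubmed")

# BigPatent; "a" selects a single CPC section to keep the download small
big_patent = load_dataset("big_patent", "a")

print(pubmed["train"][0]["article"][:200])
```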

Is there anything I am missing? Replacing BART's encoder attention layers feels like the core implementation change.