chapter 06 - summarization - processing the entire dataset

amscosta commented 3 months ago

Information

The question or comment is about chapter:

[ ] Introduction
[ ] Text Classification
[ ] Transformer Anatomy
[ ] Multilingual Named Entity Recognition
[ ] Text Generation
[X] Summarization
[ ] Question Answering
[ ] Making Transformers Efficient in Production
[ ] Dealing with Few to No Labels
[ ] Training Transformers from Scratch
[ ] Future Directions

Question or comment

Great book. My question is very simple: how can I extend the summarization process to the entire dataset? That is, how do I go from a single row, sample_text = dataset["train"][1]["article"][:2000], to all rows? Apologies if this sounds very silly.

Ice-Citron commented 1 week ago

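# Option 1: build a plain Python list holding the first 2000 characters of every article
sample_texts = [article[:2000] for article in dataset["train"]["article"]]

# Option 2: truncate the "article" column in the dataset itself;
# Dataset.map applies the function to every row
def shorten_article(example):
    example["article"] = example["article"][:2000]
    return example

dataset["train"] = dataset["train"].map(shorten_article)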

Ice-Citron commented 1 week ago

Not sure if these work. Try them out.
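
The snippets above only truncate the articles; they do not produce summaries. Here is a minimal sketch of how the truncated texts could then be summarized in batches. The names summarize_all, batched and first_sentence_baseline are hypothetical and not from the book. first_sentence_baseline is only a stand-in so the sketch runs without downloading a model. It is an assumption that you would replace it with a function wrapping a real summarizer, for example a transformers summarization pipeline.

```python
from typing import Callable, Iterator, List, Sequence


def batched(items: Sequence[str], batch_size: int) -> Iterator[Sequence[str]]:
    """Yield consecutive slices of `items` with at most `batch_size` elements."""
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]


def summarize_all(
    articles: Sequence[str],
    summarize_batch: Callable[[List[str]], List[str]],
    max_chars: int = 2000,
    batch_size: int = 8,
) -> List[str]:
    """Truncate every article to `max_chars` and summarize them batch by batch."""
    summaries: List[str] = []
    for batch in batched(articles, batch_size):
        truncated = [article[:max_chars] for article in batch]
        summaries.extend(summarize_batch(truncated))
    return summaries


def first_sentence_baseline(texts: List[str]) -> List[str]:
    """Hypothetical stand-in summarizer: keep the text before the first '. '."""
    return [text.split(". ")[0] for text in texts]


# Assumption: with transformers installed, summarize_batch could instead wrap
# a summarization pipeline, roughly like this:
#   pipe = pipeline("summarization")
#   def summarize_batch(texts):
#       return [out["summary_text"] for out in pipe(texts)]

if __name__ == "__main__":
    demo_articles = ["First point. Second point.", "Only one sentence here"]
    print(summarize_all(demo_articles, first_sentence_baseline, batch_size=1))
```

With the dataset from the question, the call would presumably look like summarize_all(dataset["train"]["article"], summarize_batch). Running a real model over the full training split can take a long time, so it may be worth trying it on a small slice first.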

nlp-with-transformers / notebooks

chapter 06 - summarization - processing the entire dataset #135
