nlp-with-transformers / notebooks

Jupyter notebooks for the Natural Language Processing with Transformers book
https://transformersbook.com/
Apache License 2.0

chapter 06 - summarization - processing the entire dataset #135

Open amscosta opened 3 months ago

amscosta commented 3 months ago

Information

The question or comment is about chapter:

Question or comment

Great book. My question is very simple: how can I extend the summarization process to the entire dataset? That is, how do I go from a single row, sample_text = dataset["train"][1]["article"][:2000], to all rows? Apologies if this sounds very silly.

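Ice-Citron commented 1 week ago

# Option 1: build a plain list holding the first 2,000 characters of every article
sample_texts = [article[:2000] for article in dataset["train"]["article"]]

# Option 2: truncate the "article" column of the dataset itself with Dataset.map
def shorten_article(example):
    example["article"] = example["article"][:2000]
    return example

dataset["train"] = dataset["train"].map(shorten_article)
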
Ice-Citron commented 1 week ago

Not sure if these work. Try them out.
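
The snippets above only truncate the articles; they do not yet produce summaries. A minimal sketch of the remaining step, assuming the goal is to run a summarizer over every truncated article in small batches, might look like the following. The names summarize_all, first_sentence_baseline, and pipeline_summarizer are hypothetical helpers introduced here for illustration and do not come from the book or the notebooks. The "t5-small" default and the batch size of 8 are likewise assumptions, not recommendations from the thread.

```python
from typing import Callable, Iterable, List

# A batch summarizer takes a list of texts and returns one summary per text.
BatchSummarizer = Callable[[List[str]], List[str]]


def summarize_all(
    articles: Iterable[str],
    summarize_batch: BatchSummarizer,
    batch_size: int = 8,
    max_chars: int = 2000,
) -> List[str]:
    """Truncate each article to max_chars and summarize them batch by batch."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    summaries: List[str] = []
    batch: List[str] = []
    for article in articles:
        batch.append(article[:max_chars])
        if len(batch) == batch_size:
            summaries.extend(summarize_batch(batch))
            batch = []
    if batch:  # summarize the final, possibly smaller, batch
        summaries.extend(summarize_batch(batch))
    return summaries


def first_sentence_baseline(batch: List[str]) -> List[str]:
    """Toy stand-in summarizer: keep the text before the first '. '."""
    return [text.split(". ")[0] for text in batch]


def pipeline_summarizer(model_name: str = "t5-small") -> BatchSummarizer:
    """Wrap a Hugging Face summarization pipeline (assumes transformers is installed)."""
    from transformers import pipeline  # imported lazily; only needed for this helper

    pipe = pipeline("summarization", model=model_name)

    def summarize_batch(batch: List[str]) -> List[str]:
        outputs = pipe(batch, truncation=True)
        return [out["summary_text"] for out in outputs]

    return summarize_batch
```

With a model, the call would be something like summarize_all(dataset["train"]["article"], pipeline_summarizer()). Summarizing the full training split this way can take a long time, so it is probably worth trying it on a small slice first.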