huggingface / course

The Hugging Face course on Transformers
https://huggingface.co/course
Apache License 2.0

Broken Link: Chapter 5.4 Big Data #595

Open · hrh-bbc-rd opened this issue 1 year ago

hrh-bbc-rd commented 1 year ago

The link to the PubMed Abstracts database in Chapter 5, Section 4 (the 'Big Datasets' chapter) is broken.

The broken link appears in this line:

data_files = "https://the-eye.eu/public/AI/pile_preliminary_components/PUBMED_title_abstracts_2019_baseline.jsonl.zst"


hrh-bbc-rd commented 1 year ago

I have been able to continue with the course by using this link instead:

data_files = "https://the-eye.eu/public/AI/pile_v2/data/NIH_ExPORTER_awarded_grant_text.jsonl.zst"
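
A minimal sketch of plugging the workaround URL into the chapter's streaming call, assuming 🤗 Datasets and `zstandard` (for the `.zst` files) are installed and that the file is still hosted at that address, which later comments in this thread call into question. `stream_first_record` is a hypothetical helper name, not part of the course.

```python
# Workaround URL from the comment above. It points at a different Pile component
# (NIH ExPORTER grant text, not PubMed abstracts) and may itself be offline.
NIH_EXPORTER_URL = (
    "https://the-eye.eu/public/AI/pile_v2/data/"
    "NIH_ExPORTER_awarded_grant_text.jsonl.zst"
)


def stream_first_record(url: str = NIH_EXPORTER_URL) -> dict:
    """Hypothetical helper: stream a remote JSON Lines file and return its first record."""
    # Imported lazily so this file can be loaded even where 🤗 Datasets is not installed.
    from datasets import load_dataset

    dataset = load_dataset("json", data_files=url, split="train", streaming=True)
    return next(iter(dataset))
```

Calling `stream_first_record()` makes a network request, so it only succeeds while the mirror is up. Dropping `streaming=True` would instead download and cache the whole file before returning anything.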

tj-cahill commented 1 year ago

It looks like this URL has changed and broken the link before (see #324).
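
Since the URL has moved before, one hedged way to cope is to keep a list of candidate URLs and use the first one that responds. `url_is_reachable`, `first_reachable` and `CANDIDATE_URLS` below are hypothetical names, not part of the course or 🤗 Datasets. The check uses an HTTP HEAD request, so a server that rejects HEAD would be reported as unreachable.

```python
import urllib.request
from typing import Callable, Iterable, Optional

# URLs quoted in this thread. They point at *different* Pile components
# (PubMed abstracts vs. NIH ExPORTER grant text), and either may be offline.
CANDIDATE_URLS = [
    "https://the-eye.eu/public/AI/pile_preliminary_components/PUBMED_title_abstracts_2019_baseline.jsonl.zst",
    "https://the-eye.eu/public/AI/pile_v2/data/NIH_ExPORTER_awarded_grant_text.jsonl.zst",
]


def url_is_reachable(url: str, timeout: float = 10.0) -> bool:
    """Hypothetical helper: True if a HEAD request to `url` ends with a 2xx/3xx status."""
    try:
        # Building the Request raises ValueError for malformed URLs.
        request = urllib.request.Request(url, method="HEAD")
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return 200 <= response.status < 400
    except (OSError, ValueError):
        # URLError and HTTPError (404 etc.) subclass OSError, as do timeouts.
        return False


def first_reachable(
    urls: Iterable[str],
    is_reachable: Callable[[str], bool] = url_is_reachable,
) -> Optional[str]:
    """Hypothetical helper: return the first URL that passes `is_reachable`, else None."""
    for url in urls:
        if is_reachable(url):
            return url
    return None
```

For example, `data_files = first_reachable(CANDIDATE_URLS)` yields `None` when every mirror is down, which is a clearer failure than a stack trace from deep inside `load_dataset`. Passing a custom `is_reachable` keeps the selection logic testable without network access.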

tj-cahill commented 1 year ago

Note that there is another broken link further down the page, in the following code block:

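from datasets import load_dataset

# Stream the FreeLaw Opinions file record by record (this URL is currently broken)
law_dataset_streamed = load_dataset(
    "json",
    data_files="https://the-eye.eu/public/AI/pile_preliminary_components/FreeLaw_Opinions.jsonl.zst",
    split="train",
    streaming=True,
)
next(iter(law_dataset_streamed))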

Dboee commented 8 months ago

Same issue here. It looks like the Pile has been taken down for copyright reasons.
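
While the remote files are unavailable, the idea the chapter's streaming examples rely on (reading one record at a time instead of loading the whole file) can still be tried offline on a local JSON Lines file. This is a stdlib-only sketch of that idea, not what 🤗 Datasets does internally; `parse_jsonl` and `iter_jsonl` are hypothetical names.

```python
import json
from pathlib import Path
from typing import Iterable, Iterator, Union


def parse_jsonl(lines: Iterable[str]) -> Iterator[dict]:
    """Lazily parse JSON Lines: one record per non-blank line, one line at a time."""
    for line in lines:
        line = line.strip()
        if line:  # skip blank lines such as a trailing newline
            yield json.loads(line)


def iter_jsonl(path: Union[str, Path]) -> Iterator[dict]:
    """Stream records from a local .jsonl file without reading it all into memory."""
    with open(path, encoding="utf-8") as f:
        yield from parse_jsonl(f)
```

Here `next(iter_jsonl("sample.jsonl"))` (with `sample.jsonl` a hypothetical local file) plays the role of `next(iter(law_dataset_streamed))`. The same local file can also be passed as `data_files` to `load_dataset("json", data_files="sample.jsonl", split="train", streaming=True)`.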