ibm-aur-nlp / PubLayNet


Checksum for PubLayNet_PDF.tar.gz #45

Open conjuncts opened 6 months ago

conjuncts commented 6 months ago

Hello, I tried downloading the PDF dataset, but I had extracted only about 10% of the archive before running into a data corruption error. Are checksums or data splits available for PubLayNet_PDF.tar.gz?

themanoftalent commented 6 months ago

It sounds like you're having trouble downloading the PubLayNet dataset. Without knowing where you are downloading it from, I can't give a precise solution, but here is some general advice.

  1. Check for Official Sources: Make sure you're downloading the dataset from the official source. Mirrors and third-party copies may be incomplete or out of date.
  2. Checksums: Check whether the dataset provider publishes checksums for the files, and compare them against the checksum of your download.
  3. Data Splits: Some datasets are split into multiple parts for easier downloading. Make sure you've downloaded all of the parts.
  4. Redownload: If you suspect the downloaded file is corrupted, try downloading it again. An interrupted or truncated transfer is a common cause of this kind of error.

Akif, the outlier
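
For points 2 and 4, here is a minimal Python sketch of how a download could be checked locally. It assumes nothing about any official checksum (none is confirmed in this thread); `sha256_of_file` and `count_readable_members` are hypothetical helper names made up for illustration. The first computes a SHA-256 digest you could compare between two downloads or against a published value if one becomes available. The second reads through the archive without writing anything to disk and reports where reading fails.

```python
import hashlib
import tarfile
import zlib


def sha256_of_file(path, chunk_size=1 << 20):
    """Return the hex SHA-256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        # Read 1 MiB at a time so a very large archive never sits in memory.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def count_readable_members(archive_path):
    """Read every member of a tar archive without extracting it to disk.

    Returns (members_read, error). error is None if the whole archive could
    be read, otherwise the exception raised where reading failed.
    """
    count = 0
    try:
        # "r:*" lets tarfile detect the compression (gzip here) on its own.
        with tarfile.open(archive_path, "r:*") as tar:
            for member in tar:
                if member.isfile():
                    f = tar.extractfile(member)
                    # Drain the member so the compressed data is actually decoded.
                    while f.read(1 << 20):
                        pass
                count += 1
    except (tarfile.TarError, OSError, EOFError, zlib.error) as exc:
        return count, exc
    return count, None
```

Calling `sha256_of_file("PubLayNet_PDF.tar.gz")` after two separate downloads and getting different digests would suggest at least one transfer was damaged. `count_readable_members("PubLayNet_PDF.tar.gz")` returning a non-`None` error points to roughly where the archive breaks. A clean pass only shows the file is readable; it does not prove it matches the original, which only a checksum from the dataset provider could confirm.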