huggingface / dataset-viewer

Backend that powers the dataset viewer on Hugging Face dataset pages through a public API.
https://huggingface.co/docs/dataset-viewer
Apache License 2.0

UnexpectedApiError on viewer #2064

Closed severo closed 1 year ago

severo commented 1 year ago

https://huggingface.co/datasets/RepoFusion/Stack-Repo/viewer/bm25_contexts

Error code: UnexpectedApiError

severo commented 1 year ago

From the logs (internal) we see two error responses:

  1. https://datasets-server.huggingface.co/statistics?dataset=RepoFusion%2FStack-Repo&config=bm25_contexts&split=train
     {"error":"IO Error: No files found that match the pattern \"/storage/stats-cache/72194527872800-split-descriptive-statistics-RepoFusion-Stack-Rep-0ab447b7/bm25_contexts/train/*.parquet\""}
  2. https://datasets-server.huggingface.co/rows?dataset=RepoFusion%2FStack-Repo&config=bm25_contexts&split=train&offset=0&length=100
     {"error":"Unexpected error."}

due to

libcommon.parquet_utils.TooBigRows: Rows from parquet row groups are too big to be read: 308.26 MiB (max=286.10 MiB)
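
For readers unfamiliar with this error, here is a minimal sketch of the kind of size guard that produces such a message. It is not the actual `libcommon` code: the `TooBigRows` class below is a hypothetical stand-in, `check_row_groups` is an invented helper, and the 300,000,000-byte limit is an assumption inferred from the log, since that value formats as 286.10 MiB.

```python
# Assumption: the limit is 300,000,000 bytes, which formats as 286.10 MiB.
MAX_BYTES = 300_000_000


class TooBigRows(Exception):
    """Hypothetical stand-in for libcommon.parquet_utils.TooBigRows."""


def to_mib(num_bytes: int) -> float:
    """Convert a byte count to mebibytes."""
    return num_bytes / (1024 * 1024)


def check_row_groups(row_group_sizes: list[int], max_bytes: int = MAX_BYTES) -> int:
    """Return the total size of the row groups, or raise if it exceeds the limit."""
    total = sum(row_group_sizes)
    if total > max_bytes:
        raise TooBigRows(
            "Rows from parquet row groups are too big to be read: "
            f"{to_mib(total):.2f} MiB (max={to_mib(max_bytes):.2f} MiB)"
        )
    return total
```

Under this reading, a page of rows fails as soon as the row groups it spans add up to more than the limit, regardless of how many rows were requested.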

severo commented 1 year ago

Same errors for https://huggingface.co/datasets/deepmind/code_contests/viewer/default/train?p=1

  1. https://datasets-server.huggingface.co/statistics?dataset=deepmind%2Fcode_contests&config=default&split=train
     {"error":"IO Error: No files found that match the pattern \"/storage/stats-cache/10704050661432-split-descriptive-statistics-deepmind-code_contes-d4a66b3c/default/train/*.parquet\""}
  2. https://datasets-server.huggingface.co/rows?dataset=deepmind%2Fcode_contests&config=default&split=train&offset=100&length=100
     {"error":"Unexpected error."}

log:

libcommon.parquet_utils.TooBigRows: Rows from parquet row groups are too big to be read: 381.36 MiB (max=286.10 MiB)
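
To reproduce the two requests for another dataset, the URLs can be rebuilt from the dataset, config and split names. This is a small sketch using only the standard library; `endpoint_url` is a hypothetical helper, and the base URL and query parameters are taken from the requests quoted above.

```python
from urllib.parse import urlencode

BASE_URL = "https://datasets-server.huggingface.co"


def endpoint_url(endpoint: str, dataset: str, config: str, split: str, **extra) -> str:
    """Build a dataset-viewer API URL; urlencode escapes the '/' in the dataset name."""
    params = {"dataset": dataset, "config": config, "split": split, **extra}
    return f"{BASE_URL}/{endpoint}?{urlencode(params)}"


stats_url = endpoint_url("statistics", "deepmind/code_contests", "default", "train")
rows_url = endpoint_url(
    "rows", "deepmind/code_contests", "default", "train", offset=100, length=100
)
```

Fetching either URL (for example with `urllib.request.urlopen`) should return the JSON body, which carries an `error` key when the request fails as it does above.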

severo commented 1 year ago

closing in favor of #1957

severo commented 1 year ago

Just launched the recreation of these two datasets.

https://huggingface.co/datasets/RepoFusion/Stack-Repo is now disabled because of the dataset script.

https://huggingface.co/datasets/deepmind/code_contests -> JobManagerCrashedError