severo closed this issue 1 year ago
From the logs (internal), we see two error responses:
{"error":"IO Error: No files found that match the pattern \"/storage/stats-cache/72194527872800-split-descriptive-statistics-RepoFusion-Stack-Rep-0ab447b7/bm25_contexts/train/*.parquet\""}
{"error":"Unexpected error."}
due to
libcommon.parquet_utils.TooBigRows: Rows from parquet row groups are too big to be read: 308.26 MiB (max=286.10 MiB)
Same errors for https://huggingface.co/datasets/deepmind/code_contests/viewer/default/train?p=1
{"error":"IO Error: No files found that match the pattern \"/storage/stats-cache/10704050661432-split-descriptive-statistics-deepmind-code_contes-d4a66b3c/default/train/*.parquet\""}
{"error":"Unexpected error."}
log:
libcommon.parquet_utils.TooBigRows: Rows from parquet row groups are too big to be read: 381.36 MiB (max=286.10 MiB)
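For reference, the `max=286.10 MiB` in both log lines is consistent with a limit of 300,000,000 bytes (300,000,000 / 2^20 ≈ 286.10). That is an inference from the message, not something confirmed in this thread. Below is a minimal sketch of what such a size guard could look like. `TooBigRowsError`, `MAX_ROW_GROUP_BYTES`, `format_mib` and `check_row_group_size` are hypothetical names for illustration, not the actual libcommon code.

```python
# Hypothetical sketch of a row-group size guard; not the libcommon implementation.

# Assumption: the limit is 300 MB, which formats as 286.10 MiB.
MAX_ROW_GROUP_BYTES = 300_000_000


class TooBigRowsError(Exception):
    """Illustrative stand-in for libcommon.parquet_utils.TooBigRows."""


def format_mib(num_bytes: int) -> str:
    """Format a byte count in MiB with two decimals."""
    return f"{num_bytes / 2**20:.2f} MiB"


def check_row_group_size(num_bytes: int, max_bytes: int = MAX_ROW_GROUP_BYTES) -> None:
    """Raise if the row groups to be read exceed the allowed size."""
    if num_bytes > max_bytes:
        raise TooBigRowsError(
            "Rows from parquet row groups are too big to be read: "
            f"{format_mib(num_bytes)} (max={format_mib(max_bytes)})"
        )
```

Under that assumption, both datasets exceed the limit (308.26 MiB and 381.36 MiB against 286.10 MiB), so any page whose row groups are this large would fail the same way.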
closing in favor of #1957
Just launched the recreation of these two datasets.
https://huggingface.co/datasets/RepoFusion/Stack-Repo is now disabled because of the dataset script.
https://huggingface.co/datasets/deepmind/code_contests -> JobManagerCrashedError
https://huggingface.co/datasets/RepoFusion/Stack-Repo/viewer/bm25_contexts