severo opened 9 months ago
Launched the recreation of imvladikon/hebrew_speech_coursera.
-> JobManagerCrashedError 😮
UnexpectedApiError
for https://huggingface.co/datasets/danielz01/landmarks
libcommon.parquet_utils.TooBigRows: Rows from parquet row groups are too big to be read: 958.13 MiB (max=286.10 MiB)
Note that the issue is that the cells are too big (in bytes); it's not related to the row groups (I was mistaken in the title).
Same UnexpectedApiError
for https://huggingface.co/datasets/osunlp/Mind2Web, where the row group is 564 MB for 100 rows
The issue is that we don't allow big "cells". What should we do? Improve the error message? Allow big cells? Truncate?
For the UI, the best option is to truncate, and a bonus would be to let the user click to expand a row.
So I think we should add a query parameter, like "full: boolean" or "truncate: boolean", to /rows, /search and /filter.
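A rough sketch of what the truncated variant of the response could do per cell (all names and the byte budget here are hypothetical, not the actual API):

```python
import json

CELL_BYTE_BUDGET = 100  # assumption; the real limit would be configurable


def truncate_cell(value, budget=CELL_BYTE_BUDGET):
    """Return the cell value, cut to `budget` bytes if needed, plus a flag."""
    serialized = json.dumps(value, ensure_ascii=False)
    raw = serialized.encode("utf-8")
    if len(raw) <= budget:
        return value, False
    # keep a valid UTF-8 prefix of the serialized value
    return raw[:budget].decode("utf-8", errors="ignore"), True


def truncate_row(row):
    """Truncate each oversized cell and record which columns were cut,
    so a UI could offer "click to expand"."""
    out, truncated_cells = {}, []
    for column, value in row.items():
        out[column], was_cut = truncate_cell(value)
        if was_cut:
            truncated_cells.append(column)
    return {"row": out, "truncated_cells": truncated_cells}
```

With "full=true" the endpoint would skip this step and return the cells as-is (or error out, as today).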
Also reported here: https://huggingface.co/datasets/UmaDiffusion/ULTIMA/discussions/1
Somewhat related: https://huggingface.co/datasets/mikehemberger/inat_2021_train_mini_plantae
We should truncate more aggressively, even for /first-rows
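For /first-rows, the constraint is on the whole payload rather than a single cell. A rough sketch (invented helper names, not the production logic) of fitting the response into a byte budget by repeatedly halving the largest string cell:

```python
import json


def payload_size(rows):
    """Byte size of the JSON-serialized rows."""
    return len(json.dumps(rows, ensure_ascii=False).encode("utf-8"))


def fit_first_rows(rows, max_bytes):
    """Truncate the largest string cells until the whole payload fits."""
    while payload_size(rows) > max_bytes:
        cells = [
            (i, c)
            for i, r in enumerate(rows)
            for c in r
            if isinstance(r[c], str)
        ]
        if not cells:
            break  # nothing truncatable left
        i, col = max(cells, key=lambda ic: len(rows[ic[0]][ic[1]]))
        if len(rows[i][col]) <= 1:
            break  # cannot shrink further; give up
        rows[i][col] = rows[i][col][: len(rows[i][col]) // 2]
    return rows
```

Halving the biggest cell first means small cells survive intact, which matches what the viewer UI needs.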
Hi,
Thanks for bringing this up. You are probably aware of this, but once I click on the "Viewer", the data is visible there.
Best,
Here is another "raw" image dataset that I've uploaded via the web interface (assuming it was faster than pushing it from a notebook). Hope this helps. Best, M https://huggingface.co/datasets/mikehemberger/medicinal-plants/discussions/2#657c317f1953a4194ad0952d
The issue for https://huggingface.co/datasets/mikehemberger/inat_2021_train_mini_plantae is about first_rows truncation, not about autoconverted parquet files, no?
maybe open a separate issue
Yes, I brought the discussion here, but you're right, the issue is only somewhat related. Maybe we can fix both at the same time though.
was there any action on this?
See https://huggingface.co/datasets/imvladikon/hebrew_speech_coursera/discussions/1#6523d448b623a04e6c2f118a