huggingface/datasets
🤗 The largest hub of ready-to-use datasets for ML models with fast, easy-to-use and efficient data manipulation tools
https://huggingface.co/docs/datasets
Apache License 2.0
18.52k stars
2.53k forks
issues
Add the option of saving in parquet instead of arrow
#6903
arita37
opened
18 hours ago
2
Make CLI convert_to_parquet not raise error if no rights to create script branch
#6902
albertvillanova
closed
19 hours ago
2
HTTPError 403 raised by CLI convert_to_parquet when creating script branch on 3rd party repos
#6901
albertvillanova
closed
19 hours ago
0
[WebDataset] KeyError with user-defined `Features` when a field is missing in an example
#6900
lhoestq
opened
1 day ago
0
List of dictionary features get standardized
#6899
sohamparikh94
opened
1 day ago
0
Fix YAML error in README files appearing on GitHub
#6898
albertvillanova
closed
18 hours ago
3
datasets template guide :: issue in documentation YAML
#6897
bghira
closed
18 hours ago
2
Regression bug: `NonMatchingSplitsSizesError` for (possibly) overwritten dataset
#6896
finiteautomata
opened
3 days ago
0
Document that to_json defaults to JSON Lines
#6895
albertvillanova
closed
17 hours ago
2
Better document defaults of to_json
#6894
albertvillanova
closed
17 hours ago
0
Close gzipped files properly
#6893
lhoestq
closed
3 days ago
3
Add support for categorical/dictionary types
#6892
EthanSteinberg
opened
5 days ago
0
Unable to load JSON saved using `to_json`
#6891
DarshanDeshpande
closed
5 days ago
2
add `with_transform` and/or `set_transform` to IterableDataset
#6890
not-lain
opened
1 week ago
0
fix bug #6877
#6889
arthasking123
closed
3 days ago
9
Support WebDataset containing file basenames with dots
#6888
albertvillanova
closed
6 days ago
5
FAISS load to None
#6887
brainer3220
opened
1 week ago
1
load_dataset with data_dir and cache_dir set fail with not supported
#6886
fah
opened
1 week ago
0
Support jax 0.4.27 in CI tests
#6885
albertvillanova
closed
1 week ago
2
CI is broken after jax-0.4.27 release: AttributeError: 'jaxlib.xla_extension.DeviceList' object has no attribute 'device'
#6884
albertvillanova
closed
1 week ago
0
Require Pillow >= 9.4.0 to avoid AttributeError when loading image dataset
#6883
albertvillanova
closed
17 hours ago
3
Connection Error When Using By-pass Proxies
#6882
MRNOBODY-ZST
opened
1 week ago
1
AttributeError: module 'PIL.Image' has no attribute 'ExifTags'
#6881
albertvillanova
closed
17 hours ago
0
Webdataset: KeyError: 'png' on some datasets when streaming
#6880
lhoestq
opened
1 week ago
5
Batched mapping does not raise an error if values for an existing column are empty
#6879
felix-schneider
opened
1 week ago
0
Create function to convert to parquet
#6878
albertvillanova
closed
17 hours ago
2
OSError: [Errno 24] Too many open files
#6877
loicmagne
closed
3 days ago
5
Unpin hfh
#6876
lhoestq
opened
1 week ago
4
Shorten long logs
#6875
lhoestq
closed
1 week ago
2
Use pandas ujson in JSON loader to improve performance
#6874
albertvillanova
opened
1 week ago
2
Set dev version
#6873
albertvillanova
closed
1 week ago
2
Release 2.19.1
#6872
albertvillanova
closed
1 week ago
0
Fix download for dict of dicts of URLs
#6871
albertvillanova
closed
1 week ago
4
Update tqdm >= 4.66.3 to fix vulnerability
#6870
albertvillanova
closed
1 week ago
2
Download is broken for dict of dicts: FileNotFoundError
#6869
albertvillanova
closed
1 week ago
0
datasets.BuilderConfig does not work.
#6868
jdm4pku
closed
1 week ago
1
Improve performance of JSON loader
#6867
albertvillanova
opened
1 week ago
5
DataFilesNotFoundError for datasets in the open-llm-leaderboard
#6866
jerome-white
closed
3 days ago
3
Example on Semantic segmentation contains bug
#6865
ducha-aiki
opened
1 week ago
0
Dataset 'rewardsignal/reddit_writing_prompts' doesn't exist on the Hub
#6864
vinodrajendran001
closed
1 week ago
1
Revert temporary pin huggingface-hub < 0.23.0
#6863
albertvillanova
opened
2 weeks ago
0
Issue 6598: load_dataset broken for data_files on s3
#6862
matstrand
opened
2 weeks ago
0
Fix CI by temporarily pinning huggingface-hub < 0.23.0
#6861
albertvillanova
closed
2 weeks ago
2
CI fails after huggingface_hub-0.23.0 release: FutureWarning: "resume_download"
#6860
albertvillanova
closed
2 weeks ago
3
Support folder-based datasets with large metadata.jsonl
#6859
gbenson
opened
2 weeks ago
0
Segmentation fault
#6858
scampion
closed
1 week ago
2
Fix line-endings in tests on Windows
#6857
albertvillanova
closed
2 weeks ago
2
CI fails on Windows for test_delete_from_hub and test_xgetsize_private due to new-line character
#6856
albertvillanova
closed
2 weeks ago
1
Fix dataset name for community Hub script-datasets
#6855
albertvillanova
closed
1 week ago
6
Wrong example of usage when config name is missing for community script-datasets
#6854
albertvillanova
closed
1 week ago
0