huggingface/datasets
🤗 The largest hub of ready-to-use datasets for ML models with fast, easy-to-use and efficient data manipulation tools
https://huggingface.co/docs/datasets
Apache License 2.0
19.29k stars
2.7k forks
issues
Use load_dataset to load imagenet-1K but find an empty dataset
#7139
fscdc
opened
2 months ago
2
Cache only changed columns?
#7138
Modexus
opened
2 months ago
2
[BUG] dataset_info sequence unexpected behavior in README.md YAML
#7137
ain-soph
opened
2 months ago
1
Do not consume unnecessary memory during sharding
#7136
janEbert
opened
2 months ago
0
Bug: Type Mismatch in Dataset Mapping
#7135
marko1616
opened
2 months ago
3
Attempting to return a rank 3 grayscale image from dataset.map results in extreme slowdown
#7134
navidmafi
opened
2 months ago
0
remove filecheck to enable symlinks
#7133
fschlatt
opened
2 months ago
5
Fix data file module inference
#7132
HennerM
opened
2 months ago
3
Inconsistent output in documentation example: `num_classes` not displayed in `ClassLabel` output
#7129
sergiopaniego
opened
2 months ago
0
Filter Large Dataset Entry by Entry
#7128
QiyaoWei
opened
3 months ago
4
Caching shuffles by np.random.Generator results in unintuitive behavior
#7127
el-hult
opened
3 months ago
5
Disable implicit token in CI
#7126
albertvillanova
closed
3 months ago
2
Fix wrong SHA in CI tests of HubDatasetModuleFactoryWithParquetExport
#7125
albertvillanova
closed
3 months ago
2
Test get_dataset_config_info with non-existing/gated/private dataset
#7124
albertvillanova
closed
3 months ago
2
Make dataset viewer more flexible in displaying metadata alongside images
#7123
egrace479
opened
3 months ago
3
[interleave_dataset] sample batches from a single source at a time
#7122
memray
opened
3 months ago
0
Fix typed examples iterable state dict
#7121
lhoestq
closed
3 months ago
2
don't mention the script if trust_remote_code=False
#7120
severo
closed
3 months ago
3
Install transformers with numpy-2 CI
#7119
albertvillanova
closed
3 months ago
2
Allow numpy-2.1 and test it without audio extra
#7118
albertvillanova
closed
3 months ago
2
Audio dataset loads everything into RAM and is very slow
#7117
Jourdelune
opened
3 months ago
3
datasets cannot handle nested json if features is given.
#7116
ljw20180420
closed
2 months ago
3
module 'pyarrow.lib' has no attribute 'ListViewType'
#7115
neurafusionai
closed
2 months ago
1
Temporarily pin numpy<2.1 to fix CI
#7114
albertvillanova
closed
3 months ago
2
Stream dataset does not iterate if the batch size is larger than the dataset size (related to drop_last_batch)
#7113
memray
closed
3 months ago
1
cudf-cu12 24.4.1, ibis-framework 8.0.0 requires pyarrow<15.0.0a0,>=14.0.1,pyarrow<16,>=2 and datasets 2.21.0 requires pyarrow>=15.0.0
#7112
SoumyaMB10
opened
3 months ago
2
CI is broken for numpy-2: Failed to fetch wheel: llvmlite==0.34.0
#7111
albertvillanova
closed
3 months ago
2
Fix ConnectionError for gated datasets and unauthenticated users
#7110
albertvillanova
closed
3 months ago
4
ConnectionError for gated datasets and unauthenticated users
#7109
albertvillanova
closed
3 months ago
0
website broken: Create a new dataset repository, doesn't create a new repo in Firefox
#7108
neoneye
closed
3 months ago
4
load_dataset broken in 2.21.0
#7107
anjor
closed
3 months ago
4
Rename LargeList.dtype to LargeList.feature
#7106
albertvillanova
closed
3 months ago
2
Use `huggingface_hub` cache
#7105
lhoestq
closed
3 months ago
7
remove more script docs
#7104
lhoestq
closed
3 months ago
2
Fix args of feature docstrings
#7103
albertvillanova
closed
3 months ago
2
Slow iteration speeds when using IterableDataset.shuffle with load_dataset(data_files=..., streaming=True)
#7102
lajd
opened
3 months ago
2
`load_dataset` from Hub with `name` to specify `config` using incorrect builder type when multiple data formats are present
#7101
hlky
opened
3 months ago
1
IterableDataset: cannot resolve features from list of numpy arrays
#7100
VeryLazyBoy
opened
3 months ago
1
Set dev version
#7099
albertvillanova
closed
3 months ago
2
Release: 2.21.0
#7098
albertvillanova
closed
3 months ago
1
Some of DownloadConfig's properties are always being overridden in load.py
#7097
ductai199x
opened
3 months ago
0
Automatically create `cache_dir` from `cache_file_name`
#7096
ringohoffman
closed
3 months ago
3
Add Arabic Docs to Datasets
#7094
AhmedAlmaghz
opened
3 months ago
0
Add Arabic Docs to datasets
#7093
AhmedAlmaghz
opened
3 months ago
0
load_dataset with multiple jsonlines files interprets datastructure too early
#7092
Vipitis
opened
3 months ago
5
The test test_move_script_doesnt_change_hash fails because it runs the 'python' command while the python executable has a different name
#7090
yurivict
opened
3 months ago
0
Missing pyspark dependency causes the testsuite to error out, instead of a few tests to be skipped
#7089
yurivict
opened
3 months ago
0
Disable warning when using with_format format on tensors
#7088
Haislich
opened
3 months ago
0
Unable to create dataset card for Lushootseed language
#7087
vaishnavsudarshan
closed
3 months ago
2
load_dataset ignores cached datasets and tries to hit HF Hub, resulting in API rate limit errors
#7086
tginart
opened
3 months ago
0