huggingface/datasets
🤗 The largest hub of ready-to-use datasets for ML models with fast, easy-to-use and efficient data manipulation tools
https://huggingface.co/docs/datasets
Apache License 2.0
19.29k stars
2.7k forks
issues
[Regression] IterableDataset is broken on 2.20.0
#7085
AjayP13
closed
3 months ago
3
More easily support streaming local files
#7084
fschlatt
opened
3 months ago
0
fix streaming from arrow files
#7083
fschlatt
closed
2 months ago
0
Support HTTP authentication in non-streaming mode
#7082
albertvillanova
closed
3 months ago
2
Set load_from_disk path type as PathLike
#7081
albertvillanova
closed
3 months ago
2
Generating train split takes a long time
#7080
alexanderswerdlow
opened
4 months ago
2
HfHubHTTPError: 500 Server Error: Internal Server Error for url:
#7079
neoneye
closed
4 months ago
17
Fix CI test_convert_to_parquet
#7078
albertvillanova
closed
4 months ago
2
column_names ignored by load_dataset() when loading CSV file
#7077
luismsgomes
opened
4 months ago
1
🧪 Do not mock create_commit
#7076
coyotte508
closed
4 months ago
1
Update required soxr version from pre-release to release
#7075
albertvillanova
closed
4 months ago
2
Fix CI by temporarily marking test_convert_to_parquet as expected to fail
#7074
albertvillanova
closed
4 months ago
2
CI is broken for convert_to_parquet: Invalid rev id: refs/pr/1 404 error causes RevisionNotFoundError
#7073
albertvillanova
closed
4 months ago
9
nm
#7072
brettdavies
closed
4 months ago
0
Filter hangs
#7071
lucienwalewski
opened
4 months ago
0
how set_transform affects batch size?
#7070
VafaKnm
opened
4 months ago
0
Fix push_to_hub by not calling create_branch if PR branch
#7069
albertvillanova
closed
3 months ago
8
Fix prepare_single_hop_path_and_storage_options
#7068
albertvillanova
closed
3 months ago
2
Convert_to_parquet fails for datasets with multiple configs
#7067
HuangZhen02
closed
3 months ago
3
One subset per file in repo ?
#7066
lhoestq
opened
4 months ago
0
Cannot get item after loading from disk and then converting to iterable.
#7065
happyTonakai
opened
4 months ago
0
Add `batch` method to `Dataset` class
#7064
lappemic
closed
4 months ago
6
Add `batch` method to `Dataset`
#7063
lappemic
closed
4 months ago
0
Avoid calling http_head for non-HTTP URLs
#7062
albertvillanova
closed
4 months ago
2
Custom Dataset | Still Raise Error while handling errors in _generate_examples
#7061
hahmad2008
opened
4 months ago
0
WebDataset BuilderConfig
#7060
hlky
closed
4 months ago
1
None values are skipped when reading jsonl in subobjects
#7059
PonteIneptique
opened
4 months ago
0
New feature type: Document
#7058
severo
opened
4 months ago
0
Update load_hub.mdx
#7057
severo
closed
4 months ago
2
Make `BufferShuffledExamplesIterable` resumable
#7056
yzhangcs
opened
4 months ago
8
WebDataset with different prefixes are unsupported
#7055
hlky
closed
4 months ago
8
Add batching to `IterableDataset`
#7054
lappemic
closed
4 months ago
5
Datasets.datafiles resolve_pattern `TypeError: can only concatenate tuple (not "str") to tuple`
#7053
MatthewYZhang
closed
4 months ago
2
Adding `Music` feature for symbolic music modality (MIDI, abc)
#7052
Natooz
closed
3 months ago
0
How to set_epoch with interleave_datasets?
#7051
jonathanasdf
closed
3 months ago
7
add checkpoint and resume title in docs
#7050
lhoestq
closed
4 months ago
2
Save nparray as list
#7049
Sakurakdx
closed
4 months ago
5
ImportError: numpy.core.multiarray when using `filter`
#7048
kamilakesbi
closed
4 months ago
4
Save Dataset as Sharded Parquet
#7047
tom-p-reichel
opened
4 months ago
2
Support librosa and numpy 2.0 for Python 3.10
#7046
albertvillanova
closed
4 months ago
2
Fix tensorflow min version depending on Python version
#7045
albertvillanova
closed
4 months ago
2
Mark tests that require librosa
#7044
albertvillanova
closed
4 months ago
2
Add decorator as explicit test dependency
#7043
albertvillanova
closed
4 months ago
2
Improved the tutorial by adding a link for loading datasets
#7042
AmboThom
closed
3 months ago
1
`sort` after `filter` unreasonably slow
#7041
Tobin-rgb
opened
4 months ago
1
load `streaming=True` dataset with downloaded cache
#7040
wanghaoyucn
opened
4 months ago
2
Fix export to JSON when dataset larger than batch size
#7039
albertvillanova
opened
4 months ago
3
A bug of Dataset.to_json() function
#7037
LinglingGreat
opened
4 months ago
2
Fix doc generation when NamedSplit is used as parameter default value
#7036
albertvillanova
closed
4 months ago
2
Docs are not generated when a parameter defaults to a NamedSplit value
#7035
albertvillanova
closed
4 months ago
0