huggingface/datasets
🤗 The largest hub of ready-to-use datasets for ML models with fast, easy-to-use and efficient data manipulation tools
https://huggingface.co/docs/datasets
Apache License 2.0
19.28k stars
2.7k forks
issues
Set dev version
#7246
lhoestq
closed
1 month ago
1
Release: 3.0.2
#7245
lhoestq
closed
1 month ago
1
use huggingface_hub offline mode
#7244
lhoestq
closed
1 month ago
1
ArrayXD with None as leading dim incompatible with DatasetCardData
#7243
alex-hh
opened
1 month ago
5
`push_to_hub` overwrite argument
#7241
ceferisbarov
closed
4 weeks ago
9
Feature Request: Add functionality to pass split types like train, test in DatasetDict.map
#7240
jp1924
opened
1 month ago
0
incompatibility issue when using load_dataset with datasets==3.0.1
#7238
jupiterMJM
opened
1 month ago
0
[MINOR:TYPO] Update arrow_dataset.py
#7236
cakiki
closed
4 weeks ago
0
No need for dataset_info
#7234
lhoestq
closed
1 month ago
2
Dataset count issue
#7233
want-well
opened
1 month ago
0
(Super tiny doc update) Mention to_polars
#7232
fzyzcjy
closed
4 weeks ago
1
Fix typo in image dataset docs
#7231
albertvillanova
closed
1 month ago
1
Video support
#7230
lhoestq
closed
4 weeks ago
1
handle config_name=None in push_to_hub
#7229
alex-hh
closed
4 weeks ago
1
Composite (multi-column) features
#7228
alex-hh
opened
1 month ago
0
fast array extraction
#7227
alex-hh
opened
1 month ago
3
Add R as a How to use from the Polars (R) Library as an option
#7226
ran-codes
opened
1 month ago
0
Huggingface GIT returns null as Content-Type instead of application/x-git-receive-pack-result
#7225
padmalcom
opened
1 month ago
0
fallback to default feature casting in case custom features not available during dataset loading
#7224
alex-hh
opened
1 month ago
0
Fallback to arrow defaults when loading dataset with custom features that aren't registered locally
#7223
alex-hh
opened
1 month ago
0
TypeError: Couldn't cast array of type string to null in long json
#7222
nokados
opened
1 month ago
0
add CustomFeature base class to support user-defined features with encoding/decoding logic
#7221
alex-hh
opened
1 month ago
2
Custom features not compatible with special encoding/decoding logic
#7220
alex-hh
opened
1 month ago
2
bump fsspec
#7219
lhoestq
closed
1 month ago
1
ds.map(f, num_proc=10) is slower than df.apply
#7217
lanlanlanlanlanlan365
opened
1 month ago
1
Iterable dataset map with explicit features causes slowdown for Sequence features
#7215
alex-hh
opened
1 month ago
0
Formatted map + with_format(None) changes array dtype for iterable datasets
#7214
alex-hh
opened
1 month ago
1
Add with_rank to Dataset.from_generator
#7213
muthissar
opened
1 month ago
0
Windows does not support signal.alarm and signal.signal
#7212
TomasJavurek
opened
1 month ago
0
Describe only selected fields in README
#7211
alozowski
opened
1 month ago
0
Convert Array features to numpy arrays rather than lists by default
#7210
alex-hh
opened
1 month ago
0
Preserve features in iterable dataset.filter
#7209
alex-hh
closed
1 month ago
3
Iterable dataset.filter should not override features
#7208
alex-hh
closed
1 month ago
1
apply formatting after iter_arrow to speed up format -> map, filter for iterable datasets
#7207
alex-hh
opened
1 month ago
16
Slow iteration for iterable dataset with numpy formatting for array data
#7206
alex-hh
opened
1 month ago
1
fix ci benchmark
#7205
lhoestq
closed
1 month ago
1
fix unbatched arrow map for iterable datasets
#7204
alex-hh
closed
1 month ago
1
with_format docstring
#7203
lhoestq
closed
1 month ago
1
`from_parquet` return type annotation
#7202
saiden89
opened
1 month ago
0
`load_dataset()` of images from a single directory where `train.png` image exists
#7201
SagiPolaczek
opened
1 month ago
0
Fix the environment variable for huggingface cache
#7200
torotoki
closed
1 month ago
4
Add with_rank to Dataset.from_generator
#7199
muthissar
opened
1 month ago
0
Add repeat method to datasets
#7198
alex-hh
opened
1 month ago
0
ConnectionError: Couldn't reach 'allenai/c4' on the Hub (ConnectionError): the dataset won't download, what is going on?
#7197
Mrgengli
opened
1 month ago
1
concatenate_datasets does not preserve shuffling state
#7196
alex-hh
opened
1 month ago
0
Add support for 3D datasets
#7195
severo
opened
1 month ago
3
datasets.exceptions.DatasetNotFoundError for private dataset
#7194
kdutia
closed
1 month ago
2
Support of num_workers (multiprocessing) in map for IterableDataset
#7193
getao
opened
1 month ago
1
Add repeat() for iterable datasets
#7192
alex-hh
opened
1 month ago
2
Solution to issue: #7080 Modified load_dataset function, so that it prompts the user to select a dataset when subdatasets or splits (train, test) are available
#7191
negativenagesh
closed
1 week ago
1