modelscope/data-juicer
A one-stop data processing system to make data higher-quality, juicier, and more digestible for (multimodal) LLMs! 🍎 🍋 🌽 ➡️ ➡️🍸 🍹 🍷 Providing higher-quality, richer, and more "digestible" data for large models!
Apache License 2.0
2.63k stars
166 forks
issues
Newest
[Bug]:
#441
FailedNamed
opened
1 week ago
0
[Bug]: KeyError: 'resource'
#440
luckystar1992
opened
1 week ago
0
fix error links
#439
yxdyc
closed
1 week ago
0
[Bug]: Paper link error
#438
ForeverNewLee
opened
1 week ago
0
[Bug]: JupyterLab Official sample error
#437
Night-Quiet
closed
1 week ago
2
fix check_model
#436
Cathy0908
closed
1 week ago
0
Refine batch op branch
#435
BeachWang
closed
1 week ago
0
doc update for sandbox paper
#434
BeachWang
closed
1 week ago
0
Require fps filter and mapper for videos
#433
BeachWang
opened
2 weeks ago
0
Support RangeSpecifiedFieldSelector for selecting data by the value range of a specified field
#432
2108038773
opened
2 weeks ago
1
Service/match api
#431
BeachWang
opened
2 weeks ago
0
Why does this often happen: "One of the subprocesses has abruptly died during map operation"?
#430
strongcc
opened
3 weeks ago
4
Feat/dj adapter
#429
HYLcool
closed
2 weeks ago
0
Service/match api
#428
BeachWang
closed
3 weeks ago
0
Fix some words
#427
co63oc
closed
3 weeks ago
1
Regress model preloading
#426
drcege
closed
3 weeks ago
0
Running the command python tools/process_data.py --config train.yaml
#425
abchbx
closed
3 weeks ago
1
fix param definition
#424
Cathy0908
closed
3 weeks ago
0
Add new OP: image_tagging_mapper
#423
HYLcool
closed
3 weeks ago
0
use pydantic types
#422
drcege
closed
3 weeks ago
0
fix: missing args in load_formatter of Analyzer
#421
zhijianma
closed
4 weeks ago
2
AssertionError
#420
abchbx
closed
4 weeks ago
1
[Bug]: undefined symbol: _ZN3c104cuda9SetDeviceE
#419
lh61500
closed
1 day ago
3
*quick fix*: NestedDataset
#418
HYLcool
closed
1 month ago
0
[Feat] Data-Juicer as a Service
#417
drcege
closed
1 week ago
3
[Feat] Enhance type hints and parameter validation
#416
drcege
closed
3 weeks ago
1
Automatically split input dataset in ray mode
#415
pan-x-c
closed
1 week ago
2
Add lazy import and auto-install dependencies
#414
BeachWang
closed
1 week ago
2
[Feat] Support `dj_batched_group_ops` that allows for the configuration and application of multiple operators in smaller, manageable batches
#413
yxdyc
closed
1 week ago
2
[Feat] Support `PythonCodesOperator` and `BashCodesOperator` that wraps an existing python file, or some code snippets to be executed, such as the existing DJ tools.
#412
yxdyc
closed
1 week ago
2
Guidance for OP with multiple data fields to be processed
#411
yxdyc
closed
1 week ago
2
Use analyzer instead of analyser to maintain consistency
#410
garyzhang99
closed
1 month ago
0
analyser or analyzer?
#409
lilqz66
closed
1 month ago
1
[WIP]Add text tagging by prompt mapper op
#408
garyzhang99
closed
14 hours ago
2
rename to fix typo in test_expand_macro_mapper.py
#407
garyzhang99
closed
1 month ago
0
support batch_size>1 for some operators
#406
Cathy0908
closed
2 weeks ago
1
Add text_pair_similarity_filter
#405
Qirui-jiao
closed
1 week ago
2
What on earth? Neither your Hugging Face space nor a self-hosted service will run, and the demo won't run either
#404
coder4nlp
closed
1 month ago
2
Update the KDD tutorial info
#403
yxdyc
closed
1 month ago
0
Add turbo mode
#402
drcege
closed
3 weeks ago
0
Add sentence_augmentation_mapper
#401
Qirui-jiao
opened
1 month ago
2
Add mllm_mapper
#400
Qirui-jiao
closed
14 hours ago
3
Enhance/ckpt
#399
drcege
closed
1 month ago
0
Heavy dependency of Data-Juicer
#398
BeachWang
closed
1 week ago
4
update spacy to deal conflict with ms-swift
#397
BeachWang
closed
1 month ago
4
Enhance/ckpt
#396
drcege
closed
1 month ago
0
Add sdxl_prompt2prompt_mapper
#395
Qirui-jiao
closed
1 week ago
2
Add segment_mapper
#394
Qirui-jiao
opened
1 month ago
2
Add image_pair_similarity_filter
#393
Qirui-jiao
closed
4 weeks ago
0
Fix spelling errors in documentation
#392
TobyJasper
closed
1 month ago
0