sql-machine-learning/elasticdl
Kubernetes-native Deep Learning Framework
https://elasticdl.org
MIT License
733 stars
113 forks
issues
Remove unnecessary Worker param
#2499
zuston
closed
3 years ago
5
Delete the code to download the heart dataset
#2498
workingloong
closed
3 years ago
0
Fix the counter if the worker retries aggregating gradients in ElasticDL
#2497
workingloong
closed
3 years ago
0
Fix the bug if the host of a new worker is the same as an old worker
#2496
workingloong
closed
3 years ago
0
Wrap the job command in parentheses.
#2495
workingloong
closed
3 years ago
0
Worker can report training data to the master if using RecordIO
#2494
workingloong
closed
3 years ago
0
Sync
#2493
workingloong
closed
3 years ago
0
Go PS grpcio version too low
#2492
skydoorkai
opened
3 years ago
0
merge develop to v0.2.0 before disable go ps test
#2491
skydoorkai
closed
3 years ago
0
upgrade grpcio version to 1.34.1
#2490
skydoorkai
closed
3 years ago
0
Remove "elasticdl-" prefix from ps/worker pod name
#2489
skydoorkai
closed
3 years ago
0
Develop an API to get training epoch
#2488
workingloong
closed
3 years ago
0
Check whether to register hooks according to HOROVOD_ELASTIC
#2487
workingloong
closed
3 years ago
0
Create an ElasticImageFolder for PyTorch.
#2486
workingloong
closed
3 years ago
0
Relaunch worker on failure
#2485
skydoorkai
closed
3 years ago
0
Call model.cuda if there is a cuda device
#2484
workingloong
closed
3 years ago
0
add pod status change log
#2483
skydoorkai
closed
3 years ago
0
Implement the fail fast mechanism of master. (#2480)
#2482
brightcoder01
closed
3 years ago
0
Add cluster_spec_json in EXCLUDE_PRINT_ARGS (#2479)
#2481
brightcoder01
closed
3 years ago
0
Implement the fail fast mechanism of master.
#2480
brightcoder01
closed
3 years ago
0
Add cluster_spec_json in EXCLUDE_PRINT_ARGS
#2479
brightcoder01
closed
3 years ago
0
Bump version
#2478
brightcoder01
closed
3 years ago
0
Update package version to 0.2.0rc5.
#2477
brightcoder01
closed
3 years ago
0
Add num_minibatches_per_shard param in the factory method.
#2476
brightcoder01
closed
3 years ago
0
Add num_minibatches_per_shard param in report_training_params (#2473)
#2475
brightcoder01
closed
3 years ago
0
Update the version releasing doc.
#2474
brightcoder01
closed
3 years ago
0
Add num_minibatches_per_shard param in report_training_params
#2473
brightcoder01
closed
3 years ago
0
Add the argument to shuffle shards.
#2472
workingloong
closed
3 years ago
0
Add more logs for task_manager.
#2471
brightcoder01
closed
3 years ago
0
The event type of PodStateFlow contains ADDED and MODIFIED
#2470
workingloong
closed
3 years ago
0
Support shuffling the total dataset.
#2469
workingloong
closed
3 years ago
0
Master raises a runtime error if all workers fail
#2468
workingloong
closed
3 years ago
0
Populate the environment variables matched with the input args from master to the created pods.
#2467
brightcoder01
closed
3 years ago
0
Add the PodStateFlow from PENDING to SUCCEEDED and FAILED.
#2466
brightcoder01
closed
3 years ago
0
Add elasticdl job arguments only when need_elasticdl_job_args=True
#2465
skydoorkai
closed
3 years ago
0
Don't print some arguments.
#2464
brightcoder01
closed
3 years ago
0
Read indices from a shard
#2463
workingloong
closed
3 years ago
0
create worker service and set TF_CONFIG env when needed
#2462
skydoorkai
closed
3 years ago
0
Enable the worker image configuration using elasticdl_client.
#2461
brightcoder01
closed
3 years ago
0
Add more parameters in the factory method of data_shard_service.
#2460
brightcoder01
closed
3 years ago
1
Fail to fetch shard using multi-process in Python.
#2459
workingloong
closed
3 years ago
0
Fix the warning message in the master
#2458
workingloong
closed
3 years ago
0
Only use cluster_spec or cluster_spec_json
#2457
workingloong
closed
3 years ago
0
Set the default value of need_elasticdl_job_service to False
#2456
workingloong
closed
3 years ago
0
Make data_shard_service compatible with various task types.
#2455
brightcoder01
closed
3 years ago
0
Fix the shard name in the unittests of odps
#2454
workingloong
closed
3 years ago
0
Add a lock to get task
#2453
workingloong
closed
3 years ago
0
Add factory methods for data_shard_service and master_client.
#2452
brightcoder01
closed
3 years ago
0
The worker sends the start and end message to the master.
#2451
workingloong
closed
3 years ago
0
Fix the interval to retry to get the rank
#2450
workingloong
closed
3 years ago
0