tony-framework/TonY
TonY is a framework to natively run deep learning frameworks on Apache Hadoop.
https://tony-project.ai
Other
708 stars
164 forks
issues
TonY Client allow users to specify jars to container runtime classpath
#632
zuston
closed
2 years ago
2
Add diagnostic msg with container host
#631
zuston
closed
2 years ago
0
Client exit when am reached startup-timeout
#630
zuston
closed
2 years ago
0
Add default jvm opts to tony jobtype and tony core
#629
UWFrankGu
closed
2 years ago
0
Add jvm arguments to application master container to mitigate log4j v…
#628
UWFrankGu
closed
2 years ago
0
Guava Conflict?
#627
kevincloutier
closed
2 years ago
2
Add support to mitigate log4j vulnerabilities from user imported log4…
#626
UWFrankGu
closed
2 years ago
0
Release TonY v0.4.11
#625
zuston
closed
2 years ago
1
Support setting Application Master node label
#624
zuston
closed
2 years ago
3
Make dependency-group-timeout check ignored until all tasks scheduled
#623
zuston
closed
2 years ago
0
Update README about sidecar tensorboard and dependency timeout mechanism
#622
zuston
closed
2 years ago
1
Make job fail when partial tasks' pre-dependent tasks finished and exceeds the waiting timeout
#621
zuston
closed
2 years ago
4
Task executors that support specific roles are restarted when they fail
#620
zuston
opened
2 years ago
2
Confusing of running tensorflow job with TonY on Apache Hadoop
#619
LanstonWu
closed
2 years ago
2
Make the conf of tensorboard-log-dir valid in tony-conf xml
#618
zuston
closed
2 years ago
0
Make diagnostic message with job type and index
#617
zuston
closed
2 years ago
0
Release TonY v0.4.10
#616
zuston
closed
3 years ago
0
Record RM callback nodes report when nodes updated
#615
zuston
closed
3 years ago
0
Reduce queue memory cost when event handler thread dont start
#614
zuston
closed
3 years ago
0
[DOC] Declare 'python_venv' and 'conf_file' options also support remote HDFS path
#613
zuston
closed
3 years ago
0
mnist_distributed.py example output no model data in the checkpoint folder
#612
junwei-h
closed
2 years ago
1
Hold rpc ports till server start
#611
zuston
closed
3 years ago
0
Introduce tony.application.x.untracked.timeout to solve partial jobs hang
#610
zuston
closed
2 years ago
6
Release TonY v0.4.9
#609
plliao
closed
3 years ago
1
Make untrackedTaskFailed volatile
#608
zuston
closed
3 years ago
2
Task failure handling mechanism: missed-heartbeat-failure is consistent with other failures
#607
zuston
closed
3 years ago
0
Should job fail fast when missing heartbeats ?
#606
zuston
closed
3 years ago
3
Pass secret keys from AM to containers to support Hadoop encryption
#605
helloworld1
closed
3 years ago
1
mnist_distributed.py example caused java.io.IOException: Disk quota exceeded
#604
junwei-h
closed
3 years ago
2
Introduce some job stop policies on TFRuntime
#603
zuston
closed
3 years ago
4
Release Version 0.4.8
#602
ashahab
closed
3 years ago
0
Can the training task run with docker container, and the hadoop cluster run without docker container?
#601
jxfruit
closed
2 years ago
7
Keep all resources close when client is killed
#600
zuston
closed
3 years ago
0
Prevent loss of root cause due to resetting the final state
#599
zuston
closed
3 years ago
0
TonyClient to create FileSystem from Path to support fully qualified HDFS path
#598
helloworld1
closed
3 years ago
7
Set job failed when runtime is not healthy
#597
zuston
closed
3 years ago
0
Rename tony.worker.timeout to tony.task.executor.execution-timeout-ms
#596
zuston
closed
3 years ago
0
Add regression test for issue #157
#595
nevesnunes
closed
3 years ago
2
Speed up ci test when AM crashed
#594
zuston
closed
3 years ago
0
[TEST] testAMCrashTonyShouldFail test case takes 7 minutes
#593
zuston
closed
3 years ago
0
[CI] Fix ci bug
#592
zuston
closed
3 years ago
2
[TEST-CI] Just trigger ci
#591
zuston
closed
3 years ago
1
Fixed checkstyle suppressions invalid problem on windows
#590
0Kelvins
closed
3 years ago
2
Introduce generic tony listener interface
#589
zuston
closed
2 years ago
2
Remove task from heart beat monitor when container finished
#588
zuston
closed
3 years ago
0
Add TonY license explanation
#587
oliverhu
closed
3 years ago
0
Update README.md
#586
oliverhu
closed
3 years ago
0
Update circleci link
#585
zuston
closed
3 years ago
0
Update README.md
#584
oliverhu
closed
3 years ago
0
Refactor tensorflow related class to tony
#583
zuston
closed
3 years ago
0