rjagerman / glint

Glint: High performance scala parameter server
MIT License

Yarn support #69

Open batizty opened 7 years ago

batizty commented 7 years ago

Hi @rjagerman,

Could you please review this pull request?

All of the code has been tested online in my cluster environment.

Any questions are welcome, and I appreciate your previous work.

Thanks

batizty commented 7 years ago

@rjagerman Could you please review the change? Thanks.

rjagerman commented 7 years ago

Hi @batizty,

Thanks! This looks really nice! I haven't had the time yet to review it due to several projects and deadlines at work. I hope to review it some time next week.

batizty commented 7 years ago

Hi @rjagerman,

Understood.

The YARN support feature is used at weibo.com (maybe you have heard of this site, maybe not; it is a top-5 website in China, similar to Twitter but with more users in China), and it works well.

I have also developed some other features for Glint, including additional operations such as Save and Load, which can be used to quickly store and read models in HDFS. I believe they are useful for most Glint users working on machine learning with big vectors and matrices.

If possible, I would like to become a contributor to Glint, because it is very simple and stable for large-scale machine learning.

Thank you for your work on Glint.

rjagerman commented 7 years ago

Still haven't found the time to do it, too many deadlines unfortunately :-( I'll let you know when I get around to it.

batizty commented 7 years ago

Got it.

Later I will send out another patch for Glint, which lets each node store all of its parameters into HDFS independently. I have tested this before: pulling a whole weight vector/matrix whose size is over 100m took more than 30 min. After I added a 'Save' operation that lets the parameter nodes store the weights themselves, it took less than 1 min. I believe it will be useful for others who work on huge models.

Thanks.

baukloze commented 6 years ago

Hi @batizty, I want to use Glint to store weights for machine learning algorithms, but it is too difficult to save the weights to a local file or an HDFS file. Fortunately, I found that you had run into this problem and solved it. Could you please publish your branch? Thanks.

batizty commented 6 years ago

Hi @baukloze, sorry, I forgot about this issue.

Could you please wait one or two days? I will send out my modifications ASAP. I hope you like them.

By the way, @rjagerman, my colleagues and I have implemented basic ML algorithms on top of Glint, but they are not stable enough yet. When our data size reached 1000B and the matrix/vector width reached 500B, the heavy traffic load caused some of the Akka nodes to enter the Quarantined state. Do you have any suggestions or methods for fixing this problem?

baukloze commented 6 years ago

@batizty ok, thanks.