learning-at-home / hivemind

Decentralized deep learning in PyTorch. Built to train models on thousands of volunteers across the world.
MIT License

Roadmap #77

Open justheuristic opened 3 years ago

justheuristic commented 3 years ago

This is a global project roadmap that states our priorities for the near future. These priorities can and should be disputed here or elsewhere, after which we will update the roadmap.

v0.7 "It runs something" (released)

v0.8 "It runs at scale" (released)

v0.9 "It trains something" (released)

v0.10 "You can train with us" (released)

v1.0 "Most of the code makes sense without reading the source" (Nov-Dec)

v1.1 "You can set up collaborative training easily"

Target scenario: 100 volunteers training 2xl-like over the internet

v1.2 "Decentralized model parallelism"

Target scenario: 500 peers training 1B+ over the internet

Important, but not urgent

louis030195 commented 2 years ago

This sounds like an interesting project. I like the idea of decentralized computing, a bit like what cryptocurrencies do, but for general computing rather than currency, because computing can be quite expensive. In my mind, it would look a bit like a decentralized Kubernetes, without any security issues regarding access to others' hardware, and probably based on trading computing for a kind of currency (yes, it would still be cheaper than current centralized computing clouds).

About the roadmap: I see exciting technical details, but I don't see how people will see and find themselves sharing their resources for a common goal? Is there any plan to develop a UI or something like that?

Example: Bob and Alice each want to train a GPT-3 with 200B parameters, but each of them can afford only half of the training cost. With this awesome UI, they could see that they share a common goal.

borzunov commented 2 years ago

Hi @louis030195!

> probably based on trading computing for a kind of currency

Yeah, there are a couple of projects related to this idea: vast.ai provides a service for users to lease/rent each other's GPUs, and BitTensor (cc @unconst) is built around a cryptocurrency serving as an incentive for people who help train models with their GPUs.

Currently, hivemind doesn't involve any financial incentives: we assume that volunteers are motivated by access to the training outcome and by recognition on the leaderboard. However, if time shows that financial motivation is crucial, hivemind may serve as a backend for BitTensor nodes :)

> I don't see how people will see and find themselves sharing their resources for a common goal? Is there any plan to develop a UI or something like that?

For now, we assume this happens like this:

An example of such a page is our demo where we train a DALL-E-like model.

However, I definitely agree that our project will benefit from a centralized UI where a new user can see all planned/ongoing training runs and join the ones they consider interesting :)