spacemeshos / SMIPS

Spacemesh Improvement Proposals
https://spacemesh.io
Creative Commons Zero v1.0 Universal

SMIP-0005: Variable GPU Post - User Interactions #7

Open avive opened 4 years ago

avive commented 4 years ago

Core Interactions

POST SETUP (GUI)

PoST commitment init via a GUI app such as Smapp.

Step 1 - Commitment Location

Visual Design Mock

Step 2 - Commitment Size

Visual Design Mock

Step 3 - GPU Selection

Visual Design Mock

Post Init Setup

The app now has all the information it needs from the user to start computing the commitment. Via the full node API, the app obtains the smesher id currently set in the node, the current net id, and any other parameter needed to init PoST for a smesher, and passes them to the PoST utility's process-start API method. The user is notified that the init process has started and that they can check progress on the same screen, e.g. the Smesher screen. The app should also send a desktop notification when the PoST setup init is complete.

If the user chose to pause init while they are interacting with the machine, a PoST init run via the gpu-post API should only execute after 5 minutes of user idle time. Otherwise, the process should not consider user idle time.
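The idle-time rule above can be sketched as a small gate that is checked before each init run. This is only an illustration: `InitPolicy`, `pause_when_interactive` and `may_run_init` are hypothetical names, and how idle time is measured on each OS is left out.

```python
from dataclasses import dataclass

# The 5-minute idle period described in the text above.
IDLE_THRESHOLD_SECONDS = 5 * 60


@dataclass
class InitPolicy:
    # Hypothetical setting: True if the user chose to pause init
    # while they are interacting with the machine.
    pause_when_interactive: bool


def may_run_init(policy: InitPolicy, idle_seconds: float) -> bool:
    """Return True if a PoST init run may execute right now."""
    if not policy.pause_when_interactive:
        # The user did not ask for pausing, so idle time is ignored.
        return True
    # Otherwise only run once the user has been idle long enough.
    return idle_seconds >= IDLE_THRESHOLD_SECONDS
```

A caller would poll the OS for the current idle time and skip or suspend the init run whenever `may_run_init` returns False.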

POST SETUP (CLI)

The user uses a CLI API client (such as the CLIWallet) to set up PoST on their full node using the Smesher API. The PoST init setup outputs progress to the full node logs and also provides a streaming gRPC API with progress events. The user relies on these capabilities to monitor the PoST init progress.


Additional Use Cases

Modify POST commitment size

Using GUI

  1. User selects modify post data command from the smesher screen.
  2. User selects to modify post size by clicking resize data.
  3. The modify post size screen is displayed.
  4. The interaction continues in a similar way to the new post data setup interaction - user selects new size which can be smaller or bigger than the current post data size, selects a processor for computation, etc.

Stop Smeshing and delete POST commitment file

Using GUI

  1. User selects modify post data command from the smesher screen.
  2. User selects to delete the data by clicking delete data.
  3. The post data is deleted and the node should stop smeshing.

User can start smeshing at any time by setting up new post data from the smesher screen. When post is not set up, the smesher screen prompts the user to set up the data.

Using a CLI API Client.

Review post data creation status

Using GUI

  1. User accesses the smesher screen.
  2. The post data creation status is displayed.

avive commented 4 years ago

@noamnelke @moshababo - as discussed - open for comments now as an issue before it moves on to become a SMIP.

avive commented 4 years ago

@moshababo - we talked about it briefly. Here I assume post-init process is going to be packaged as a stand-alone process which exposes an IPC API. I can't find a product reason to bundle the post init utility in go-spacemesh. This is tight-coupling post init to a full node client and arguably a mix of separate concerns. Having a separate post init utility will also make it easier to test.

moshababo commented 4 years ago

There are a few different considerations here:

  1. The PoST util cannot be concerned solely with the PoST init, but also with generating and validating proofs. Otherwise, this layer will need to be implemented in the full node client, which doesn’t make sense IMO because it will result in not having a single self-contained util which we can sufficiently maintain/test/ship (like the current Go/CPU-based PoST util). As discussed, this layer needs to be re-written due to the new tree-free construction, and will be implemented in Rust (thus gaining an easy integration with post-gpu, among other benefits).

  2. App communication with the PoST util: bypassing the full node client doesn’t make sense to me. The App isn’t a prerequisite for using the full node client, so we’ll need to implement the communication in the full node client anyway, and make it support 2 different use-cases. Since PoST init and mining in general are part of the full node client responsibilities, I think the App should communicate only with the full node client.

  3. It might make sense to communicate with the PoST util via IPC instead of in-process shared object linkage, but mainly for being able to utilize a remote machine's resources. The testing setup will actually become more difficult, because we'll need to build/spawn/connect to a separate process (while testing the PoST util in isolation is easy regardless of its API technicalities). Also, using IPC for proof validation could be problematic, so we might potentially want to go hybrid. I'd say that not having to use cgo is an advantage, but we'll need it for svm anyway, so it doesn't count. As discussed, I think that the PoST util should support both modes anyway (like the current util).


Other remarks:

  1. Current PoST util supports splitting the init data into multiple files. I think we should keep supporting this feature. Having a single file that is hundreds of GB in size could cause difficulties for the user.

  2. Current PoST util supports init recovery. I think we should keep supporting this feature.

  3. Current PoST util supports initializations for multiple smesher ids. I think we should keep supporting this feature (if so, deleting the PoST init files requires keeping track of the respective smesher id).

  4. Deletion of the PoST file/s should not be done via OS, but via the PoST util API.

  5. AFAIK the PoST isn’t concerned with the net id, but only with the smesher id.

noamnelke commented 4 years ago

Product comment: making users select where the PoST work will be done (GPU/CPU) is a bad idea, IMO. What you describe as the default is the correct choice for 99.99% of users. I suspect that the 0.01% who would need control over this won't use the GUI anyway, but you're now forcing everyone to make a choice that some people will get stuck on because they don't understand it, and others will just need to press "next" one more time. If you think it's critical to let users control this from the GUI, don't make it a part of the normal flow, put it somewhere under "advanced" or something (I think it's not worth implementing, even there, because nobody will ever use it).

Agree with everything @moshababo said, except for the last point - we need to add the net id to the hashed message (what's currently only the miner id) to block miners from re-using init files across different networks.
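As a minimal sketch of this point, the net id can be hashed together with the miner (smesher) id so that init data derived from the result differs per network. The hash function (SHA-256), the 4-byte big-endian net id encoding and the name `commitment_seed` are assumptions for illustration only; the actual PoST construction is not specified in this thread.

```python
import hashlib


def commitment_seed(net_id: int, smesher_id: bytes) -> bytes:
    """Hypothetical seed for PoST init that binds the data to one network."""
    # A fixed-width net id keeps the encoding unambiguous:
    # the smesher id always starts at byte offset 4.
    message = net_id.to_bytes(4, "big") + smesher_id
    return hashlib.sha256(message).digest()
```

With this, the same smesher id yields a different seed on each network, so init files computed for one network are useless on another.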

avive commented 4 years ago

@moshababo

> 1. The PoST util cannot be concerned solely with the PoST init, but also with generating and validating proofs. Otherwise, this layer will need to be implemented in the full node client, which doesn’t make sense IMO because it will result in not having a single self-contained util which we can sufficiently maintain/test/ship (like the current Go/CPU-based PoST util). As discussed, this layer needs to be re-written due to the new tree-free construction, and will be implemented in Rust (thus gaining an easy integration with post-gpu, among other benefits).

Fair argument, although there are some pros to keeping the PoST init utility separate and having proof creation and verification embedded in the node. So it looks like we are going with the PoST util embedded in go-spacemesh.

> 2. App communication with the PoST util: bypassing the full node client doesn’t make sense to me. The App isn’t a prerequisite for using the full node client, so we’ll need to implement the communication in the full node client anyway, and make it support 2 different use-cases. Since PoST init and mining in general are part of the full node client responsibilities, I think the App should communicate only with the full node client.

Yes - via the node's new proposed Smesher API. Please see what's been drafted here: https://github.com/spacemeshos/api/tree/smeshing-methods/proto/spacemesh

> 3. It might make sense to communicate with the PoST util via IPC instead of in-process shared object linkage, but mainly for being able to utilize a remote machine's resources. The testing setup will actually become more difficult, because we'll need to build/spawn/connect to a separate process (while testing the PoST util in isolation is easy regardless of its API technicalities). Also, using IPC for proof validation could be problematic, so we might potentially want to go hybrid. I'd say that not having to use cgo is an advantage, but we'll need it for svm anyway, so it doesn't count. As discussed, I think that the PoST util should support both modes anyway (like the current util).

ok. So we recommend full embedding of PoST Util (Init, generate proof, verify proof) in go-spacemesh.

Other remarks:

> 1. Current PoST util supports splitting the init data into multiple files. I think we should keep supporting this feature. Having a single file that is hundreds of GB in size could cause difficulties for the user.

Of course we should support this. The user specifies only 1 path on a volume where the commitment is created. However, the PoST util needs to be smart enough to delete all generated PoST files in that path when requested to delete a commitment.
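A rough sketch of what deleting a multi-file commitment could look like, assuming (purely for illustration) that the util names its files `post_<smesher-id-hex>_<index>.bin` inside the user-selected directory. The naming scheme and both function names are hypothetical and not taken from the current PoST util.

```python
from pathlib import Path
from typing import List


def post_file_name(smesher_id: bytes, index: int) -> str:
    # Hypothetical naming scheme: one file per index for a given smesher id.
    return f"post_{smesher_id.hex()}_{index}.bin"


def delete_commitment(data_dir: Path, smesher_id: bytes) -> List[str]:
    """Delete every PoST file of one smesher id; return the removed file names."""
    removed = []
    pattern = f"post_{smesher_id.hex()}_*.bin"
    for path in sorted(data_dir.glob(pattern)):
        path.unlink()
        removed.append(path.name)
    return removed
```

Because the pattern includes the smesher id, unrelated files in the same directory, including PoST files of another smesher id, are left untouched.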

> 2. Current PoST util supports init recovery. I think we should keep supporting this feature.

Please explain what init recovery means. We definitely want to enable continuing commitment file creation that was stopped for any reason. Is this what you mean by recovery?

> 3. Current PoST util supports initializations for multiple smesher ids. I think we should keep supporting this feature (if so, deleting the PoST init files requires keeping track of the respective smesher id).

What is the use case for 1 commitment supporting multiple smesher ids? Is there really a reason not to tie a commitment to a specific smesher id and network id? I see no such use case, and unless there's a good user story here that gives users value we should not support it. Regarding the smesher id - this concern should be separated from the actual PoST commitment file and generation step. Right now they are mixed with each other and we have several bugs in this area, such as a new smesher id automatically being created by the node if there's a problem with the init file. The smesher id is a node concept and should be set and maintained separately from the existence or the state of the PoST commitment file(s) for that smesher id. So, for example, a smesher should be able to create a new PoST commitment if one got corrupted without having to create a new smesher id. This is also relevant to the stop and start smeshing API capabilities we want to have...

> 4. Deletion of the PoST file/s should not be done via OS, but via the PoST util API.

Of course - see the proposed API linked above for dealing with commitments. Nothing should be done via the OS, only via the API, but the PoST component needs to know how to deal with unavailable commitments due to OS changes, e.g. the user forgets to turn on an external USB drive holding a PoST commitment when trying to smesh...

> 5. AFAIK the PoST isn’t concerned with the net id, but only with the smesher id.

As Noam commented, we definitely want to tie a net id to a PoST together with the smesher id, so commitments can't be reused across different networks.

avive commented 4 years ago

> Product comment: making users select where the PoST work will be done (GPU/CPU) is a bad idea, IMO. What you describe as the default is the correct choice for 99.99% of users. I suspect that the 0.01% who would need control over this won't use the GUI anyway, but you're now forcing everyone to make a choice that some people will get stuck on because they don't understand it, and others will just need to press "next" one more time. If you think it's critical to let users control this from the GUI, don't make it a part of the normal flow, put it somewhere under "advanced" or something (I think it's not worth implementing, even there, because nobody will ever use it).

Reducing user friction and unnecessary steps to setup post is critical and we should certainly be thinking about it. That said, I wish it was possible in this case but I'm afraid it is not. Here's why:

A typical desktop or laptop will have 3 processors that are able to init PoST, including the main system CPU (slow fallback). All desktops ship with internal graphics, and modern gaming PCs have, for example, capable (but quite weak) Intel GPUs on the motherboard, which we support. MacBook Pros also have 2 GPUs, one for the internal screen and one for external screens, and we support both.

As far as I can tell, we can't easily figure out which GPU enumerated by the system will yield better performance. The dedicated GPU in desktops, e.g. an AMD or Nvidia card, will yield better results than the built-in motherboard GPU. In addition, users may want to use the slower motherboard GPU, as this gives them reasonable PoST performance while still letting them play at 60fps on the main GPU - definitely a strong use case.

So, due to the above, I believe we should ask the user which processor to use for PoST. If we can find a way to figure out the best-performing one (more research needed), then we should consider making it the default in the app for ease of use and not bother the user with other options. But when setting up PoST in a node-only setup, via a CLI wallet API client or directly via node startup flags, we need to let the user decide which processor to use. So, bottom line, we need to support this selection feature regardless of the app configuration. @noamnelke - makes sense?

y0sher commented 4 years ago

> Product comment: making users select where the PoST work will be done (GPU/CPU) is a bad idea, IMO. What you describe as the default is the correct choice for 99.99% of users. I suspect that the 0.01% who would need control over this won't use the GUI anyway, but you're now forcing everyone to make a choice that some people will get stuck on because they don't understand it, and others will just need to press "next" one more time. If you think it's critical to let users control this from the GUI, don't make it a part of the normal flow, put it somewhere under "advanced" or something (I think it's not worth implementing, even there, because nobody will ever use it).

I agree with this product-wise. Maybe we can show the minimum number of "Next" screens where we actively ask the user to choose something, then show a detailed screen about what's going to happen, as a list, e.g.:

PoST
- Directory: ~/post/
- Device: GPU (AMDGPU)
- Size: 250GB

and advanced users will be able to change it there (maybe add a little (i) icon to explain each of these). From what you said I assume we are going to list the processors anyway, so we can probably make some "stupid" assumption while listing them: take AMD/Nvidia as the default, then fall back to an Intel GPU if one exists, then fall back to the CPU.
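The fallback order suggested above could look roughly like this. The `Device` record and its `vendor` / `kind` fields are hypothetical; the real device enumeration would come from the GPU PoST library.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Device:
    name: str
    vendor: str  # e.g. "amd", "nvidia", "intel" (hypothetical labels)
    kind: str    # "gpu" or "cpu"


def preference(device: Device) -> int:
    """Lower value means a better default candidate."""
    if device.kind == "gpu" and device.vendor in ("amd", "nvidia"):
        return 0  # dedicated card: preferred default
    if device.kind == "gpu" and device.vendor == "intel":
        return 1  # integrated Intel GPU: first fallback
    if device.kind == "gpu":
        return 2  # any other GPU
    return 3      # CPU: last resort


def pick_default(devices: List[Device]) -> Device:
    # min() keeps the first of equally ranked devices, so ties are
    # resolved by enumeration order. Raises ValueError on an empty list.
    return min(devices, key=preference)
```

The chosen device would then be shown on the summary screen, where advanced users can override it.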

noamnelke commented 4 years ago

@avive we can do a 2 second benchmark and figure out which device is fastest.

noamnelke commented 4 years ago

I think you underestimate how easy it is to get users to stop a setup process.

"Hmm... what should I do here? Nevermind, I'll go play Fortnite instead"

avive commented 4 years ago

> @avive we can do a 2 second benchmark and figure out which device is fastest.

Yes, this is a good idea which will allow us to select the fastest option by default, so we don't need to bother the user with it, at least when we use a high-level API. But we do want to give smeshers full control over this as well in some cases, e.g. use my motherboard GPU so I can keep playing at 60fps while PoST is being done reasonably fast...
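A sketch of the benchmark-then-default idea with a user override, under the assumption that each device can be exercised through a callable doing one unit of init work. `measure` and `choose_device` are hypothetical names, not part of any existing API.

```python
import time
from typing import Callable, Dict, Optional


def measure(work: Callable[[], None], budget_seconds: float = 2.0) -> float:
    """Run `work` repeatedly for about `budget_seconds`; return iterations per second."""
    start = time.perf_counter()
    iterations = 0
    while True:
        work()
        iterations += 1
        elapsed = time.perf_counter() - start
        if elapsed >= budget_seconds:
            # Guard against a zero elapsed time on very coarse clocks.
            return iterations / max(elapsed, 1e-9)


def choose_device(throughput: Dict[str, float],
                  user_choice: Optional[str] = None) -> str:
    """Pick the fastest benchmarked device unless the user chose one explicitly."""
    if user_choice is not None:
        if user_choice not in throughput:
            raise ValueError(f"unknown device: {user_choice}")
        return user_choice
    return max(throughput, key=throughput.get)
```

A high-level client would call `measure` once per enumerated device and pass the results to `choose_device`, while a smesher who wants to keep the main GPU free would pass `user_choice`.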

avive commented 4 years ago

> I think you underestimate how easy it is to get users to stop a setup process.
>
> "Hmm... what should I do here? Nevermind, I'll go play Fortnite instead"

I don't really - I want them to play Fortnite while their computer is creating a commitment - how cool is that?

noamnelke commented 4 years ago

please, please put it under "advanced", or as @y0sher suggested, show the user what we chose for them and let them "customize" it.

noamnelke commented 4 years ago

> I don't really - I want them to play Fortnite while their computer is creating a commitment - how cool is that?

I think you totally missed my point. Confronted with a hard choice (e.g. one they don't understand) users will opt to do nothing (stop the setup process).

avive commented 4 years ago

> please, please put it under "advanced", or as @y0sher suggested, show the user what we chose for them and let them "customize" it.

Yes, if we can benchmark then we can choose the best one as the default and let advanced users change it. Please also see the commitment-related methods in the proposed smesher API (WIP / pre design review): https://github.com/spacemeshos/api/blob/smeshing-methods/proto/spacemesh/smesher.proto - to have the most flexibility in higher-level API client design, we need to support what's proposed at the basic API level...