Motivation
Currently, each job is locked to a single compute provider, so the system is optimized for running multiple concurrent jobs on machines with multiple providers. We would like to add support for tasks that use all available GPUs in a system, so that a single proof of space can be created more quickly on a multi-provider system.
Outline
Design and implement a new task that locks all available providers when it starts and distributes leaf generation and PoW solution computation across the locked providers.
Requirements
Leaves at different indexes of the proof of space data are computed in parallel. Although each compute iteration takes roughly the same time, the order in which iterations complete is nondeterministic. The task needs to write each iteration's output to the PoS file at the correct index. One possible design is to serialize appends to the output file so that every block lands at its index. When a block of leaves with a larger index becomes available before a block with a smaller index, a placeholder block of all zeros may need to be appended first so that the available block can be persisted at its correct index. The placeholder would then be overwritten once the missing block has been computed.
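The zero-padding design described above can be sketched as follows. This is a minimal illustration, not the actual implementation: the class name IndexedBlockWriter, the fixed block size, and the use of an in-memory stream in place of the PoS file are all assumptions made for the example.

```python
import io
import threading


class IndexedBlockWriter:
    """Hypothetical helper: serializes out-of-order block writes into one seekable stream."""

    def __init__(self, stream, block_size):
        self._stream = stream            # seekable binary stream standing in for the PoS file
        self._block_size = block_size    # assumed fixed size of one block of leaves, in bytes
        self._blocks_in_file = 0         # blocks (real or zero-filled) currently in the stream
        self._lock = threading.Lock()    # only one write touches the stream at a time

    def write_block(self, index, data):
        if len(data) != self._block_size:
            raise ValueError("unexpected block size")
        with self._lock:
            if index >= self._blocks_in_file:
                # The block lies at or beyond the current end: append zero-filled
                # placeholders for any missing smaller indexes, then the block itself.
                self._stream.seek(self._blocks_in_file * self._block_size)
                gap = index - self._blocks_in_file
                self._stream.write(bytes(gap * self._block_size))
                self._stream.write(data)
                self._blocks_in_file = index + 1
            else:
                # A placeholder already exists at this index: overwrite it in place.
                self._stream.seek(index * self._block_size)
                self._stream.write(data)


# Usage: block 2 completes first, so blocks 0 and 1 start out as zero placeholders.
buffer = io.BytesIO()
writer = IndexedBlockWriter(buffer, block_size=4)
writer.write_block(2, b"CCCC")
writer.write_block(0, b"AAAA")
writer.write_block(1, b"BBBB")
```

On a real file, an alternative to explicit placeholders would be to preallocate the file and use positional writes. Which approach fits better depends on the target file systems.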
POW Solution generation
The proof of work solution generation phase (when applicable) should also be parallelized across all locked providers, and no new compute iterations should be started once a solution is found.
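A rough sketch of this early-stop behavior is shown below. It assumes each provider can be modeled as a worker thread that claims batches of nonces, and it uses SHA-256 purely as a stand-in for the real proof of work function. The function name find_pow_nonce and all of its parameters are hypothetical.

```python
import hashlib
import threading


def find_pow_nonce(challenge, target, num_providers, batch_size=1024, max_nonce=1 << 24):
    """Search nonces in parallel; return a nonce whose hash is below target, or None."""
    found = threading.Event()       # set as soon as any worker finds a solution
    solutions = []
    solutions_lock = threading.Lock()
    next_start = [0]                # first nonce of the next unclaimed batch
    claim_lock = threading.Lock()

    def claim_batch():
        with claim_lock:
            start = next_start[0]
            if start >= max_nonce:
                return None
            next_start[0] = start + batch_size
            return start

    def worker():
        # One loop pass models one compute iteration on one provider.
        # The flag is checked before each new iteration, so none starts after a hit.
        while not found.is_set():
            start = claim_batch()
            if start is None:
                return
            for nonce in range(start, min(start + batch_size, max_nonce)):
                digest = hashlib.sha256(challenge + nonce.to_bytes(8, "little")).digest()
                if int.from_bytes(digest, "big") < target:
                    with solutions_lock:
                        solutions.append(nonce)
                    found.set()
                    return

    threads = [threading.Thread(target=worker) for _ in range(num_providers)]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    # Iterations already in flight may each report a solution; return the smallest
    # reported one, which is not necessarily the smallest valid nonce overall.
    return min(solutions) if solutions else None
```

Iterations that were already running when the solution was found are allowed to finish in this sketch. Whether in-flight GPU work can or should be cancelled is left open here.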
Performance
Leaves should be written to the data file asynchronously so that new compute iterations on the providers are not blocked on disk I/O. Ideally, all compute providers work at full capacity for the duration of data file generation, with negligible idle time.
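One way to decouple computation from disk I/O is a dedicated writer thread fed by a queue, sketched below. The names generate_file, compute_block, and max_pending are hypothetical, worker threads stand in for compute providers, and an in-memory stream stands in for the data file.

```python
import io
import queue
import threading


def generate_file(num_blocks, num_providers, compute_block, stream, block_size, max_pending=64):
    """Compute blocks on several workers while a single writer thread persists them."""
    pending = queue.Queue(maxsize=max_pending)   # finished blocks waiting to be written
    indexes = queue.Queue()                      # block indexes not yet claimed
    for index in range(num_blocks):
        indexes.put(index)

    def writer():
        # The only thread that touches the stream, so writes are serialized.
        while True:
            item = pending.get()
            if item is None:                     # sentinel: all providers are done
                return
            index, data = item
            stream.seek(index * block_size)
            stream.write(data)

    def provider():
        while True:
            try:
                index = indexes.get_nowait()
            except queue.Empty:
                return
            # Hand the result to the writer and move straight on to the next block.
            pending.put((index, compute_block(index)))

    writer_thread = threading.Thread(target=writer)
    writer_thread.start()
    providers = [threading.Thread(target=provider) for _ in range(num_providers)]
    for thread in providers:
        thread.start()
    for thread in providers:
        thread.join()
    pending.put(None)
    writer_thread.join()


# Usage: each hypothetical block is its index repeated block_size times.
buffer = io.BytesIO()
generate_file(8, 3, lambda i: bytes([i]) * 4, buffer, block_size=4)
```

The bounded queue caps memory use but means a provider can still stall if the disk falls far behind. The value of max_pending would need tuning against real block sizes and disk throughput.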