Open pablogranolabar opened 2 years ago
Hi @pablogranolabar, tweaks will be needed, but it can be made possible.
You should consider the following parts:
- The `DEBUG` option, e.g.: https://github.com/saareliad/FTPipe/blob/c3d853080e0bebde50deef78892baf0f3663daf1/models/partitioned/t5_3b_tied_lmheads_320_8_8p_bw12_async_squad1_mpipe.py#L45
- `"cpu": true` in the JSON config: https://github.com/saareliad/FTPipe/blob/c3d853080e0bebde50deef78892baf0f3663daf1/pipe/prepare_pipeline.py#L302. I kept a file with all options here, e.g.: https://github.com/saareliad/FTPipe/blob/c3d853080e0bebde50deef78892baf0f3663daf1/pipe/configs/all_options.json#L67

Finally, there are some partitioning heuristics that would need to be changed according to your system. For example, the memory threshold in the master branch is hardcoded to 11GB for the RTX 2080 Ti: https://github.com/saareliad/FTPipe/blob/c3d853080e0bebde50deef78892baf0f3663daf1/autopipe/autopipe/model_partitioning/heuristics.py#L327
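
For the config part, a minimal sketch of switching an existing JSON config to CPU might look like the following. Only the `"cpu"` key comes from the links above; the helper name `enable_cpu`, the file name `my_config.json`, and the `"model"` entry are placeholders for illustration, not part of FTPipe.

```python
import json
import tempfile
from pathlib import Path


def enable_cpu(config_path: Path) -> dict:
    """Set "cpu": true in a JSON config file and return the updated config.

    Hypothetical helper: it only edits the file and does not call into FTPipe.
    """
    config = json.loads(config_path.read_text())
    config["cpu"] = True  # serialized as "cpu": true in the JSON file
    config_path.write_text(json.dumps(config, indent=4))
    return config


# Demo on a throwaway file; the file name and the "model" entry are placeholders.
with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "my_config.json"
    path.write_text(json.dumps({"model": "placeholder"}))
    updated = enable_cpu(path)
    reloaded = json.loads(path.read_text())
```

In practice you would point this at a copy of one of the real config files, and still check the other options in `all_options.json` by hand.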
Hi, very neat project.
Question: is it possible to use FTPipe with massively parallel CPU clusters? Say, for example, 256 VMs?