petuum / adaptdl

Resource-adaptive cluster scheduler for deep learning training.
https://adaptdl.readthedocs.io/
Apache License 2.0

Progress in validation #122

Closed: Rivendile closed this issue 2 years ago

Rivendile commented 2 years ago

Hi, I'm trying to run the simulator from the "osdi21-artifact" branch and have run into a problem.

What does "progress" in traces//validation-.csv mean? Is it the training time? For ncf with batch size 32768, I would expect the training time to be about 100 s or even less, since the step time is about 0.02 s and there are 1548 iterations. However, in validation-32768.csv the progress is 194285.
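
For reference, here is a minimal sketch of the arithmetic behind my estimate. It assumes the step time is constant at about 0.02 s and that 1548 is the total number of iterations; the variable names are only illustrative and are not taken from the simulator.

```python
# Rough estimate of ncf training time from the numbers quoted in this issue.
# Assumptions: constant step time, and 1548 is the total iteration count.
step_time_s = 0.02          # approximate seconds per training step
iterations = 1548           # number of iterations
reported_progress = 194285  # value found in validation-32768.csv

# Estimated wall-clock training time if every step takes step_time_s.
estimated_time_s = step_time_s * iterations

# How many times larger the reported progress is than the estimate.
ratio = reported_progress / estimated_time_s

print(f"estimated training time: {estimated_time_s:.2f} s")
print(f"reported progress / estimated time: {ratio:.0f}x")
```

Under these assumptions the estimate is well under 100 s, which is why the progress value does not look like a training time in seconds to me.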

Thanks!