arjunsuresh opened 2 months ago
@psyhtest @ashwin @attafosu Can you please confirm?
@arjunsuresh Yes, that's correct.
I can think of a situation where an implementer refactors or integrates a reference script into their own script. For example, the reference script may hardcode /usr/bin/python3, while they may want to use /usr/local/bin/python3.8. In this case, we could probably require that no material changes be made during such refactoring or integration, but not that the reference script must always be run standalone?
Thank you @attafosu @psyhtest
@psyhtest yes, running the reference accuracy script standalone is fine I believe. But this is not that straightforward, as it often requires the original dataset, so we do have some submissions where accuracy.txt is generated by the benchmark run itself without calling the reference script. We didn't see any accuracy issues when running the standalone script for those submissions, but I believe this should not be allowed.
@arjunsuresh
But you admit that in some cases it may not be straightforward:
yes, running the reference accuracy script standalone is fine I believe. But this is not that straightforward
So why would we disallow it in such cases?
@psyhtest I'm not saying we should disallow running the reference accuracy script in a custom way, say within another Python file. But I don't think it is right to allow generating the accuracy.txt file by mimicking the actions of the reference script, because that makes it hard for other people to verify.
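One way to run the reference accuracy script "within another Python file" without reimplementing its logic is the standard library's `runpy.run_path`, which executes the script as `__main__` in the current process. This is a sketch under stated assumptions: it temporarily swaps `sys.argv` so the script's own argument parsing sees the intended arguments, and it captures what the script prints, which is what would be saved as accuracy.txt. The function name `run_reference_inprocess` is hypothetical.

```python
import contextlib
import io
import runpy
import sys
from typing import Sequence


def run_reference_inprocess(script_path: str, args: Sequence[str]) -> str:
    """Execute an unmodified reference script in-process and return its stdout.

    The script runs with __name__ == "__main__", so its usual
    `if __name__ == "__main__":` entry point fires, and it sees `args`
    in sys.argv exactly as if it had been launched from the command line.
    """
    saved_argv = sys.argv
    buffer = io.StringIO()
    sys.argv = [script_path, *args]
    try:
        with contextlib.redirect_stdout(buffer):
            runpy.run_path(script_path, run_name="__main__")
    finally:
        sys.argv = saved_argv  # restore the caller's argv even if the script fails
    return buffer.getvalue()
```

Two caveats of this design: a script that calls `sys.exit()` raises `SystemExit` in the caller, and output written by C extensions directly to file descriptor 1 bypasses `redirect_stdout`. Running the script in a separate process avoids both.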
We face this issue specifically when automating DLRMv2 submissions: to generate the accuracy.txt file, we need the day23 Criteo dataset, which cannot be downloaded non-interactively. But if we were allowed to generate the accuracy.txt file from within the benchmark implementation, we might not need this file at all.
@arjunsuresh to work on this
The submission generation rules for inference say that the accuracy.txt file should be generated by the accuracy scripts. My interpretation of this is that one should run the reference accuracy scripts standalone, using the logs from the accuracy run, to obtain this accuracy.txt file, rather than dump the accuracy.txt file from within the implementation code. Is this the correct interpretation?
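Under that interpretation, the submitter's side reduces to running the reference script as a separate process over the accuracy-run log (for example LoadGen's mlperf_log_accuracy.json) and saving its output verbatim. A minimal sketch, assuming the reference script prints its result to stdout; the full command is passed in by the caller because each benchmark's reference script takes its own arguments, and the helper name `write_accuracy_txt` is hypothetical.

```python
import subprocess
from pathlib import Path
from typing import Sequence


def write_accuracy_txt(command: Sequence[str], out_dir: Path) -> Path:
    """Run a standalone accuracy command and save its stdout as accuracy.txt.

    `command` is the full argv for the reference accuracy script
    (interpreter, script path, log-file arguments). Nothing from the
    benchmark implementation is involved in producing the file.
    """
    result = subprocess.run(
        list(command), check=True, capture_output=True, text=True
    )
    out_path = out_dir / "accuracy.txt"
    out_path.write_text(result.stdout)  # store the script's output unmodified
    return out_path
```

Because the file is produced only by the reference script's own output, anyone with the same log and dataset can rerun the command and compare the results.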