It would be useful for us, and for people using the repository, if I document the time and space it took to run it on different datasets,
so that they can roughly estimate how long a run is going to take them.
Specify the hardware used (RAM, disk, number of CPUs, CPU types, OS version),
the dataset used (size, n_tokens),
and the time taken by the different parts of the run:
a sort of benchmark.
Need to figure out a good way to document this and add it to the README.
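A minimal sketch of how the hardware fields listed above could be collected, assuming the repository is Python and using only the standard library. The function name `collect_hardware_info` and the dictionary keys are hypothetical, not something that exists in the repository. Total RAM is read through `os.sysconf`, which is available on Linux and most Unix systems but not on Windows, so it falls back to `None` there.

```python
import os
import platform
import shutil


def collect_hardware_info(path="."):
    """Return a dict describing the machine (illustrative field names)."""
    disk = shutil.disk_usage(path)  # disk that holds `path`
    info = {
        "os": platform.platform(),
        "cpu_model": platform.processor() or "unknown",
        "n_cpus": os.cpu_count(),
        "disk_total_gb": round(disk.total / 1e9, 1),
        "disk_free_gb": round(disk.free / 1e9, 1),
    }
    # The standard library has no portable call for total RAM;
    # os.sysconf is an assumption that holds on Linux and most Unix systems.
    try:
        page_size = os.sysconf("SC_PAGE_SIZE")
        n_pages = os.sysconf("SC_PHYS_PAGES")
        info["ram_gb"] = round(page_size * n_pages / 1e9, 1)
    except (AttributeError, ValueError, OSError):
        info["ram_gb"] = None
    return info


if __name__ == "__main__":
    for key, value in collect_hardware_info().items():
        print(f"{key}: {value}")
```

The dataset fields (size, n_tokens) are left out on purpose, since how they are computed depends on the pipeline.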
***UPDATE 21.06.2023
Add a performance section to README.md, and decide how I want to organize this information.
Add logging support for recording the information required for the performance report.
Test it on bookcorpus_medium.
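One way the logging support could look, as a rough sketch only: a context manager that times each part of the run, logs it, and keeps the numbers for the report. The names `timed_stage`, `format_report`, the logger name `perf_report`, and the stage name in the usage example are all hypothetical. The actual stages and log format are not decided yet.

```python
import logging
import time
from contextlib import contextmanager

logger = logging.getLogger("perf_report")  # hypothetical logger name


@contextmanager
def timed_stage(name, timings):
    """Time the enclosed block, log it, and store the result in `timings`."""
    start = time.perf_counter()
    try:
        yield
    finally:
        elapsed = time.perf_counter() - start
        timings[name] = elapsed
        logger.info("stage=%s elapsed_seconds=%.3f", name, elapsed)


def format_report(timings):
    """Render the collected timings as plain-text lines, one per stage."""
    return "\n".join(f"{name}: {seconds:.2f} s" for name, seconds in timings.items())


if __name__ == "__main__":
    logging.basicConfig(level=logging.INFO)
    timings = {}
    with timed_stage("example_stage", timings):  # placeholder for a real pipeline step
        time.sleep(0.1)
    print(format_report(timings))
```

Logging in a fixed `key=value` form keeps the messages easy to extract later when filling in the README template.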
***UPDATE 22.06.2023
The template was created, and logging messages were added to collect the relevant information.
Now I'm waiting for the bookcorpus_medium experiment to run so I can check whether the information is extracted correctly.