Open tfgjustin opened 1 year ago
I would be more than happy to be proven wrong since this looks to be honestly pretty amazing, but sadly the track record with the rest of the repo here means I'm not holding out much hope.
(P.S. Please prove me wrong.)
Have you been able to get any results, period? I finally got results and benchmarks working beautifully after some minor config changes and making sure all the paths are correct.
@tfgjustin
We just released a properly dockerized training benchmark for CV models if you're interested: https://github.com/tensorpix/benchmarking-cv-models
You just pull the repo and run the container... it shouldn't take more than 5 minutes to figure everything out.
Thank you! I'll check it out tomorrow!
If you're looking at this repository and wondering if it's the right thing for you, the short answer will be "no". The TL;DR is that unless you're an employee of Lambda Labs, have your machine and directory structure set up identically to theirs, and have already downloaded all the data sets, it will not work.
Longer explanation: Amazingly, the creators of this tool have managed to recreate "works on my machine" using Docker containers and volumes, which runs counter to the whole concept of containers but is impressive in its own way.
Among the issues found in the tutorial and their scripts:
- [...] ~/data. If you already have this directory, hold onto your hat because it will start writing random stuff as root.
- [...] set -ex on several of the scripts helps with this problem.)
- [...] download_dataset.sh, but that script is not compatible with that dataset. You'll get: [...]
- [...] --progress=dot for wget, meaning you'll get literally thousands of lines of: [...] in your terminal. See bullet point (3) about this immediately hiding any previous error messages.
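For anyone patching the scripts locally, here is a minimal sketch of what the set -e half of set -ex buys you. The step names and the simulated failure are hypothetical stand-ins, not taken from the repository's scripts; the point is only that the run stops at the first failing command instead of carrying on and burying the error under later output.

```shell
#!/usr/bin/env bash
# Hypothetical stand-in for a multi-step benchmark script.
# The step names are illustrative and do not come from the repository.
script='set -e                 # abort at the first command that fails
echo "step 1: prepare"
false                          # simulated failure, e.g. a download that errors out
echo "step 2: train"           # never reached because of set -e
'

# Run the script in a fresh shell and capture its stdout and exit status.
status=0
output="$(bash -c "$script")" || status=$?

echo "exit status: $status"
echo "output: $output"
```

Adding -x as well (set -ex) makes the shell print each command to stderr before running it, which helps show which step failed.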
Given the fact that no one from Lambda Labs has attempted to address any of the bugs or PRs raised here in the last three years, your best bet is to move on and find a different benchmark.