Open · TheClassyPenguin opened this issue 4 years ago
@shawwn I also would love to know how to get --storage_bucket in and working
@Norod @TheClassyPenguin: I've looked into this a bit personally, and it seems this is not something the code itself imposes; it's a limitation of using a TPU on Colaboratory. In TPU runtimes, Colaboratory requires you to store models in a Google Cloud Storage bucket (see here for info on creating one).
It appears that someone may have found a way around it, though I haven't had time to confirm this myself. Placing it here for completeness only.
Hope that was helpful in explaining what --storage_bucket is for!
@JaonHax Thank you for the reply, but I'm afraid there has been a misunderstanding. I know what '--storage_bucket' is for, and that's why I'd love to have a version where it is implemented in the code; currently '--storage_bucket' is only mentioned in a comment. Other than that, I haven't had any issues storing the checkpoints in Colab's local storage, so from time to time I just need to remember to stop the execution and manually copy the most recent checkpoint into a bucket (so as not to lose progress when the runtime resets). Having '--storage_bucket' support would make this all more convenient.
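The manual copy step described above can be scripted. This is only a rough sketch, not code from the repo: the helper names, checkpoint layout (TensorFlow-style `model-<step>.*` files), and bucket path are all assumptions. It finds the newest checkpoint in a local directory and pushes its files to a bucket with gsutil, which is preinstalled on Colab:

```python
import glob
import os
import re
import subprocess

def latest_checkpoint_step(filenames):
    """Return the highest step number among TF-style model-<step>.* files, or None."""
    steps = [int(m.group(1)) for f in filenames
             for m in [re.match(r"model-(\d+)\.", os.path.basename(f))] if m]
    return max(steps) if steps else None

def sync_latest_to_bucket(ckpt_dir, bucket_uri):
    """Copy the most recent checkpoint's files to a GCS bucket via gsutil."""
    files = glob.glob(os.path.join(ckpt_dir, "model-*"))
    step = latest_checkpoint_step(files)
    if step is None:
        return  # nothing to copy yet
    latest = glob.glob(os.path.join(ckpt_dir, "model-%d.*" % step))
    # gsutil ships with Colab; -m parallelizes the copy across files.
    subprocess.run(["gsutil", "-m", "cp", *latest, bucket_uri], check=True)

# Example usage (run name and bucket are placeholders):
# sync_latest_to_bucket("checkpoint/run1", "gs://my-bucket/run1/")
```

Run manually (or from a loop) before the Colab runtime resets; it still isn't a substitute for proper --storage_bucket support in train.py.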
@Norod Ah, yeah. I hadn't realized at the time because I was still using the tpu-multi-snapshot branch, which does have it implemented. Sorry for not understanding what you meant!
Hi,
I'm training the 1558M model on GCP. I see that your notebook mentions a parameter named --storage_bucket.
I need this feature and I'm in a position to test it out, but I've noticed the code for it is not in the train.py file. Is it something you implemented but ultimately decided not to include because you couldn't test it yourself?
Let me know if you have the code!
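For what it's worth, here is one way such a flag could be wired into an argparse-based train.py. This is purely a hypothetical sketch, not the repo's actual implementation: resolve_checkpoint_dir, the default directory layout, and the flag's help text are all my assumptions. TensorFlow's checkpoint savers and tf.io.gfile generally accept gs:// paths directly, so routing the checkpoint directory to the bucket is often enough:

```python
import argparse
import os

def resolve_checkpoint_dir(run_name, storage_bucket=None):
    """Pick a local directory or a gs:// path for checkpoints, depending on the flag."""
    if storage_bucket:
        # Accept either "my-bucket" or "gs://my-bucket" forms.
        bucket = storage_bucket if storage_bucket.startswith("gs://") else "gs://" + storage_bucket
        return bucket.rstrip("/") + "/checkpoint/" + run_name
    return os.path.join("checkpoint", run_name)

parser = argparse.ArgumentParser()
parser.add_argument("--run_name", default="run1")
parser.add_argument("--storage_bucket", default=None,
                    help="GCS bucket to store checkpoints in (needed on Colab TPU runtimes)")

# Example invocation (bucket name is a placeholder):
args = parser.parse_args(["--storage_bucket", "my-bucket"])
checkpoint_dir = resolve_checkpoint_dir(args.run_name, args.storage_bucket)
```

The rest of train.py would then save to and restore from checkpoint_dir unchanged, whether it's local or a gs:// URI.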