nateraw / Lda2vec-Tensorflow

Tensorflow 1.5 implementation of Chris Moody's Lda2vec, adapted from @meereeum
MIT License

Create a Google CoLab Notebook for LDA2VEC-Tensorflow #52

Open dbl001 opened 5 years ago

dbl001 commented 5 years ago

Comments?

https://medium.com/deep-learning-turkey/google-colab-free-gpu-tutorial-e113627b9f5d

nateraw commented 5 years ago

Sure, great idea! I'm already using Google Colab for a new tutorial series (https://github.com/nateraw/Keras-Tutorials), so I'm familiar with setting this up. Thanks for the suggestion!

dbl001 commented 5 years ago

I’ve tried the GPU and TPU runtimes. They disconnected after a short period of time. The documentation says you should be able to train a model for 12 hours.

https://stackoverflow.com/questions/55874473/does-google-colab-continue-running-the-script-when-runtime-disconnected

If the model checkpoint file is saved to the cloud drive, we should be able to reconnect, restore the model and continue training where we left off.
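A minimal sketch of that save-and-resume loop, to make the idea concrete. It uses only the Python standard library and a placeholder training step; the function names, the JSON file format and the `save_every` parameter are all made up for illustration and are not part of this repo. In the real notebook the state would presumably be written with TensorFlow's own checkpointing (for example `tf.train.Saver`), and `ckpt_dir` would point at a folder under the mounted Drive.

```python
import json
import os


def save_checkpoint(ckpt_dir, step, state):
    """Write the state for `step` to ckpt_dir and return the file path."""
    os.makedirs(ckpt_dir, exist_ok=True)
    path = os.path.join(ckpt_dir, "ckpt-{:08d}.json".format(step))
    tmp = path + ".tmp"
    with open(tmp, "w") as f:
        json.dump({"step": step, "state": state}, f)
    # Rename only after the write finished, so a disconnect mid-write
    # does not leave a truncated checkpoint behind.
    os.replace(tmp, path)
    return path


def latest_checkpoint(ckpt_dir):
    """Return the path of the newest checkpoint, or None if there is none."""
    if not os.path.isdir(ckpt_dir):
        return None
    names = sorted(
        n for n in os.listdir(ckpt_dir)
        if n.startswith("ckpt-") and n.endswith(".json")
    )
    return os.path.join(ckpt_dir, names[-1]) if names else None


def train(ckpt_dir, total_steps, save_every=100):
    """Train up to total_steps, resuming from the newest checkpoint if any."""
    path = latest_checkpoint(ckpt_dir)
    if path:
        with open(path) as f:
            saved = json.load(f)
        step, state = saved["step"], saved["state"]
    else:
        step, state = 0, {"loss": None}

    while step < total_steps:
        step += 1
        state["loss"] = 1.0 / step  # placeholder for a real training step
        if step % save_every == 0 or step == total_steps:
            save_checkpoint(ckpt_dir, step, state)
    return step, state
```

In Colab, one would first run `from google.colab import drive; drive.mount('/content/drive')` and then pass a directory under `/content/drive` as `ckpt_dir` (the exact folder is an assumption). Calling `train` again after a reconnect picks up from the last saved step.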


dbl001 commented 5 years ago

Google CoLab runs out of memory when pre-processing input files larger than ~20 MB, possibly in the Keras tokenizer.

nateraw commented 5 years ago

I wonder if the issue is from the skipgram token pair stuff. Will look into it. I have the colab example all cleaned up, I'll find time to upload it this week sometime.
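If the skipgram pairs do turn out to be the problem, one possible fix is to generate them lazily instead of building the full list in memory. The sketch below is only an illustration of that idea, not the code currently in this repo; `iter_skipgram_pairs`, `batches` and `window_size` are hypothetical names.

```python
def iter_skipgram_pairs(token_ids, window_size=2):
    """Yield (target, context) pairs one at a time for a single document."""
    n = len(token_ids)
    for i, target in enumerate(token_ids):
        lo = max(0, i - window_size)
        hi = min(n, i + window_size + 1)
        for j in range(lo, hi):
            if j != i:
                yield target, token_ids[j]


def batches(pairs, batch_size):
    """Group an iterable of pairs into lists of at most batch_size items."""
    batch = []
    for pair in pairs:
        batch.append(pair)
        if len(batch) == batch_size:
            yield batch
            batch = []
    if batch:
        yield batch
```

With this shape, only one batch of pairs is held in memory at a time, so peak usage depends on `batch_size` rather than on corpus size.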

dbl001 commented 5 years ago

It died in CoLab BEFORE printing: Removing 9981 low frequency tokens out of 15481 total tokens
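Since that message comes from the low-frequency filtering step, the counting that feeds it could in principle be done in a streaming pass, one line at a time, rather than on the whole corpus at once. A rough sketch follows, assuming plain whitespace tokenization (which is not what the Keras tokenizer does) and a hypothetical `min_count` threshold:

```python
from collections import Counter


def count_tokens(lines):
    """Count tokens from an iterable of text lines without storing the lines."""
    counts = Counter()
    for line in lines:
        counts.update(line.lower().split())
    return counts


def split_by_frequency(counts, min_count):
    """Return (keep, drop) token sets based on a minimum count."""
    keep = {tok for tok, c in counts.items() if c >= min_count}
    drop = set(counts) - keep
    return keep, drop
```

Passing an open file object as `lines` reads the input lazily, so memory is bounded by the vocabulary size and not by the file size.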
