sherjilozair / char-rnn-tensorflow

Multi-layer Recurrent Neural Networks (LSTM, RNN) for character-level language models in Python using Tensorflow
MIT License

Yet another take on tensorboard instrumentation plus extras. #89

Closed: ubergarm closed this issue 7 years ago

ubergarm commented 7 years ago

I don't know if you're accepting PRs / managing issues, but I opened this anyway for other folks/forks that may be interested.

The second patch may address #34 regarding dropout:

python train.py --input_dropout=0.2 --output_dropout=0.5

Dropout defaults to off (0.0) because it slows down computation. Also, honestly, I'm not sure I'm using it correctly. The tensorboard graph does seem to show it connecting between cell layers rather than across them, which is promising.
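For reference, the wiring I have in mind is roughly this (a sketch against the TF 1.x contrib API; the helper and the hard-coded sizes are just illustrative, not necessarily what the patch does):

    import tensorflow as tf

    def build_cell(num_units, input_dropout, output_dropout, training):
        # Illustrative helper: wrap each layer's cell individually so dropout
        # sits between the stacked layers rather than across the whole stack.
        cell = tf.contrib.rnn.BasicLSTMCell(num_units)
        if training and (input_dropout > 0.0 or output_dropout > 0.0):
            # DropoutWrapper takes keep probabilities, so convert the rates.
            cell = tf.contrib.rnn.DropoutWrapper(
                cell,
                input_keep_prob=1.0 - input_dropout,
                output_keep_prob=1.0 - output_dropout)
        return cell

    cells = [build_cell(128, 0.2, 0.5, training=True) for _ in range(3)]
    stacked_cell = tf.contrib.rnn.MultiRNNCell(cells)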

I'll likely at least add some more logging to write out the parameters as part of the directory name, to make it easier to compare across runs. (tensorboard demo video)

Screenshots: training loss across multiple runs; graph of NASCell w/ dropout.
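Concretely, the directory naming I'm thinking of is something like this (a sketch only; the hard-coded values, flag names, and run-name format are placeholders, not the final interface):

    import os
    import time
    import tensorflow as tf

    # Placeholder values standing in for parsed command-line args.
    rnn_size, num_layers, output_dropout = 128, 3, 0.5

    # Fold the interesting hyperparameters into the run name so tensorboard
    # can overlay the runs side by side.
    run_name = 'nas-{}x{}-drop{}-{}'.format(
        rnn_size, num_layers, output_dropout, int(time.time()))
    log_dir = os.path.join('logs', run_name)

    writer = tf.summary.FileWriter(log_dir)
    # In the training loop, merged summaries would be written per step:
    #   writer.add_summary(summary, global_step)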

sherjilozair commented 7 years ago

@ubergarm do you want to perhaps help me maintain this repository by becoming a collaborator?

ubergarm commented 7 years ago

@sherjilozair I'm flattered, thanks for the offer! I'm very new to TensorFlow, but I have experience doing general development.

I'm willing to commit some time to go through existing issues / pull requests to collate patches that look good into my fork. Then I can open a PR for you to review / discuss and merge to this upstream repo.

Does that sound like a reasonable workflow or what would you like?

PS Thanks so much for getting this ball rolling! I'm having a lot of fun training this against random data sets! A 12-character-wide by 3-layer-deep NASCell configuration did a great job generating baby names after ~12 hours of training on my single GeForce GTX 1070 w/ proprietary nvidia drivers version 378.13 on Ubuntu 16.04 w/ 4.10 kernel.

hugovk commented 7 years ago

@ubergarm How about merging the PRs that look good directly into this repo? Might be a bit easier to do things in smaller chunks rather than one big one at the end.

sherjilozair commented 7 years ago

I am good with either collating or directly merging PRs, although the latter does seem better. The important thing is to make sure the PRs are high quality, don't introduce any new bugs, and have decent comments/notes to help other users. There are also some small features people have requested, such as evaluation, which could very easily be added to the repo.

@hugovk, would you be willing to become a collaborator as well?

If it seems fitting, I would be happy to create an organization and put this repo under it so that all collaborators get equal credit.

ubergarm commented 7 years ago

@hugovk true, that is the best workflow; I'm just wary of over-committing myself to anything longer term. I tend to be a "drive-by" contributor.

@sherjilozair Sure, I'd be down if, say, we three worked together to get this repo up to speed. Once we chew through the backlog it shouldn't be as daunting to maintain. ;)

Is there a good way to chat sometime? I've never used https://gitter.im/ but it may make sense given the context.

sherjilozair commented 7 years ago

This whole repository started as a drive-by project. ;)

On second thought, it would be best for you to work with whatever workflow you like. You don't need to over-commit.

I just started a gitter channel: https://gitter.im/char-rnn-tensorflow/Lobby

ubergarm commented 7 years ago

@sherjilozair

So I made this set of patches a bit less opinionated and more modest. I'll keep using the save directory for the models, and I want to review some of the other dropout options before accepting the one I proposed.

Final patch changes are (rough sketch after the list):

  1. Updated MultiRNNCell to take a list of cells (removed the explicit state_is_tuple=True)
  2. Added NASCell as a cell option
  3. Added instrumentation for tensorboard logs
  4. Added notes to the README about using tensorboard
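A minimal sketch of what items 1-3 boil down to, against the TF 1.x contrib API (the sizes and summary names here are placeholders, not necessarily what the patch uses):

    import tensorflow as tf

    rnn_size, num_layers = 128, 3

    # 1 and 2: build a list of cells (NASCell here) and stack them; in TF 1.x
    # MultiRNNCell defaults to tuple states, so state_is_tuple=True no longer
    # needs to be spelled out.
    cells = [tf.contrib.rnn.NASCell(rnn_size) for _ in range(num_layers)]
    cell = tf.contrib.rnn.MultiRNNCell(cells)

    # 3: scalar summaries that tensorboard plots across runs; in train.py this
    # would be the real training-loss tensor rather than a placeholder.
    loss = tf.placeholder(tf.float32, name='train_loss')
    tf.summary.scalar('train_loss', loss)
    summaries = tf.summary.merge_all()
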
sherjilozair commented 7 years ago

Looks great, @ubergarm !