pytorch / tutorials

PyTorch tutorials.
https://pytorch.org/tutorials/
BSD 3-Clause "New" or "Revised" License

Updating "Classifying Names with a Character-Level RNN" #2954

Open mgs28 opened 3 months ago

mgs28 commented 3 months ago

Fixes #1166

Description

Updating Sean's excellent RNN classification tutorial, which is now 8 years old and missing some newer PyTorch functionality.

Checklist

cc @albanD

pytorch-bot[bot] commented 3 months ago

:link: Helpful Links

:test_tube: See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/tutorials/2954

Note: Links to docs will display an error until the docs builds have been completed.

This comment was automatically generated by Dr. CI and updates every 15 minutes.

facebook-github-bot commented 3 months ago

Hi @mgs28!

Thank you for your pull request and welcome to our community.

Action Required

In order to merge any pull request (code, docs, etc.), we require contributors to sign our Contributor License Agreement, and we don't seem to have one on file for you.

Process

In order for us to review and merge your suggested changes, please sign at https://code.facebook.com/cla. If you are contributing on behalf of someone else (e.g., your employer), the individual CLA may not be sufficient and your employer may need to sign the corporate CLA.

Once the CLA is signed, our tooling will perform checks and validations. Afterwards, the pull request will be tagged with CLA signed. The tagging process may take up to 1 hour after signing. Please give it that time before contacting us about it.

If you have received this in error or have any questions, please contact us at cla@meta.com. Thanks!

mgs28 commented 3 months ago

@svekars - it looks like you're very active in this repo. Any chance you could help me with this? Thanks!

mgs28 commented 3 months ago

Added functionality to process training data in mini-batches, to satisfy the original issue. However, I had to use numpy + random to create batch indices from a given dataset (a minimal sketch of that batching follows below).
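
For illustration, roughly what that batching looks like (the helper name and exact layout are placeholders, not the PR's exact code):

    import random
    import numpy as np

    # Shuffle dataset indices, then split them into mini-batches.
    def make_batches(n_examples, batch_size):
        indices = list(range(n_examples))
        random.shuffle(indices)
        n_batches = max(1, n_examples // batch_size)
        return np.array_split(np.array(indices), n_batches)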

Also, I simplified training so it is a closer match to https://pytorch.org/tutorials/beginner/basics/optimization_tutorial.html

mgs28 commented 2 months ago

@svekars - would you please help me with this tutorial update pull request or point me to someone who could?

svekars commented 2 months ago

cc: @spro

mgs28 commented 2 months ago

Sorry about the spelling errors! I ran pyspelling and re-ran make html for the tutorial. This should pass those CI steps now.

I also filed an issue for myself to come back and update CONTRIBUTING.md to include some of these checks (https://github.com/pytorch/tutorials/issues/2969).

Thanks @spro @svekars !

mgs28 commented 2 months ago

@spro and @svekars - I significantly cut the training time, although it is currently faster on my CPU than on my GPU; it runs in 72 seconds on my local CPU. I also added default-device selection so it will pick up CUDA on your build machines and hopefully run faster there.
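
The default-device idea is roughly this (a sketch assuming PyTorch 2.0+ for torch.set_default_device):

    import torch

    # Prefer CUDA when available, otherwise fall back to CPU.
    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
    torch.set_default_device(device)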

Thanks!

mgs28 commented 2 weeks ago

@jbschlosser - thank you for the lovely suggestions. If possible, I'd like to split my response into two parts:

First, the edits to my existing content.

  1. Excellent point on NameData. I removed it and used helper functions.
  2. I'm glad you like the Dataset addition - that was my prompt for doing this.
  3. Thanks for letting me know the multiple definitions were confusing. I consolidated them into one simple definition, particularly since training was split out of the class.
  4. I added training as a separate function. I will go learn more about those third-party trainers since I haven't used them.

Secondly, I would really like to use nn.RNN if possible. Very few tutorials mention it, and everyone seems to base their RNN builds on this tutorial. However, to solve this task I think I need a network with layer sizes like [57, 128, 18], and it looks like the default Elman network is stuck at [57, 18] even with multiple layers.

Is it best practice to inherit from nn.RNN and add my own fully connected output layer, or am I misunderstanding something?

Thanks!

mgs28 commented 2 weeks ago

To make it simpler, I assume extending the nn.RNN class might look like this (it runs about 40% faster):

    class MyRNN(nn.RNN):
        def __init__(self, input_size, hidden_size, output_size):
            super(MyRNN, self).__init__(input_size, hidden_size)

            self.h2o = nn.Linear(hidden_size, output_size)
            self.softmax = nn.LogSoftmax(dim=1)

        def forward(self, line_tensor):
            # Pass the input through the RNN layers
            rnn_out, hidden = super(MyRNN, self).forward(line_tensor)
            output = self.h2o(hidden[0])
            output = self.softmax(output)

            return output

jbschlosser commented 1 week ago

Is it best practice to inherit from nn.RNN and add my own fully connected output layer, or am I misunderstanding something?

Rather than inherit, we generally encourage composition. In this case, something like:

# better name needed :)
class MyRNN(nn.Module):
    def __init__(self, ...):
        super().__init__()
        self.rnn = nn.RNN(...)
        self.h2o = nn.Linear(...)

    ...
    def forward(self, x):
        _, hidden = self.rnn(x)
        output = self.h2o(hidden[0])
        return F.log_softmax(output, dim=1)
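
For concreteness, a filled-in version of that sketch might look like this (the class name and sizes are assumptions based on the tutorial: 57 one-hot character features, 128 hidden units, 18 language classes):

    import torch.nn as nn
    import torch.nn.functional as F

    class CharRNN(nn.Module):
        def __init__(self, input_size, hidden_size, output_size):
            super().__init__()
            self.rnn = nn.RNN(input_size, hidden_size)
            self.h2o = nn.Linear(hidden_size, output_size)

        def forward(self, line_tensor):
            # hidden holds the final hidden state per layer; use layer 0
            _, hidden = self.rnn(line_tensor)
            output = self.h2o(hidden[0])
            return F.log_softmax(output, dim=1)

    rnn = CharRNN(57, 128, 18)
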
mgs28 commented 1 week ago

Thanks @jbschlosser! I used nn.RNN via composition and changed some of the surrounding text. That forced the addition of a few terms to the repo dictionary. I appreciate you teaching me something again; hopefully the tutorial is better for it.

mgs28 commented 6 days ago

@jbschlosser - thanks!

  1. The training converges around 35-40 iterations. I am happy to train until convergence or leave some room for the reader. I also left it lighter so that build time is faster for the CI bot. What would you prefer?

  2. Comparisons with the original confusion matrix: the differences come down to the train/test split plus training time. If I train with

    all_losses = train(rnn, alldata, n_epoch=100, learning_rate=0.1, report_every=5)

and evaluate on alldata, then I get a bright diagonal line that looks pretty similar to the original. I imagine that with some parameter tuning I could get closer.
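
For reference, the evaluation that produces the confusion matrix might look roughly like this (a sketch assuming the dataset yields (label_index, line_tensor) pairs; not the PR's exact code):

    import torch

    # Accumulate a row-normalized confusion matrix over a dataset.
    def confusion_matrix(model, data, n_classes):
        confusion = torch.zeros(n_classes, n_classes)
        model.eval()
        with torch.no_grad():
            for label_idx, line_tensor in data:
                guess_idx = model(line_tensor).argmax(dim=1).item()
                confusion[label_idx][guess_idx] += 1
        return confusion / confusion.sum(dim=1, keepdim=True)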

jbschlosser commented 5 days ago

I also left it lighter so that build time is faster for the CI bot. What would you prefer?

It's a good point that we want to balance this somewhat. That said, I think it'd be nice to be a little closer to the original, with at least somewhat of a diagonal line in the confusion matrix. Hopefully we can strike a reasonable balance where we're beginning to see a diagonal trend. No need to spend a ton of time parameter tuning though :) that's okay to leave as an exercise for the reader.

mgs28 commented 5 days ago

No problem @jbschlosser - I tuned some of the parameters to get pretty close to a diagonal confusion matrix. It still gets a little confused between English and Scottish as well as Korean and Chinese. However, there's a strong diagonal in the confusion matrix. Thanks!

jbschlosser commented 3 days ago

Thanks for the updates! I'm good with it; will let @svekars make the final approval :)