mloncode / workshop

Machine Learning for Software Engineering: modelling the source code workshop.

New name suggestion notebook using tensor2tensor #8

Open bzz opened 4 years ago

bzz commented 4 years ago

The previous name suggestion notebook, which used OpenNMT-tf + youtokenme for BPE, overexposed the accidental complexity of "tensorisation" of the source code.

This one uses https://github.com/tensorflow/tensor2tensor as a library to achieve the same, and thus also works on TPU with Colab.

This version is not the final, workshop-ready one but rather an intermediate one that was used on Colab.

We agreed that the scarce workshop prep time would be better spent not on improving this one, but on including a PyTorch version instead, where it should be easy to incorporate custom models, e.g. based on GGNN, and contrast the results with the seq2seq ones.

@m09 Please review the structure of the notebook though. I'm planning to reuse it for a new version, so any methodological feedback on it would be much appreciated.

review-notebook-app[bot] commented 4 years ago

Check out this pull request on ReviewNB

You'll be able to see Jupyter notebook diff and discuss changes. Powered by ReviewNB.

bzz commented 4 years ago

I figured that reviewing the ToC in a notebook diff is not easy, so here it is, the collapsible cell structure with titles:

[image: Screen Shot 2019-12-16 at 3 13 52 PM]

And here it is customized for the new workshop, including what we already discussed:

Data: exploration
Data: problem definition for CodeSearchNet dataset
Data: generate tensor representation
Data: inspect
  Plot subtoken sequence lengths
Train the model
Visualize the training
Inference using trained model
Visualize the attention
  Attention Utils
  Display Attention
Serving: export the model
Serving: serve predictions over HTTP
Interactive predictions (local webapp)
Compare the results with the literature
Change the model to GGNN (advanced)
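
For the "Data: generate tensor representation" and "Plot subtoken sequence lengths" steps, a minimal sketch of what the inspection could look like is below. It is only an illustration, not the notebook's code: it swaps the learned subword vocabulary for a plain camelCase/snake_case split, and `split_subtokens` and `length_histogram` are hypothetical helper names.

```python
import re
from collections import Counter

# Assumption: identifiers are split on underscores and camelCase boundaries.
# This is a simple stand-in for a learned subword (BPE-style) vocabulary.
_SUBTOKEN_RE = re.compile(r"[A-Z]+(?![a-z])|[A-Z]?[a-z]+|\d+")


def split_subtokens(name):
    """Split a function name into lowercase subtokens (hypothetical helper)."""
    return [m.group(0).lower() for m in _SUBTOKEN_RE.finditer(name)]


def length_histogram(names):
    """Map subtoken-sequence length -> number of names with that length."""
    return Counter(len(split_subtokens(n)) for n in names)


if __name__ == "__main__":
    names = ["getFileName", "parse_http_response", "toString", "run"]
    for name in names:
        print(name, "->", split_subtokens(name))
    # The (length, count) pairs are what a histogram plot would show.
    print(sorted(length_histogram(names).items()))
```

The resulting counts can be passed to any plotting library; the point of the step is to pick a sensible maximum target length before training.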

bzz commented 4 years ago

Experiment results

[image: fn-name-suggestion]

bzz commented 4 years ago

Shall we merge this?

m09 commented 4 years ago

Sorry @bzz for the lack of review. For some reason I had in mind that it was WIP, even though you asked for a review a long time ago; my bad. I think we can merge as is: it seems to need a rebase against the new Docker image, but so do the other notebooks, so we can take care of that in a future PR. As for the experimental plan, we can discuss it together with the PyTorch version and come up with a parallel plan that works well for both, using this one as the starting point (it seems very good to me).