mlpack / models

models built with mlpack
https://models.mlpack.org/docs
BSD 3-Clause "New" or "Revised" License

transformer #16

Open mrityunjay-tripathi opened 4 years ago

mrityunjay-tripathi commented 4 years ago

Hello everyone! I've implemented the transformer encoder and decoder. Although this PR has other dependencies, I opened it now to get some insights and opinions. Some things still remain for this PR:

mlpack-bot[bot] commented 4 years ago

Thanks for opening your first pull request in this repository! Someone will review it when they have a chance. In the meantime, please be sure that you've handled the following things, to make the review process quicker and easier:

Thank you again for your contributions! :+1:

kartikdutt18 commented 4 years ago

Hey @mrityunjay-tripathi, thanks for opening this PR. If possible, could you create a models folder and place the transformer inside it? Since many models will be added to the repo, it would be great to have a models folder rather than a lot of models in the main folder. So the path would be models/models/transformer.

mrityunjay-tripathi commented 4 years ago

Sure @kartikdutt18! Actually, I was also wondering why it wasn't that way. But no worries, I will make it that way now :+1:

kartikdutt18 commented 4 years ago

> Sure @kartikdutt18! Actually, I was also wondering why it wasn't that way. But no worries, I will make it that way now 👍

Awesome. The reason is that it's part of Restructuring - 3. All existing models will be replaced with something similar to what you have implemented, i.e. wrapped in a class, so that users can use pre-trained models.

mlpack-bot[bot] commented 4 years ago

This issue has been automatically marked as stale because it has not had any recent activity. It will be closed in 7 days if no further activity occurs. Thank you for your contributions! :+1:

mrityunjay-tripathi commented 4 years ago

@lozhnikov I've made the changes as you suggested. Can you please take a look? It's mostly done and once the required layers are merged we can test this.

lozhnikov commented 4 years ago

Sure, I'll review the PR today (in the evening).

mrityunjay-tripathi commented 3 years ago

I was trying to test this locally and I got the following error:

```
error: matrix multiplication: incompatible matrix dimensions: 0x0 and 16x10
unknown location(0): fatal error: in "FFNModelsTests/TransformerEncoderTest": std::logic_error: matrix multiplication: incompatible matrix dimensions: 0x0 and 16x10
```

I feel there is some problem with Reset. The weights and biases are not allocated memory, but I can't find why.

lozhnikov commented 3 years ago

> I feel there is some problem with Reset. The weights and biases are not allocated memory, but I can't find why.

I'll look into it in the morning.

Update: I'd use GDB to find the actual place where this happens.

lozhnikov commented 3 years ago

@mrityunjay-tripathi I think I get it. It looks like my comment https://github.com/mlpack/models/pull/16#discussion_r475212736 was wrong. We need to pass `model = true` to the constructors (I mean Sequential, Concat, AddMerge). In that case, all the visitors will go through their sublayers, and there will be no memory issues, since the Delete visitor does the same.
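The failure mode described here can be illustrated with a self-contained toy sketch. Note that the `Layer`, `Sequential`, and `SublayerRows` types below are hypothetical stand-ins, not mlpack's actual classes: the idea is only that a container constructed with `model = false` never visits its sublayers, so their weight matrices stay 0x0, which matches the failed `0x0 and 16x10` multiplication above.

```cpp
#include <cstddef>
#include <memory>
#include <vector>

// Hypothetical toy (NOT mlpack's real API): a base layer whose Reset()
// allocates its weight matrix, and a Sequential-like container whose
// `model` flag controls whether Reset() recurses into sublayers.
struct Layer
{
  std::size_t rows = 0, cols = 0;                 // 0x0 = weights unallocated
  virtual void Reset() { rows = 16; cols = 10; }  // "allocate" the weights
  virtual ~Layer() = default;
};

struct Sequential : Layer
{
  bool model;                                     // mirrors the `model` ctor flag
  std::vector<std::unique_ptr<Layer>> layers;

  explicit Sequential(const bool model) : model(model) { }

  void Reset() override
  {
    // With model = false the container never visits its sublayers, so
    // their weights stay 0x0 and a later multiplication fails.
    if (model)
    {
      for (auto& l : layers)
        l->Reset();
    }
  }
};

// Build a container with one sublayer, reset it, and report the
// sublayer's weight row count (0 means "never allocated").
std::size_t SublayerRows(const bool model)
{
  Sequential seq(model);
  seq.layers.push_back(std::make_unique<Layer>());
  seq.Reset();
  return seq.layers[0]->rows;
}
```

With `model = false` the sublayer's dimensions remain 0x0 after the reset; with `model = true` they become 16x10, which is the behavior the `model = true` fix restores.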

mrityunjay-tripathi commented 3 years ago

> We need to pass `model = true` to the constructors (I mean Sequential, Concat, AddMerge).

Yeah. Got it. Thanks :)

lozhnikov commented 3 years ago

@mrityunjay-tripathi Is this PR ready? Can I review this?

mrityunjay-tripathi commented 3 years ago

Yes. This is ready for review.

mlpack-bot[bot] commented 3 years ago

This issue has been automatically marked as stale because it has not had any recent activity. It will be closed in 7 days if no further activity occurs. Thank you for your contributions! :+1: