takase / share_layer_params

Trained model available? #8

Closed. Clemens123 closed this issue 2 years ago.

Clemens123 commented 2 years ago

Hi! Firstly, thanks for publishing your research implementation! :)

Since I don't have the computing power required for training, I was wondering whether you would be willing to share the trained model, or whether someone else has already done so? I would love to run some experiments fine-tuning the model on a custom dataset :)

takase commented 2 years ago

Hi, thank you for your interest.

At the moment, I have no plans to upload the trained models because of Git repository size limits. However, if you want, I can look into uploading one trained model as an example, such as the 18-layer encoder-decoder with cycle reverse sharing for WMT English-German. Would that model satisfy your requirement?

Clemens123 commented 2 years ago

Yes, that would be highly appreciated :) Maybe you could upload it to some cloud storage (Google Drive, OneDrive, ...) and add the link here? Alternatively, GitHub Releases could be a solution; the documentation says files up to 2 GB can be attached: https://docs.github.com/en/repositories/releasing-projects-on-github/managing-releases-in-a-repository

Thanks for your effort!

takase commented 2 years ago

I checked whether our trained model can be used in an environment different from the one it was trained in. Unfortunately, it cannot.

To fix this, we would have to modify our code, but I don't have enough time to do so. If you understand this situation, I can give you the trained model privately.

In addition, I would appreciate it if you could fix our code.
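
For reference, a minimal sketch of the kind of fix this would need, under the assumption (not confirmed in this thread) that the failure comes from the absolute training-time paths fairseq stores inside the checkpoint:

```python
import torch

# Hypothetical workaround: fairseq checkpoints save the training-time
# configuration, including absolute paths such as the binarized data
# directory, which can break loading on a different machine.
ckpt = torch.load("checkpoint_best.pt", map_location="cpu")

# Older fairseq stores an argparse Namespace under "args"; newer versions
# store an OmegaConf config under "cfg". Patch whichever is present.
if ckpt.get("args") is not None:
    ckpt["args"].data = "/path/to/local/data-bin"  # your own data-bin dir
elif ckpt.get("cfg") is not None:
    ckpt["cfg"]["task"]["data"] = "/path/to/local/data-bin"

torch.save(ckpt, "checkpoint_portable.pt")
```

Alternatively, fairseq's `checkpoint_utils.load_model_ensemble_and_task` accepts an `arg_overrides` dict, so fields like `data` can be overridden at load time without rewriting the checkpoint file.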

Clemens123 commented 2 years ago

Thanks for the offer! Right now I'm too busy to dive deeper into your code/model, but I'll come back to you when I have a bit of time to spare!

EliasVansteenkiste commented 1 year ago

> Hi, thank you for your interest.
>
> At the moment, I have no plans to upload the trained models because of Git repository size limits. However, if you want, I can look into uploading one trained model as an example, such as the 18-layer encoder-decoder with cycle reverse sharing for WMT English-German. Would that model satisfy your requirement?

I can take a look at fixing your code so that your weights can be used. Could you share them? Thank you in advance, Elias

takase commented 1 year ago

Hi, Elias

Thank you for your proposal. Our trained model, the 18-layer WMT En-De model with CycleReverse layer sharing, is uploaded here: https://drive.google.com/file/d/1ZDC7NJ7IFmOygjiLgNLxCoeZ5x7SES1x/view?usp=drive_link

To use this model, you have to load the Admin profile in addition to the checkpoint, because we trained the Transformer with Admin: https://github.com/LiyuanLucasLiu/Transformer-Clinic.
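
For anyone trying this, here is a minimal loading sketch. The file names (`profile.ratio.init`, the checkpoint name) and the profile-in-working-directory convention are assumptions based on the Transformer-Clinic README, not verified against this upload:

```python
import shutil
from fairseq import checkpoint_utils

# Assumption: Admin (Transformer-Clinic) reads its profiling statistics
# from a profile.ratio.init file in the working directory, so it must be
# placed there before the model is built.
shutil.copy("downloads/profile.ratio.init", "./profile.ratio.init")

# Hypothetical checkpoint file name; adjust to the downloaded archive.
# arg_overrides points the task at a local binarized dataset instead of
# the paths saved at training time.
models, cfg, task = checkpoint_utils.load_model_ensemble_and_task(
    ["downloads/wmt_ende_cycle_rev_18layer.pt"],
    arg_overrides={"data": "/path/to/wmt_ende/data-bin"},
)
model = models[0].eval()
```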