lucidrains / compressive-transformer-pytorch

PyTorch implementation of Compressive Transformers, from DeepMind
MIT License

Links to original tf code - fyi #1

GenTxt opened this issue 4 years ago

GenTxt commented 4 years ago

After reading the DeepMind blog post I was hoping to download the model, but no luck. Looking forward to your implementation.

You may already be aware of this post and link, but if not, this is the author's original TF implementation. Hope it helps.

Copy of comment to original model request:

https://github.com/huggingface/transformers/issues/4688

I'm interested in the model weights too, but they're currently not available. The author does mention releasing the TF code here:

https://news.ycombinator.com/item?id=22290227

Requires TF 1.15+ and deepmind/sonnet v1.36. Link to the Python script here:

https://github.com/deepmind/sonnet/blob/cd5b5fa48e15e4d020f744968f5209949ebe750f/sonnet/python/modules/nets/transformer.py#L915

I have tried running it as-is, but it doesn't appear to have options for training on custom data as described in the paper and the available datasets.

lucidrains commented 4 years ago

@GenTxt it should be fairly straightforward to implement! i'll get it done, and leave it to someone else with more resources to train and share a model

i'll be adding a bunch of features I learned from building other types of transformers to further enhance it as well

lucidrains commented 4 years ago

@GenTxt This is almost ready! Do you plan on training this on any text corpus? Perhaps pg19?

lucidrains commented 4 years ago

@GenTxt https://github.com/huggingface/nlp/pull/306 Once this is merged, it should be easy to start training
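Once it's in, loading the dataset should look roughly like this (just a sketch; the split and field names are assumptions based on the PR):

```python
# Rough sketch of loading PG-19 through the huggingface nlp library
# (since renamed to `datasets`); split and field names are assumptions.
import nlp

pg19 = nlp.load_dataset('pg19', split='train')
print(pg19[0]['text'][:500])  # first 500 characters of the first book
```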

GenTxt commented 4 years ago

Hi Phil:

Thanks for the updates. Currently running the enwik8 train.py on my home machine and the terminal output looks good.

Have a few questions:

e.g.

```
training loss: 2.4765 | aux_loss: 0.9664
training:   0%|          | 70/100000 [05:45<129:44:50,  4.67s/it]
training loss: 2.4784 | aux_loss: 0.0000
training loss: 2.4343 | aux_loss: 0.0000
```

```python
prime = torch.ones(1, 1).cuda()  # assume 1 is start token OR input.txt
```
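For example, something like this is roughly what I'm hoping for (just my guess at what the call would look like; I'm assuming byte-level tokens as in the enwik8 example and a generate(prime, length) method on the wrapped model):

```python
# Guess at priming generation from a text file instead of a single
# start token; byte-level tokens and the generate() signature are
# assumptions, not necessarily the repo's current API.
import torch

with open('input.txt', 'rb') as f:
    prime_bytes = f.read()[:100]  # keep the prime shorter than the sequence length

prime = torch.tensor(list(prime_bytes), dtype=torch.long).cuda()[None, :]
sample = model.generate(prime, 512)  # 512 new tokens
print(''.join(chr(t) for t in sample[0].tolist()))
```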

I'm not a coder, but I can make basic modifications to scripts. Would like to see a PG-19 model but don't have the $ resources to train one.

Thanks

On Sat, Jul 4, 2020 at 2:26 PM Phil Wang notifications@github.com wrote:

@GenTxt works great on enwik8 now, you should totally try this! are you a coder? or do you need this simplified even more?


lucidrains commented 4 years ago

it's actually iterations, not epochs

nope, but that can be easily added!

that will take some work to set up, specifically you will have to write your own Dataset class. let me think about how to abstract this away though!
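in the meantime, something along these lines should work as a stopgap (rough sketch only; the byte-level tokens, names, and sequence length are placeholders, not part of this repo):

```python
# Rough sketch of a byte-level Dataset over a plain text file,
# yielding fixed-length chunks for language modeling.
import torch
from torch.utils.data import Dataset

class TextFileDataset(Dataset):
    def __init__(self, path, seq_len=1024):
        with open(path, 'rb') as f:
            self.data = torch.tensor(list(f.read()), dtype=torch.long)
        self.seq_len = seq_len

    def __len__(self):
        return (len(self.data) - 1) // self.seq_len

    def __getitem__(self, idx):
        start = idx * self.seq_len
        # +1 so the wrapper can shift inputs / targets internally
        return self.data[start : start + self.seq_len + 1]
```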

ahh yeah kind of, let me think about how to abstract this away as well. I'm thinking about something kind of like my stylegan2-pytorch repository, a command-line tool that lets you train, resume training, and generate easily

yea, that will take some coding

i'll setup training for PG19 soon-ish, and perhaps there will be some generous, curious person out there who will train it for us lol

DarrenAbramson commented 4 years ago

I have some experience pre-training BERT style models on custom subsets of PG, and access to lots of academic GPU time. Not to mention generous, and curious :D

@lucidrains would you like to collaborate on pre-training a Compressive Transformer?

RajeshDM commented 4 years ago

@lucidrains Hey Phil,

Great work with getting the implementation out in such a short amount of time.

I was trying to replicate the results of the paper and ran into a few issues.

I was trying to find how to calculate the final BPC score but could not find it in the current repository. Is that something you plan to add in the near future, or are you open to a contribution from my side?
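My understanding is that BPC is just the mean validation cross-entropy (in nats per token) converted to base 2, so for a byte-level model it would look roughly like this (a sketch, assuming the returned loss is a mean cross-entropy over byte-level tokens):

```python
# Sketch: convert a mean cross-entropy loss (nats/token) to bits per character,
# assuming byte-level tokens as in the enwik8 example.
import math

def bits_per_character(mean_ce_loss_nats):
    return mean_ce_loss_nats / math.log(2)

print(bits_per_character(0.92))  # e.g. 0.92 nats/token is roughly 1.33 BPC
```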

There are also a few other smaller improvements that I believe could make the repository better. Please let me know what you think about them:

  1. I did not find model-saving code anywhere; adding that would be great (see the sketch after this list).

  2. Making use of multiple GPUs for faster training; as of now I think only one GPU is used, and extending it to multiple GPUs would speed up training (also in the sketch below).
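Both are fairly standard PyTorch. Roughly what I have in mind (a sketch only; `model` and `optim` stand in for whatever the training script defines):

```python
# Sketch of the two additions; `model` and `optim` are placeholders.
import torch
from torch import nn

# 1. periodically save a checkpoint, and restore it to resume training
torch.save({'model': model.state_dict(), 'optim': optim.state_dict()}, 'checkpoint.pt')
state = torch.load('checkpoint.pt')
model.load_state_dict(state['model'])
optim.load_state_dict(state['optim'])

# 2. simplest multi-GPU option: wrap the model in DataParallel
#    (may need extra care around the memory / compressed-memory state)
if torch.cuda.device_count() > 1:
    model = nn.DataParallel(model)
```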

GenTxt commented 3 years ago

Google finally released the two PG-19 models from the paper, along with code, here:

https://github.com/google-research/google-research/tree/master/routing_transformer

https://storage.googleapis.com/rt-checkpoint/pg19_local.zip

https://storage.googleapis.com/rt-checkpoint/checkpoint.zip

Requires conversion to pytorch_model.bin and supporting files.
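A first step would be inspecting the variable names and shapes in the TF checkpoints, roughly like this (the path is just a placeholder for wherever the zips are extracted):

```python
# Sketch: list the variables in the downloaded TF checkpoint as a first
# step towards a manual conversion to PyTorch.
import tensorflow as tf

reader = tf.train.load_checkpoint('pg19_local/model.ckpt')  # placeholder path
for name, shape in reader.get_variable_to_shape_map().items():
    print(name, shape)

# each variable can then be pulled out as a numpy array with
# reader.get_tensor(name) and copied into the matching PyTorch parameter
```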