GenTxt opened this issue 4 years ago
@GenTxt it should be fairly straightforward to implement! i'll get it done, and leave it to someone else with more resources to train and share a model
i'll be adding a bunch of features I learned from building other types of transformers to further enhance it as well
@GenTxt This is almost ready! Do you plan on training this on any text corpus? Perhaps pg19?
@GenTxt https://github.com/huggingface/nlp/pull/306 - once this is merged, it should be easy to start training
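For reference, loading it afterwards should be roughly this (a sketch; the exact dataset name and field names depend on how the PR registers it):

```python
# sketch: load PG-19 through the huggingface `nlp` library, assuming the
# pending PR registers it under the name 'pg19' with a 'text' field
from nlp import load_dataset

dataset = load_dataset('pg19')
print(dataset['train'][0]['text'][:500])  # peek at the first book
```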
Hi Phil:
Thanks for the updates. Currently running the enwik8 train.py on my home machine, and the terminal output looks good.
Have a few questions:
Is 100000 the number of epochs? e.g.
training loss: 2.4765 | aux_loss: 0.9664
training: 0%| | 70/100000 [05:45<129:44:50, 4.67s/it]
training loss: 2.4784 | aux_loss: 0.0000
training loss: 2.4343 | aux_loss: 0.0000
Does it save the model/weights at the end of the 100000 steps?
How to use a simple text file e.g. corpus.txt (1 sentence per line) instead of enwik8.gz?
Can 'train.py' be modified into a separate generation script for the saved model above?
How to modify it to use a multi-line input text file as the start tokens?
prime = torch.ones(1, 1).cuda() # assume 1 is start token; want to replace with tokens from input.txt
Not a coder, but I can make basic modifications to scripts. Would like to see a pg19 model but don't have the $ resources to train.
Thanks
On Sat, Jul 4, 2020 at 2:26 PM Phil Wang notifications@github.com wrote:
@GenTxt works great on enwik8 now, you should totally try this! are you a coder? or do you need this simplified even more?
it's actually iterations, not epochs
nope, but that can be easily added!
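In the meantime, a minimal sketch of what that could look like inside the training loop (assuming train.py's `model`, `optim`, and loop counter `i`; SAVE_EVERY is made up here):

```python
# sketch: periodic checkpointing in train.py's training loop
import torch

SAVE_EVERY = 1000  # hypothetical constant

if i % SAVE_EVERY == 0:
    torch.save({
        'step': i,
        'model': model.state_dict(),
        'optim': optim.state_dict(),
    }, f'./model_{i}.pt')

# later, to restore:
# state = torch.load('./model_1000.pt')
# model.load_state_dict(state['model'])
```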
that will take some work to setup, specifically you will have to write your own Dataset class. let me think about how to abstract this away though!
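roughly, such a Dataset could look like this (a sketch, assuming byte-level tokens to match the enwik8 setup; the class name and arguments are made up):

```python
# sketch: byte-level Dataset over a plain corpus.txt, mirroring the
# random-crop sampling used for enwik8
import numpy as np
import torch
from torch.utils.data import Dataset

class TextFileDataset(Dataset):  # hypothetical name
    def __init__(self, path, seq_len):
        with open(path, 'rb') as f:
            raw = np.frombuffer(f.read(), dtype=np.uint8)
        self.data = torch.from_numpy(raw.copy()).long()
        self.seq_len = seq_len

    def __len__(self):
        return self.data.size(0) // self.seq_len

    def __getitem__(self, idx):
        # random crop of seq_len + 1 bytes (input plus shifted target)
        start = torch.randint(0, self.data.size(0) - self.seq_len - 1, (1,)).item()
        return self.data[start:start + self.seq_len + 1]
```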
ahh yeah kind of, let me think about how to abstract this away as well. I'm thinking about something kind of like my stylegan2-pytorch repository, a commandline tool that lets you train, resume training, and generate easily
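until then, a standalone sampling loop might look like this (a sketch; it assumes the raw model's forward returns (logits, memories, aux_loss) as in the README, and it naively re-feeds the whole sequence each step instead of threading the memories through):

```python
# sketch: naive autoregressive sampling from a trained model
import torch
import torch.nn.functional as F

@torch.no_grad()
def sample(model, prime, length, temperature=1.0):
    model.eval()
    out = prime  # (1, t) long tensor of byte tokens
    for _ in range(length):
        logits, _, _ = model(out)  # memories discarded for simplicity
        probs = F.softmax(logits[:, -1] / temperature, dim=-1)
        next_token = torch.multinomial(probs, 1)
        out = torch.cat((out, next_token), dim=-1)
    return out
```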
yea, that will take some coding
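the gist of it, as a sketch (byte-level to match the enwik8 training; `sample` refers to the loop sketched above):

```python
# sketch: build the prime from a multi-line input.txt instead of a
# hard-coded start token
import torch

with open('input.txt', 'rb') as f:
    text = f.read()

prime = torch.tensor(list(text), dtype=torch.long).unsqueeze(0).cuda()  # (1, t)
sampled = sample(model, prime, length=512)
print(bytes(sampled[0].tolist()).decode('utf-8', errors='replace'))
```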
i'll setup training for PG19 soon-ish, and perhaps there will be some generous, curious person out there who will train it for us lol
I have some experience pre-training BERT-style models on custom subsets of PG, and access to lots of academic GPU time. Not to mention generous and curious :D
@lucidrains would you like to collaborate on pre-training a Compressive Transformer?
@lucidrains Hey Phil,
Great work getting the implementation out in such a short amount of time.
I was trying to replicate the results of the paper and ran into a few issues.
I was trying to find how the final BPC score is calculated, but it doesn't seem to be part of the current repository. Is that something you plan to add in the near future, or would you be open to a contribution from my side?
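My understanding is that, since the training loss is a per-character cross-entropy in nats, BPC is just that loss divided by ln 2. A sketch of what I have in mind, assuming byte-level targets and a mean loss over the test set:

```python
# sketch: bits-per-character from a mean cross-entropy measured in nats
import math

def bpc(mean_nll_nats):
    return mean_nll_nats / math.log(2)

print(bpc(0.97))  # ~1.40 bpc
```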
There are also a few other smaller improvements which I believe would make the repository better. Please let me know what you think about them:
I did not find model-saving code anywhere - adding that would be great
Making use of multiple GPUs for faster training - as of now I think only one GPU is being used, and extending the training loop to multiple GPUs would speed things up (see the sketch below)
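For the multi-GPU point, the simplest route would be something like this (a sketch; the memories this model passes between segments may need extra care under DataParallel, and DistributedDataParallel would scale better):

```python
# sketch: wrap the model in DataParallel before the training loop
import torch

if torch.cuda.device_count() > 1:
    model = torch.nn.DataParallel(model)
model = model.cuda()
```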
Google finally released the two PG-19 models from the paper, including code, here:
https://github.com/google-research/google-research/tree/master/routing_transformer
https://storage.googleapis.com/rt-checkpoint/pg19_local.zip
https://storage.googleapis.com/rt-checkpoint/checkpoint.zip
Requires conversion to pytorch_model.bin and supporting files.
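A rough sketch of that conversion step (the checkpoint path is hypothetical, and the variable-name to parameter mapping is model-specific, so it's left as a TODO):

```python
# sketch: dump TF checkpoint tensors into a PyTorch state dict
import tensorflow as tf
import torch

reader = tf.train.load_checkpoint('pg19_local/model.ckpt')  # hypothetical path
state_dict = {}
for name in reader.get_variable_to_shape_map():
    tensor = torch.from_numpy(reader.get_tensor(name))
    # TODO: map `name` to the matching pytorch parameter name, transposing
    # kernel weights where TF and torch layouts differ
    state_dict[name] = tensor

torch.save(state_dict, 'pytorch_model.bin')
```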
After reading the DeepMind blog post I was looking forward to downloading the model, but no luck. Looking forward to your implementation.
You may already be aware of this post and link, but if not, this is the author's original TF implementation. Hope it helps.
Copy of my comment on the original model request:
https://github.com/huggingface/transformers/issues/4688
Interested in the model weights too, but they're currently not available. The author does mention releasing the TF code here:
https://news.ycombinator.com/item?id=22290227
Requires TF 1.15+ and deepmind/sonnet v1.36. Link to the Python script here:
https://github.com/deepmind/sonnet/blob/cd5b5fa48e15e4d020f744968f5209949ebe750f/sonnet/python/modules/nets/transformer.py#L915
Have tried running it as-is, but it doesn't appear to have options for training on custom data as per the paper and the available datasets.