minimaxir / aitextgen

A robust Python tool for text-based AI training and generation using GPT-2.
https://docs.aitextgen.io
MIT License

Issue with to_fp16() #70

Open Manas-Embold opened 3 years ago

Manas-Embold commented 3 years ago

Hi Max,

I trained the 344M model using gpt-2-simple (the dataset was Java code, for auto code completion) and saved the checkpoint. I converted the model to PyTorch using:

! cd '/content/checkpoint' && transformers-cli convert --model_type gpt2 --tf_checkpoint '/content/checkpoint/run1/' --pytorch_dump_output '/content/checkpoint/run1/pytorch' --config '/content/checkpoint/run1/hparams.json'
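A quick sanity check on the converted file is to load the raw state dict with torch.load and confirm the tensors came through in fp32. This is generic PyTorch, nothing aitextgen-specific, with the path assumed from the command above:

import torch

# Load the raw state dict from the converted checkpoint and confirm
# the tensors came through, stored in fp32.
sd = torch.load("/content/checkpoint/run1/pytorch/pytorch_model.bin",
                map_location="cpu")
print(len(sd), "tensors")
print(next(iter(sd.values())).dtype)  # expect torch.float32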

When I load the model normally:

from aitextgen import aitextgen

config = '/content/checkpoint/run1/pytorch/config.json'
ai = aitextgen(model="/content/checkpoint/run1/pytorch/pytorch_model.bin",
               config=config)

No issues, and I can generate easily:

ai.generate(n=1, prompt="system.out.", max_length=100)

OUTPUT: system.out.println( + id);

However, since I want to convert this to fp16 for fast inference, I converted the model as follows:

from aitextgen import aitextgen

config = '/content/checkpoint/run1/pytorch/config.json'
ai = aitextgen(model="/content/checkpoint/run1/pytorch/pytorch_model.bin",
               config=config, to_gpu=True, to_fp16=True)

When I call generate now, it outputs English instead of Java:

ai.generate(n=1, prompt="system.out.", max_length=100)

OUTPUT: system.out. loc character decidedally healthy ultimately points belie mass nearly regidedot price clicklike make TodayocaInd unlike journal Norretene links Good void et attackalsAnSD 54giving sing high Assassatelyhus Y humansware concerned connectionsSt� was believesligmartacing Geteworkamedann·aultrict dep2013� daughtermentructure couldentiallyrolloth confrontted Archbi suitiffge beaut Ed industward Sony* thereileOMrugateg rented Birminghamvironment underinceeg Windows intense static
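(For what it's worth, my understanding is that to_fp16 with to_gpu boils down to casting the weights to half precision and moving them to the GPU, roughly like the sketch below. This is a guess at the mechanism, not aitextgen's actual source:)

from transformers import GPT2LMHeadModel

# Rough guess at what to_fp16 + to_gpu amount to (not aitextgen's
# actual implementation): cast every parameter to fp16, move to GPU.
model = GPT2LMHeadModel.from_pretrained("/content/checkpoint/run1/pytorch")
model = model.half().to("cuda")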

Manas-Embold commented 3 years ago

Any thoughts on where I'm going wrong in the conversion? I think that after conversion it's loading the default GPT-2 English language model instead of my GPT-2 model trained on Java code.

Manas-Embold commented 3 years ago

When I use to_gpu=True and to_fp16=True for loading, I get English as output. When I just use to_fp16=True and skip to_gpu=True, I get proper Java output.

This looks strange.

minimaxir commented 3 years ago

to_fp16() is sorta beta and not fully tested. Ideally the ONNX support which I intend to add will handle this better.

However, that output is just weird in that it's pseudorandom as opposed to fully random, which may imply a different issue in the pipeline.
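One way to test that hypothesis is to compare fp32 and fp16 logits on the same prompt directly in transformers. A minimal sketch, assuming the paths from above and transformers 4.x-style model outputs:

import torch
from transformers import GPT2LMHeadModel, GPT2Tokenizer

path = "/content/checkpoint/run1/pytorch"  # assumed model directory
tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
ids = tokenizer("system.out.", return_tensors="pt").input_ids.cuda()

model = GPT2LMHeadModel.from_pretrained(path).cuda().eval()
with torch.no_grad():
    fp32_logits = model(ids).logits.float()
    fp16_logits = model.half()(ids).logits.float()  # half() casts in place

# A large gap implicates the weight cast itself; near-identical logits
# would point at the sampling/pipeline side instead.
print((fp32_logits - fp16_logits).abs().max())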

junkgear commented 3 years ago

Alright, thanks for reviewing!

minimaxir commented 3 years ago

Tested: yes, it's random output. I assume something changed upstream in Transformers, so I might have to remove it (there also doesn't seem to be a speed increase anymore). Will add a warning for now.
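(The warning could be as simple as the sketch below; this is illustrative only, not the actual aitextgen source, and warn_if_fp16 is a hypothetical helper:)

import logging

logger = logging.getLogger("aitextgen")

def warn_if_fp16(to_fp16: bool) -> None:
    # Hypothetical helper: flag the experimental fp16 path at load time.
    if to_fp16:
        logger.warning(
            "to_fp16 is experimental and may produce incoherent output; "
            "see issue #70."
        )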

briansemrau commented 3 years ago

I'm able to use fp16 with sensible outputs if I use:

import torch

with torch.cuda.amp.autocast():
    ai.generate(...)

Interestingly, I seem to be getting slower generation using fp16 on an RTX 2060, though the halved memory usage is a plus.
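Put together with the paths from earlier in the thread, the workaround might look like the following. Whether to_fp16=True is still needed under autocast is untested here; autocast alone casts eligible ops to fp16 on the fly while the stored weights stay fp32:

import torch
from aitextgen import aitextgen

# Paths assumed from earlier in the thread.
config = '/content/checkpoint/run1/pytorch/config.json'
ai = aitextgen(model="/content/checkpoint/run1/pytorch/pytorch_model.bin",
               config=config, to_gpu=True)

# Mixed-precision inference (PyTorch >= 1.6): matmuls run in fp16 where
# safe, while reductions stay in fp32.
with torch.cuda.amp.autocast():
    ai.generate(n=1, prompt="system.out.", max_length=100)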

jonnyplatt commented 3 years ago

I was really puzzled by this: I found to_fp16 was generating sensible, normal content on Google Colab despite the warning messages, but the output was totally bizarre in production. It turned out the PyTorch versions were different: Colab was on torch 1.8.1 with CUDA 11.1, while my server was on torch 1.7 with CUDA 11.0. Once I upgraded the libraries on my server, FP16 generation worked correctly again, so it may be worth updating the warning for people on older PyTorch versions?
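For anyone hitting this, it's worth checking the environment before blaming the model; these are standard torch attributes:

import torch

# fp16 generation reportedly misbehaved on torch 1.7 / CUDA 11.0 here
# but worked on torch 1.8.1 / CUDA 11.1.
print(torch.__version__)          # e.g. 1.8.1
print(torch.version.cuda)         # e.g. 11.1
print(torch.cuda.is_available())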