minimaxir / aitextgen

A robust Python tool for text-based AI training and generation using GPT-2.
https://docs.aitextgen.io
MIT License

Fine-tuning rugpt3small_based_on_gpt2 #156

Open FelSpaceFel opened 3 years ago

FelSpaceFel commented 3 years ago

Hello once again, Max.

  1. I am trying to fine-tune sberbank-ai/rugpt3small_based_on_gpt2 with aitextgen, and when I test it I get complete nonsense (random fragments like these: 0ая повной на ничегоала Энойак2 0ались нав0 еед кв2 0д0 заб>ай0алосьим0 на0им повд C ые спC головые0алось на онаП ее ко0 пов>ай% алосьC им0 на0им повд C ые спC головыеогоок ее —0имд В2 доложно еед В02 см забак Л0алисьд В0 кд время2 на0 слд время2 см заб0ались2 10ру0алисьв0в0в2 Пв0д время2 дол). Is that normal? Is that supposed to happen? I left it running in Colab Pro overnight and it got a bit better, but it still makes up nonexistent words and so on. I feel like I'm doing something wrong here: it works even worse than my own model trained from scratch. I ran that model in Colab Pro for roughly 24 hours in total, and it learned to write grammatically structured sentences, but they make little sense to a human. My loss decreases very slowly once it reaches 1.890. Lowering the learning rate helped for a while, but at some point it stopped helping, and now I'm stuck at 1.890 loss with weird sentences. My understanding is that it's better to fine-tune an existing model than to train my own from scratch, right? But as I said, fine-tuning worked even worse. And I can't load any other model from Hugging Face: they are either too big or they throw an error.

  2. When I try to train DeepPavlov/rubert-base-cased-sentence, I get this error:

ValueError: bos_token_id has to be defined when no input_ids are provided.
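This `ValueError` comes from Hugging Face `transformers`: `generate()` refuses to start when it gets no prompt (`input_ids`) and the model config defines no `bos_token_id`. It presumably fires when aitextgen generates a sample with an empty prompt. Note that rubert-base-cased-sentence is a BERT encoder (a masked language model), not a left-to-right model like GPT-2, so it is not designed for free-text generation even if this error is worked around. The sketch below is a simplified, hypothetical re-creation of that check in plain Python (`starting_ids` is an illustrative name, not a transformers function); it shows the two ways generation can obtain its first token.

```python
from typing import List, Optional


def starting_ids(prompt_ids: Optional[List[int]], bos_token_id: Optional[int]) -> List[int]:
    """Hypothetical, simplified version of the check behind the ValueError.

    Generation needs at least one token to condition on: either the
    tokenized prompt, or the model's beginning-of-sequence (BOS) token.
    """
    if prompt_ids:  # a non-empty prompt is always enough to start from
        return list(prompt_ids)
    if bos_token_id is None:
        # No prompt and no BOS token in the config: nothing to start from.
        raise ValueError("bos_token_id has to be defined when no input_ids are provided.")
    return [bos_token_id]


# Workaround 1: supply a prompt (illustrative token IDs).
print(starting_ids([101, 2023], bos_token_id=None))  # [101, 2023]

# Workaround 2: a config that defines a BOS token (50256 is GPT-2's <|endoftext|>).
print(starting_ids(None, bos_token_id=50256))  # [50256]
```

With aitextgen, the first workaround corresponds to passing a non-empty `prompt` to `generate()`; the more robust fix is to pick a causal (GPT-style) checkpoint.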

  3. When I try to train blinoff/roberta-base-russian-v0, I get this error:
RuntimeError: CUDA error: device-side assert triggered
CUDA kernel errors might be asynchronously reported at some other API call,so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1. 
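A device-side assert during training usually means an index went out of range on the GPU, most often in an embedding lookup: a token ID at or above the model's vocabulary size, or a sequence longer than its position embeddings. One possible cause here (not verified for this checkpoint) is that the text is tokenized with GPT-2's tokenizer, whose IDs go up to 50256, while the RoBERTa model expects its own vocabulary; another is a block length above what RoBERTa-style models accept (their `max_position_embeddings` is typically 514, which leaves 512 usable tokens). RoBERTa, like BERT, is also a masked language model rather than a generative one. As the message suggests, setting `CUDA_LAUNCH_BLOCKING=1` before torch initializes CUDA gives an accurate traceback, and running one batch on the CPU turns the assert into a readable `IndexError`. The helper below is a hypothetical pure-Python pre-flight check along those lines; `vocab_size` and `max_positions` should come from the model's own `config.json`.

```python
from typing import List


def find_out_of_range(batch: List[List[int]], vocab_size: int, max_positions: int) -> List[str]:
    """Hypothetical pre-flight check for a batch of token-ID sequences.

    Returns human-readable problems instead of letting an embedding lookup
    fail later with an opaque CUDA device-side assert.
    """
    problems = []
    for row, ids in enumerate(batch):
        # Sequences longer than the position-embedding table cannot be embedded.
        if len(ids) > max_positions:
            problems.append(f"row {row}: length {len(ids)} > max positions {max_positions}")
        # Token IDs must index a row of the token-embedding table.
        bad = [t for t in ids if t < 0 or t >= vocab_size]
        if bad:
            problems.append(f"row {row}: token ids {bad} outside [0, {vocab_size})")
    return problems


# Illustrative numbers only: a GPT-2 ID (50256) fed to a model whose
# vocabulary is assumed to have 30000 entries, and a 1024-token block
# fed to a model with 512 usable positions.
batch = [[0, 120, 50256], list(range(1024))]
for problem in find_out_of_range(batch, vocab_size=30000, max_positions=512):
    print(problem)
```

Both problems in the example are flagged; if a real batch produces similar messages, the tokenizer or the block length has to be matched to the model before training can work.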

I'm trying to train a model in Russian, but so far with no luck.
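On the first question (rugpt3small producing fragments like the sample above), one plausible cause, not verified here, is a tokenizer mismatch. rugpt3small_based_on_gpt2 ships its own Russian BPE vocabulary, whereas aitextgen appears to fall back to its bundled GPT-2 tokenizer unless tokenizer files are passed explicitly. If token IDs mean different things to the model and to the tokenizer, decoding yields word fragments, and fine-tuning has to relearn embeddings that were pretrained for other tokens, which would fit the slow overnight improvement. The toy sketch below uses two made-up four-entry vocabularies (real ones have tens of thousands of entries) to show the effect on decoding.

```python
from typing import Dict, List

# Two made-up vocabularies that assign different pieces to the same IDs.
# Toys for illustration only; these are not the real tokenizers.
VOCAB_MODEL: Dict[int, str] = {0: "При", 1: "вет", 2: ", ", 3: "мир"}
VOCAB_OTHER: Dict[int, str] = {0: "ая", 1: "0", 2: "пов", 3: "д"}


def decode(ids: List[int], vocab: Dict[int, str]) -> str:
    """Turn token IDs back into text by concatenating their pieces."""
    return "".join(vocab[i] for i in ids)


ids = [0, 1, 2, 3]  # what the model "means" in its own vocabulary
print(decode(ids, VOCAB_MODEL))  # Привет, мир  ("Hello, world")
print(decode(ids, VOCAB_OTHER))  # ая0повд
```

If this turns out to be the cause, pointing aitextgen at the checkpoint's own vocabulary and merges files (its `vocab_file` and `merges_file` arguments) is worth trying; check the aitextgen documentation for how these interact with a Hugging Face model name.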
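For scale on the 1.890 plateau mentioned in the first question: assuming the reported loss is the usual mean per-token cross-entropy in nats (as computed by PyTorch's cross-entropy loss), it corresponds to a perplexity of e^1.890 ≈ 6.6, meaning the model is on average about as uncertain as a uniform pick among six or seven tokens. Because the loss is per token, values are only comparable between runs that use the same tokenizer, so a from-scratch model's loss and a fine-tuned model's loss may not be directly comparable. The two helpers below are a minimal sketch of the conversion.

```python
import math


def perplexity(mean_ce_loss_nats: float) -> float:
    """Perplexity from a mean per-token cross-entropy measured in nats."""
    return math.exp(mean_ce_loss_nats)


def loss_for_perplexity(ppl: float) -> float:
    """Inverse: the mean loss (in nats) that corresponds to a target perplexity."""
    return math.log(ppl)


print(round(perplexity(1.890), 2))         # 6.62
print(round(loss_for_perplexity(4.0), 3))  # 1.386
```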

arqlz commented 2 years ago

Same with gpt-2-spanish

takeraparterer commented 10 months ago

Same problem