victorchall / EveryDream2trainer


Text Encoder Not Found [BUG] #264

Closed qudabear closed 1 month ago

qudabear commented 1 month ago

Getting this error, every train:

text_encoder/model.safetensors not found

Does this mean that it is not training the text encoder? If so, this would explain why all the results are terrible and under-trained...

Also please consider adding better ways to contact you if you are not going to respond on Github. I'm not going to install Discord...

qudabear commented 1 month ago

Also getting "Validation loss for val shows diverging. Check your loss/val graph." from the very beginning of training, with any combination of settings, and even with manual validation images. Would be cool if you could check your github and respond...

victorchall commented 1 month ago

I believe the "text_encoder not found" message is simply because you're loading from a model whose weights are stored as .bin instead of .safetensors. If it were a real problem, the entire trainer would crash. Since it doesn't crash the program entirely, the message has nothing to do with your results being bad.
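As a rough sketch of why that message is informational: checkpoint loaders commonly probe for `model.safetensors` first and fall back to the older `pytorch_model.bin` if it isn't there. This is a hypothetical illustration of that pattern, not EveryDream2trainer's actual loading code.

```python
import os

# Hypothetical illustration of a common weights-loading pattern:
# prefer .safetensors, fall back to .bin, and only fail if neither exists.
def resolve_weights(component_dir: str) -> str:
    candidates = ["model.safetensors", "pytorch_model.bin"]
    for name in candidates:
        path = os.path.join(component_dir, name)
        if os.path.exists(path):
            return path
    # Only here would the trainer actually crash.
    raise FileNotFoundError(f"no weights found in {component_dir}")
```

Under this scheme, a "model.safetensors not found" message followed by a successful load of the .bin file leaves the run unaffected.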

"Validation loss ... diverging" is simply a warning that you may be overtraining, but it is not definitive. You should review your logs using tensorboard. All this means is that the first derivative of validation loss, over a small moving window, is positive. Validation loss generally should trend downward, but with very small datasets it may trend upward before your training data is trained enough to show results.
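The check described above can be sketched as follows. This is a minimal reimplementation of the idea (trailing-window slope of validation loss being positive), not the trainer's actual code; the window size is an illustrative assumption.

```python
def val_loss_diverging(losses, window=5):
    """Return True if the trailing-window slope of validation loss is positive.

    Approximates "first derivative over a small moving window" with a
    least-squares slope fitted to the last `window` loss values.
    """
    if len(losses) < window:
        return False  # not enough history to judge
    tail = losses[-window:]
    n = len(tail)
    xs = range(n)
    mean_x = sum(xs) / n
    mean_y = sum(tail) / n
    num = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, tail))
    den = sum((x - mean_x) ** 2 for x in xs)
    return num / den > 0
```

Note that a positive slope over five points says nothing about the long-run trend, which is exactly why the warning is advisory and the tensorboard graph is the real source of truth.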

You may need to tweak your data, your learning rate, your batch size, etc. to get better results. The best place to ask is on Discord; sorry you dislike Discord, but it's how pretty much everyone communicates now. You'll also get your problem in front of more eyes: not just mine, but everyone else's on that Discord. No one but me will ever reply to you here.

qudabear commented 1 month ago

Incorrect. I was using a Safetensors model. The error shows up every time.

Also, in response to your other comments where you completely dismissed everything I typed here: I already DID tell you the problems that were happening. In the meantime, I figured out what was wrong by myself: your batch size is way too high to be using a low learning rate like 1e-6, and you have to train with higher learning rates; the weights are wrong, and the instructions are wrong. This helps a bit with the loss/val, but all the results are still terrible and something is clearly still wrong with the trainer, as it cannot produce even passable results.

My data is fine. I'm using the same dataset I've trained hundreds of times on other repos, and it worked just fine.

And no, Discord is not how "everyone" communicates. It's one out of a billion options for instant messaging, and I hate it and won't use it. If you're going to post a repo, be available for questions on the site where you posted it. Common sense. I've moved on to better options. Your complete dismissal of the problems, incredibly long time to respond, refusal to use GitHub, and your attitude have put me off from EveryDream.
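For readers landing on this thread: a common rule of thumb behind the batch-size/learning-rate interaction argued about above is to scale the learning rate with batch size, either linearly or by square root. This is a generic heuristic, not a setting either participant confirmed, and the reference values below are purely illustrative.

```python
import math

# Rule-of-thumb LR scaling when changing batch size. The base values you
# pass in (base_lr, base_batch) are whatever your known-good config used;
# nothing here reflects EveryDream2trainer defaults.
def scale_lr(base_lr: float, base_batch: int, new_batch: int,
             mode: str = "sqrt") -> float:
    ratio = new_batch / base_batch
    if mode == "linear":
        return base_lr * ratio          # linear scaling rule
    return base_lr * math.sqrt(ratio)   # sqrt scaling, more conservative
```

For example, a learning rate tuned at batch size 1 would, under linear scaling, be multiplied by 16 at batch size 16, which matches the intuition that 1e-6 is very low for a large batch.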