nicolai256 / Stable-textual-inversion_win

MIT License
241 stars 43 forks

>=1.0 loss #37

Closed badcode6 closed 1 year ago

badcode6 commented 1 year ago

I cloned this repository and removed all references to CUDA in every Python file by hand: I deleted the `... .cuda()` calls, changed devices from "gpu" to "cpu" (e.g. `model.to(gpu)` to `model.to(cpu)`), and replaced "gpu if ... else cpu" with "cpu". Now the loss gets stuck at 1 no matter what I do. It may start around 0.98, but it always climbs to 1 or 1.01 within a minute. I even waited for the entire first epoch to finish and nothing changed whatsoever. It took hours, but I would assume it should be doing the same amount of work as the Colab, just over a longer period of time?

I used the same code as in the Colab (aside from the GPU/CPU differences, obviously), which trained quickly and easily on the free tier. There the loss hovered around 0.1-0.05 the entire time, especially towards the end of the epoch.

On my CPU, I tried raising the learning rate all the way up to 100 and changing the batch size and the number of workers. Nothing made a difference.

More info about what I changed: removed all `... .cuda()` calls, changed all map locations to "cpu", changed every `torch.device("cuda")` to `torch.device("cpu")`, and changed every `precision_scope("cuda")` to `precision_scope("cpu")`. I passed the `--gpus 0` argument to main.py.
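
In other words, the edits amount to something like the following device-selection pattern (a minimal sketch; the model and paths here are illustrative, not copied from the repo):

```python
import torch

# Choose the device once; this falls back to CPU automatically when CUDA
# is unavailable, which avoids hand-editing every .cuda() call.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# A tiny model stands in for the real one here.
model = torch.nn.Linear(4, 4).to(device)          # instead of model.cuda()

# Loading a checkpoint onto the chosen device (path is a placeholder):
# state = torch.load("model.ckpt", map_location=device)

# Mixed-precision scope keyed to the device type ("cuda" or "cpu"):
with torch.autocast(device_type=device.type):
    out = model(torch.randn(1, 4, device=device))
```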

badcode6 commented 1 year ago

Generating an image of the "placeholder_string" asterisk with the same seed shows no difference between an embedding.pt with 0 steps and one with 300 steps. As I said in another repo, I am not in a rush with the generation and training; I can wait overnight if need be. I just want a working result.
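
A quick way to confirm the embedding really isn't updating is to diff two checkpoints directly (a sketch; the file names are placeholders and the key layout inside the .pt file is an assumption that may differ in this repo):

```python
import torch

# Two embedding checkpoints saved at different training steps (placeholder paths).
a = torch.load("embeddings_gs-0.pt", map_location="cpu")
b = torch.load("embeddings_gs-300.pt", map_location="cpu")

# Textual-inversion checkpoints typically map the placeholder string to a tensor,
# e.g. {"string_to_param": {"*": tensor(...)}} -- this key layout is an assumption.
ta = a["string_to_param"]["*"]
tb = b["string_to_param"]["*"]

# If training is doing anything at all, the vectors should differ.
print("max abs diff:", (ta - tb).abs().max().item())
```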

badcode6 commented 1 year ago

According to @rinongal, the solution is to unlock the learning rate; the --scale-lr arg may help with that. I also started fresh without using the checkpoints from the Colab, and that combination worked.
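
For context, this is roughly how CompVis-style main.py scripts derive the effective learning rate when lr scaling is enabled (a sketch; the variable names and exact factors are assumptions and may differ in this fork). Printing the resulting value is an easy way to check what rate a CPU run actually trains at:

```python
# Illustrative only: the effective learning rate is typically the config's
# base_learning_rate multiplied by the parallelism and batching factors.
base_lr = 5.0e-3              # base_learning_rate from the config (example value)
batch_size = 1                # data.params.batch_size
accumulate_grad_batches = 1   # gradient accumulation steps
ngpu = 1                      # number of devices; CPU-only runs are often treated as 1

scale_lr = True               # controlled by the scale-lr option on main.py
if scale_lr:
    learning_rate = accumulate_grad_batches * ngpu * batch_size * base_lr
else:
    learning_rate = base_lr   # "unlocked": use the base rate directly

print(f"effective learning rate: {learning_rate}")
```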