Just my training result now

northeastsquare commented 5 years ago

Training is very slow. I have only one Titan X GPU and have been training for three days.

Train Epoch: 186 [128/960 (13%)] Loss: 1.369987 Detection Loss: 0.646443 Recognition Loss:0.723544
Train Epoch: 186 [192/960 (20%)] Loss: 1.352539 Detection Loss: 0.597384 Recognition Loss:0.755155
Train Epoch: 186 [256/960 (27%)] Loss: 1.433395 Detection Loss: 0.712065 Recognition Loss:0.721329
Train Epoch: 186 [320/960 (33%)] Loss: 1.274836 Detection Loss: 0.535909 Recognition Loss:0.738927
Train Epoch: 186 [384/960 (40%)] Loss: 1.330316 Detection Loss: 0.584632 Recognition Loss:0.745684
Train Epoch: 186 [448/960 (47%)] Loss: 1.290798 Detection Loss: 0.588888 Recognition Loss:0.701910
Train Epoch: 186 [512/960 (53%)] Loss: 1.326055 Detection Loss: 0.575449 Recognition Loss:0.750607
Train Epoch: 186 [576/960 (60%)] Loss: 1.273356 Detection Loss: 0.545374 Recognition Loss:0.727982
Train Epoch: 186 [640/960 (67%)] Loss: 1.288726 Detection Loss: 0.571910 Recognition Loss:0.716816
Train Epoch: 186 [704/960 (73%)] Loss: 1.302910 Detection Loss: 0.557011 Recognition Loss:0.745899
Train Epoch: 186 [768/960 (80%)] Loss: 1.265796 Detection Loss: 0.549302 Recognition Loss:0.716494
Train Epoch: 186 [832/960 (87%)] Loss: 1.227651 Detection Loss: 0.524842 Recognition Loss:0.702808
Train Epoch: 186 [896/960 (93%)] Loss: 1.422848 Detection Loss: 0.615706 Recognition Loss:0.807142
epoch : 186
loss : 1.3162313858668009
precious : 0.0005868187579214195
recall : 0.0005868187579214195
hmean : 0.0005868187579214195
val_precious : 0.0011737089201877935
val_recall : 0.003937007874015748
val_hmean : 0.001808318264014467
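For readers checking these numbers: the `hmean` values in the log are consistent with the harmonic mean of the two preceding values, assuming `precious` is the repository's label for precision. The snippet below is only a sketch of that arithmetic, not the repository's metric code; the function name `hmean` is chosen here for illustration.

```python
def hmean(precision: float, recall: float) -> float:
    """Harmonic mean (F1 score) of precision and recall.

    Returns 0.0 when both inputs are zero, which avoids a division by zero.
    """
    total = precision + recall
    if total == 0.0:
        return 0.0
    return 2.0 * precision * recall / total


# Validation values copied from the epoch 186 summary above.
val_precision = 0.0011737089201877935
val_recall = 0.003937007874015748

# Agrees with the logged val_hmean (0.001808318264014467) up to rounding.
print(hmean(val_precision, val_recall))
```

When precision and recall are equal, as in the training metrics of the same epoch, the harmonic mean equals that shared value, which matches the three identical numbers in the log.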

MiZhangWhuer commented 5 years ago

Hi, what is your training dataset? ICDAR 2015? I have trained the model on the ICDAR 2015 dataset, but the recognition loss barely goes down. Here is my log:

Train Epoch: 1382 [0/924 (0%)] Loss: 4.932896 Detection Loss: 0.005459 Recognition Loss:4.927436
Train Epoch: 1382 [140/924 (15%)] Loss: 4.085939 Detection Loss: 0.005984 Recognition Loss:4.079955
Train Epoch: 1382 [280/924 (30%)] Loss: 4.303678 Detection Loss: 0.005653 Recognition Loss:4.298025
Train Epoch: 1382 [420/924 (45%)] Loss: 4.824463 Detection Loss: 0.007442 Recognition Loss:4.817021
Cross point does not exist
Cross point does not exist
Train Epoch: 1382 [560/924 (61%)] Loss: 4.520021 Detection Loss: 0.007846 Recognition Loss:4.512175
Train Epoch: 1382 [700/924 (76%)] Loss: 4.456864 Detection Loss: 0.005862 Recognition Loss:4.451002
Train Epoch: 1382 [840/924 (91%)] Loss: 4.384556 Detection Loss: 0.009471 Recognition Loss:4.375085
epoch : 1382
loss : 4.395727005871859
precious : 0.0
recall : 0.0
hmean : 0.0
val_precious : 0.0
val_recall : 0.0
val_hmean : 0.0

And the corresponding config JSON file:

{
    "name": "FOTS",
    "cuda": true,
    "gpus": [0, 1],
    "data_loader": {
        "dataset": "icdar2015",
        "data_dir": "home/ICDAR/2015/train",
        "batch_size": 28,
        "shuffle": true,
        "workers": 4
    },
    "validation": {
        "validation_split": 0.1,
        "shuffle": true
    },
    "lr_scheduler_type": "ExponentialLR",
    "lr_scheduler_freq": 10000,
    "lr_scheduler": {
        "gamma": 0.94
    },
    "optimizer_type": "Adam",
    "optimizer": {
        "lr": 0.001,
        "weight_decay": 1e-5
    },
    "loss": "FOTSLoss",
    "metrics": ["fots_metric"],
    "trainer": {
        "epochs": 100000,
        "save_dir": "saved/",
        "save_freq": 10,
        "verbosity": 2,
        "monitor": "hmean",
        "monitor_mode": "max"
    },
    "arch": "FOTSModel",
    "model": {
        "mode": "united"
    }
}

What's wrong with my scripts? I have tested the detection part and it runs well, but I am not sure whether I have the recognition part right. Could you kindly provide your training scripts and procedure?

jiangxiluning commented 5 years ago

@northeastsquare @MiZhangWhuer Hi, both of you, I understand your concern. I am still verifying the correctness of the recognition branch. If I have any news, I will tell you. I have the same problem as you.

xxlxx1 commented 5 years ago

@northeastsquare @MiZhangWhuer @jiangxiluning I have a question about the recognition loss. During validation, the model computes the recognition loss using the predicted boxes (in the code), but the predicted boxes can't be mapped to the ground truth, so a CTC loss error appears. I train on ICDAR 2015 data, and the error appears during validation.

northeastsquare commented 5 years ago

@MiZhangWhuer I train on ICDAR 2015. @xxlxx1 Can you print your error message?

xxlxx1 commented 5 years ago

@northeastsquare "RuntimeError: target_lengths must be of size batch_size"
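For context, this error means the number of target-length entries passed to the CTC loss does not equal the number of sequences in the batch. That can happen when recognition runs on predicted boxes while the transcriptions come from a different number of ground-truth boxes. The sketch below is one possible way to keep the two counts equal, not the repository's code: it assumes axis-aligned boxes (FOTS itself uses rotated boxes), an IoU threshold of 0.5, and hypothetical helper names `iou` and `match_targets`.

```python
from typing import List, Tuple

# Hypothetical box format for this sketch: (x_min, y_min, x_max, y_max).
Box = Tuple[float, float, float, float]


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0.0 else 0.0


def match_targets(
    pred_boxes: List[Box],
    gt_boxes: List[Box],
    gt_texts: List[str],
    threshold: float = 0.5,
) -> Tuple[List[int], List[str], List[int]]:
    """Keep only predictions that overlap a ground-truth box.

    Returns the indices of the kept predictions, the transcription assigned
    to each of them, and the length of each transcription. All three lists
    have the same length, so one target length exists per kept sequence.
    """
    kept: List[int] = []
    texts: List[str] = []
    for i, pred in enumerate(pred_boxes):
        best_index, best_score = -1, threshold
        for j, gt in enumerate(gt_boxes):
            score = iou(pred, gt)
            if score >= best_score:
                best_index, best_score = j, score
        if best_index >= 0:
            kept.append(i)
            texts.append(gt_texts[best_index])
    target_lengths = [len(text) for text in texts]
    return kept, texts, target_lengths


# Two predictions, one ground-truth box: only the first prediction is kept.
kept, texts, target_lengths = match_targets(
    pred_boxes=[(0, 0, 10, 10), (100, 100, 110, 110)],
    gt_boxes=[(1, 1, 10, 10)],
    gt_texts=["abc"],
)
print(kept, texts, target_lengths)
```

Predictions with no matching ground-truth box are dropped here. If every prediction is dropped, the recognition loss would have to be skipped for that batch, which this sketch leaves to the caller.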

isyanan1024 commented 5 years ago

@northeastsquare Hi! I am training this model. When I use ICDAR 2015, the log looks like this:

epoch : 3
loss : 4.079219924869822
precious : 0.0
recall : 0.0
hmean : 0.0
val_precious : 0.0
val_recall : 0.0
val_hmean : 0.0

precious and recall are always 0. Can you tell me why?

northeastsquare commented 5 years ago

Something needs to be changed somewhere, but I forgot what.

13438960761 commented 5 years ago

@isyanan1024 I got the same result as you: precious and recall are always 0, and the Recognition Loss is always about 4. Have you solved it?

epoch : 10
loss : 4.176406042575836
precious : 0.0
recall : 0.0
hmean : 0.0
val_precious : 0.0
val_recall : 0.0
val_hmean : 0.0

@northeastsquare Could you please tell me what you changed? Thanks.

foocker commented 5 years ago

Same problem with the latest code: precious: 0, recall: 0, hmean: 0.

Could it be a dataloader bug? When I display the images in a batch, some images' labels are not aligned with them...

jiangxiluning / FOTS.PyTorch

Just my training result now #16