zhou13 / lcnn

LCNN: End-to-End Wireframe Parsing
MIT License

validation loss doesn't decrease #31

Closed www322 closed 4 years ago

www322 commented 4 years ago

Hello, I forked the project and trained it with the default config, but the best mean validation loss never decreases. Here is part of the log. Can you help me? Thanks.

iter: 100 mean_loss: 1.8309582471847534 best_mean_loss: inf
iter: 500 mean_loss: 2.4141554832458496 best_mean_loss: 1.8309582471847534
iter: 1000 mean_loss: 2.237107038497925 best_mean_loss: 1.8309582471847534
iter: 1500 mean_loss: 2.3634984493255615 best_mean_loss: 1.8309582471847534
iter: 2000 mean_loss: 2.4474143981933594 best_mean_loss: 1.8309582471847534
iter: 2500 mean_loss: 2.3407340049743652 best_mean_loss: 1.8309582471847534
iter: 3000 mean_loss: 2.453896999359131 best_mean_loss: 1.8309582471847534
iter: 3500 mean_loss: 2.4789130687713623 best_mean_loss: 1.8309582471847534
iter: 4000 mean_loss: 2.4712650775909424 best_mean_loss: 1.8309582471847534
iter: 4500 mean_loss: 2.2194013595581055 best_mean_loss: 1.8309582471847534
iter: 5000 mean_loss: 2.4809963703155518 best_mean_loss: 1.8309582471847534
iter: 5500 mean_loss: 2.19692325592041 best_mean_loss: 1.8309582471847534
iter: 6000 mean_loss: 2.5656793117523193 best_mean_loss: 1.8309582471847534
iter: 6500 mean_loss: 2.2574918270111084 best_mean_loss: 1.8309582471847534
iter: 7000 mean_loss: 2.2906219959259033 best_mean_loss: 1.8309582471847534
iter: 7500 mean_loss: 2.2386887073516846 best_mean_loss: 1.8309582471847534
iter: 8000 mean_loss: 2.2122697830200195 best_mean_loss: 1.8309582471847534
iter: 8500 mean_loss: 2.29569149017334 best_mean_loss: 1.8309582471847534
iter: 9000 mean_loss: 2.2612431049346924 best_mean_loss: 1.8309582471847534
iter: 9500 mean_loss: 2.3730521202087402 best_mean_loss: 1.8309582471847534
iter: 10000 mean_loss: 2.150024175643921 best_mean_loss: 1.8309582471847534
iter: 10500 mean_loss: 2.254610776901245 best_mean_loss: 1.8309582471847534
iter: 11000 mean_loss: 2.33597993850708 best_mean_loss: 1.8309582471847534
iter: 11500 mean_loss: 2.236077308654785 best_mean_loss: 1.8309582471847534
iter: 12000 mean_loss: 2.3044650554656982 best_mean_loss: 1.8309582471847534
iter: 12500 mean_loss: 2.1969802379608154 best_mean_loss: 1.8309582471847534
iter: 13000 mean_loss: 2.142340898513794 best_mean_loss: 1.8309582471847534
iter: 13500 mean_loss: 2.2406578063964844 best_mean_loss: 1.8309582471847534
iter: 14000 mean_loss: 2.3895926475524902 best_mean_loss: 1.8309582471847534
iter: 14500 mean_loss: 2.2613422870635986 best_mean_loss: 1.8309582471847534
iter: 15000 mean_loss: 2.274299144744873 best_mean_loss: 1.8309582471847534
iter: 15500 mean_loss: 2.1895949840545654 best_mean_loss: 1.8309582471847534
iter: 16000 mean_loss: 2.2288248538970947 best_mean_loss: 1.8309582471847534
iter: 16500 mean_loss: 2.1653265953063965 best_mean_loss: 1.8309582471847534
iter: 17000 mean_loss: 2.1874473094940186 best_mean_loss: 1.8309582471847534
iter: 17500 mean_loss: 2.2273499965667725 best_mean_loss: 1.8309582471847534
iter: 18000 mean_loss: 2.3271074295043945 best_mean_loss: 1.8309582471847534
iter: 18500 mean_loss: 2.0903775691986084 best_mean_loss: 1.8309582471847534
iter: 19000 mean_loss: 2.2129886150360107 best_mean_loss: 1.8309582471847534
iter: 19500 mean_loss: 2.1324164867401123 best_mean_loss: 1.8309582471847534
iter: 20000 mean_loss: 2.2728190422058105 best_mean_loss: 1.8309582471847534
iter: 20500 mean_loss: 2.195408821105957 best_mean_loss: 1.8309582471847534
iter: 21000 mean_loss: 2.1608526706695557 best_mean_loss: 1.8309582471847534
iter: 21500 mean_loss: 2.236633062362671 best_mean_loss: 1.8309582471847534
iter: 22000 mean_loss: 2.0883257389068604 best_mean_loss: 1.8309582471847534
iter: 22500 mean_loss: 2.3318989276885986 best_mean_loss: 1.8309582471847534
iter: 23000 mean_loss: 2.1290135383605957 best_mean_loss: 1.8309582471847534
iter: 23500 mean_loss: 2.2543423175811768 best_mean_loss: 1.8309582471847534
iter: 24000 mean_loss: 2.22518253326416 best_mean_loss: 1.8309582471847534
iter: 24500 mean_loss: 2.2507972717285156 best_mean_loss: 1.8309582471847534
iter: 25000 mean_loss: 2.11651349067688 best_mean_loss: 1.8309582471847534
iter: 25500 mean_loss: 2.064210891723633 best_mean_loss: 1.8309582471847534
iter: 26000 mean_loss: 2.0486176013946533 best_mean_loss: 1.8309582471847534
iter: 26500 mean_loss: 2.1288259029388428 best_mean_loss: 1.8309582471847534
iter: 27000 mean_loss: 2.1848762035369873 best_mean_loss: 1.8309582471847534
iter: 27500 mean_loss: 2.0812535285949707 best_mean_loss: 1.8309582471847534
iter: 28000 mean_loss: 2.0336077213287354 best_mean_loss: 1.8309582471847534
iter: 28500 mean_loss: 2.158541202545166 best_mean_loss: 1.8309582471847534
iter: 29000 mean_loss: 2.1059839725494385 best_mean_loss: 1.8309582471847534
iter: 29500 mean_loss: 2.034742832183838 best_mean_loss: 1.8309582471847534
iter: 30000 mean_loss: 2.0858912467956543 best_mean_loss: 1.8309582471847534
iter: 30500 mean_loss: 2.1409056186676025 best_mean_loss: 1.8309582471847534
iter: 31000 mean_loss: 1.9738627672195435 best_mean_loss: 1.8309582471847534
iter: 31500 mean_loss: 2.0811076164245605 best_mean_loss: 1.8309582471847534
iter: 32000 mean_loss: 2.0922210216522217 best_mean_loss: 1.8309582471847534
iter: 32500 mean_loss: 2.0118913650512695 best_mean_loss: 1.8309582471847534
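
Aside: the pattern in this log (best_mean_loss frozen at the first value) simply means no later validation round ever beat the first one. A minimal sketch of how such a running-best counter typically behaves (illustrative only, not the repo's actual code):

```python
def track_best(losses):
    """Return the running best (lowest) mean validation loss after each round."""
    best = float("inf")
    history = []
    for loss in losses:
        if loss < best:  # the best only updates on a strict improvement
            best = loss
        history.append(best)
    return history

# In the log above, the very first validation (~1.83) sets the best, and every
# later round is worse (~2.0-2.5), so the best never moves again.
print(track_best([1.83, 2.41, 2.24, 2.36]))  # [1.83, 1.83, 1.83, 1.83]
```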

zhou13 commented 4 years ago

Your output format is not the same as the one in this repo. Please follow the instructions strictly for training. I will not provide support on modified code/dataset.

www322 commented 4 years ago

> Your output format is not the same as the one in this repo. Please follow the instructions strictly for training. I will not provide support on modified code/dataset.

I didn't change any training details, I only added some logging. Could you please post your training log file?

zhou13 commented 4 years ago

loss.csv:

progress,sum,jmap,lmap,joff,lpos,lneg
000/0000060,1.75279771343,0.37524629916,0.15470244552,0.12701180219,0.75831757505,0.33751959150
001/0024000,0.50378183573,0.15652761937,0.09540034115,0.09491426769,0.06605104742,0.09088856011
002/0048000,0.48701415729,0.14914281405,0.09175185366,0.09355604620,0.08754904481,0.06501439857
003/0072000,0.46902908961,0.14669119876,0.09059983792,0.09268161877,0.05220702141,0.08684941275
004/0096000,0.46071388442,0.14550249685,0.08986504400,0.09226031553,0.06592878177,0.06715724627
005/0120000,0.46986991119,0.14534756067,0.08914340399,0.09207765894,0.08116334077,0.06213794683
007/0144000,0.45377785175,0.14283076345,0.08831412205,0.09172482392,0.06068950190,0.07021864044
008/0168000,0.44746913773,0.14293300902,0.08770429096,0.09199765702,0.05215444181,0.07267973891
009/0192000,0.46021545543,0.14151686411,0.08739450284,0.09182165350,0.08795528864,0.05152714635
010/0216000,0.43577223871,0.13792788179,0.08523807360,0.09033579795,0.06843417757,0.05383630779
011/0240000,0.43855728625,0.13748315102,0.08491522790,0.09016179400,0.07630488436,0.04969222897
013/0264000,0.44450381761,0.13735615820,0.08473246158,0.09013887234,0.08365259029,0.04862373521
014/0288000,0.44985358860,0.13750173790,0.08472613352,0.09014078801,0.08967961678,0.04780531239
015/0312000,0.44562743640,0.13768128731,0.08474339951,0.09003551778,0.08312178154,0.05004545025
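
Aside: for anyone comparing their own run against this reference log, a small sketch that parses the CSV and reports the trend of each term (column names come from the header above; only the first full-epoch row and the last row are inlined for brevity):

```python
import csv
import io

# First and last data rows from the reference loss.csv above.
LOG = """\
progress,sum,jmap,lmap,joff,lpos,lneg
001/0024000,0.50378183573,0.15652761937,0.09540034115,0.09491426769,0.06605104742,0.09088856011
015/0312000,0.44562743640,0.13768128731,0.08474339951,0.09003551778,0.08312178154,0.05004545025
"""

rows = list(csv.DictReader(io.StringIO(LOG)))
first, last = rows[0], rows[-1]
for key in ("sum", "jmap", "lmap", "joff", "lneg"):
    trend = "down" if float(last[key]) < float(first[key]) else "up"
    print(f"{key}: {first[key]} -> {last[key]} ({trend})")
# lpos is noisy in the reference log, so it is left out of the check above.
```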
www322 commented 4 years ago

> loss.csv:
>
> progress,sum,jmap,lmap,joff,lpos,lneg
> 000/0000060,1.75279771343,0.37524629916,0.15470244552,0.12701180219,0.75831757505,0.33751959150
> 001/0024000,0.50378183573,0.15652761937,0.09540034115,0.09491426769,0.06605104742,0.09088856011
> 002/0048000,0.48701415729,0.14914281405,0.09175185366,0.09355604620,0.08754904481,0.06501439857
> 003/0072000,0.46902908961,0.14669119876,0.09059983792,0.09268161877,0.05220702141,0.08684941275
> 004/0096000,0.46071388442,0.14550249685,0.08986504400,0.09226031553,0.06592878177,0.06715724627
> 005/0120000,0.46986991119,0.14534756067,0.08914340399,0.09207765894,0.08116334077,0.06213794683
> 007/0144000,0.45377785175,0.14283076345,0.08831412205,0.09172482392,0.06068950190,0.07021864044
> 008/0168000,0.44746913773,0.14293300902,0.08770429096,0.09199765702,0.05215444181,0.07267973891
> 009/0192000,0.46021545543,0.14151686411,0.08739450284,0.09182165350,0.08795528864,0.05152714635
> 010/0216000,0.43577223871,0.13792788179,0.08523807360,0.09033579795,0.06843417757,0.05383630779
> 011/0240000,0.43855728625,0.13748315102,0.08491522790,0.09016179400,0.07630488436,0.04969222897
> 013/0264000,0.44450381761,0.13735615820,0.08473246158,0.09013887234,0.08365259029,0.04862373521
> 014/0288000,0.44985358860,0.13750173790,0.08472613352,0.09014078801,0.08967961678,0.04780531239
> 015/0312000,0.44562743640,0.13768128731,0.08474339951,0.09003551778,0.08312178154,0.05004545025
My loss only decreases a little....

progress sum jmap lmap joff lpos lneg
000/0000600 1.248827176 0.298838505 0.157183459 0.124452122 0.419472682 0.248880408
000/0003000 1.813851228 0.323483961 0.159492899 0.124171394 0.978839562 0.227863412
000/0006000 1.645010522 0.305443836 0.159719087 0.12403664 0.587222685 0.468588274
000/0009000 1.796099539 0.286641861 0.154425241 0.123920985 1.032610955 0.198500497
000/0012000 1.88439716 0.284493703 0.154312603 0.123831982 1.211997126 0.109761747
000/0015000 1.778613643 0.283922542 0.153639499 0.123933744 1.089029502 0.128088355
000/0018000 1.892158679 0.283370303 0.153520288 0.12382096 1.243427934 0.088019194
001/0021000 1.916045297 0.283504294 0.153514942 0.123830775 1.267820498 0.087374788
001/0024000 1.908714614 0.283485465 0.153374455 0.123810106 1.26362534 0.084419247
001/0027000 1.657102549 0.283088738 0.153191832 0.123853239 0.958800552 0.138168188
001/0030000 1.919623602 0.282796885 0.153767555 0.123899418 1.276822152 0.082337591
001/0033000 1.636602454 0.282083803 0.153060068 0.12386962 0.992779487 0.084809476
001/0036000 2.00500386 0.282788708 0.152727455 0.123839457 1.40698841 0.03865983
001/0039000 1.696137916 0.283059996 0.153473942 0.12399845 1.061691116 0.073914412
002/0042000 1.729389599 0.282306849 0.152982708 0.123805755 1.082474458 0.087819829
002/0045000 1.678012056 0.282357523 0.152578431 0.123822259 1.010002412 0.10925143
002/0048000 1.651241664 0.28206377 0.152874476 0.123848495 0.979537279 0.112917644
002/0051000 1.735505294 0.281910611 0.152406258 0.123782529 1.112989204 0.064416692
002/0054000 1.701689331 0.281009461 0.152180241 0.123846917 1.05668246 0.087970252
002/0057000 1.811365705 0.281938008 0.152422239 0.123767548 1.182442292 0.070795618
002/0060000 1.590754212 0.280715832 0.152272502 0.123852259 0.938755342 0.095158276
003/0063000 1.694301318 0.281677891 0.152923163 0.123846188 1.035150585 0.100703492
003/0066000 1.776913233 0.280795548 0.152125085 0.123798378 1.159221432 0.06097279
003/0069000 1.677494783 0.280894953 0.152218416 0.123805393 1.054612989 0.065963031
003/0072000 1.744946063 0.280448665 0.151652434 0.123772598 1.13291674 0.056155626
003/0075000 1.639199969 0.279516858 0.151232612 0.123712711 1.000559818 0.084177969
003/0078000 1.584289033 0.280314603 0.151098529 0.123717046 0.975946653 0.053212203
004/0081000 1.682771206 0.279598393 0.151123654 0.123725113 1.06091101 0.067413037
004/0084000 1.831014415 0.279693096 0.151269177 0.12372141 1.222902658 0.053428073
004/0087000 1.70260199 0.279246934 0.151127964 0.123798049 1.083166839 0.065262205
004/0090000 1.715838219 0.279187355 0.150955118 0.123750427 1.080106237 0.081839083
004/0093000 1.630208055 0.278924158 0.150899347 0.123710658 1.006538785 0.070135107
004/0096000 1.669858153 0.278846762 0.151171283 0.123686614 1.059495821 0.056657671
004/0099000 1.605026624 0.279699607 0.150914517 0.123772777 0.969203022 0.081436702
005/0102000 1.627926335 0.278759098 0.150707689 0.123738482 1.002998026 0.07172304
005/0105000 1.668896989 0.278112128 0.150172103 0.12371902 1.051949169 0.064944569
005/0108000 1.768660465 0.278315071 0.150633142 0.123749939 1.157766967 0.058195345
005/0111000 1.53076223 0.278055794 0.150367878 0.12371528 0.911770425 0.066852853
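
Aside: comparing the two logs column by column makes the problem visible: jmap, lmap, and joff train normally, but lpos climbs from ~0.42 to ~0.9-1.4 instead of dropping toward ~0.05 as in the reference run. A small sketch that pulls the lpos column out of the whitespace-separated rows above (only the first and last rows are inlined):

```python
# Header and sample rows copied from the log above.
HEADER = "progress sum jmap lmap joff lpos lneg".split()
ROWS = [
    "000/0000600 1.248827176 0.298838505 0.157183459 0.124452122 0.419472682 0.248880408",
    "005/0111000 1.53076223 0.278055794 0.150367878 0.12371528 0.911770425 0.066852853",
]

parsed = [dict(zip(HEADER, row.split())) for row in ROWS]
lpos_first = float(parsed[0]["lpos"])
lpos_last = float(parsed[-1]["lpos"])
# lpos roughly doubles over training while the other terms shrink.
print(f"lpos: {lpos_first} -> {lpos_last}")
```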
zhou13 commented 4 years ago

Your setting is very different from the one in this repo. At least, your lcnn validates every 3000 images, while it should validate every 24k images by default. So this is not the default config.

www322 commented 4 years ago

> Your setting is very different from the one in this repo. At least, your lcnn validates every 3000 images, while it should validate every 24k images by default. So this is not the default config.

Yes, I changed the validation interval, but I don't think it affects the training loss. Here is my config file:

io:
  logdir: logs/
  datadir: data/wireframes_lcnn/wireframe
  resume_from:
  num_workers: 4
  tensorboard_port: 0
  validation_interval: 3000

model:
  image:
    mean: [109.730, 103.832, 98.681]
    stddev: [22.275, 22.124, 23.229]

  batch_size: 6
  batch_size_eval: 2

  # backbone multi-task parameters
  head_size: [[2], [1], [2]]
  loss_weight:
    jmap: 8.0
    lmap: 0.5
    joff: 0.25
    lpos: 1
    lneg: 1

  # backbone parameters
  backbone: stacked_hourglass
  depth: 4
  num_stacks: 2
  num_blocks: 1

  # sampler parameters
  ## static sampler
  n_stc_posl: 300
  n_stc_negl: 40

  ## dynamic sampler
  n_dyn_junc: 300
  n_dyn_posl: 300
  n_dyn_negl: 80
  n_dyn_othr: 600

  # LOIPool layer parameters
  n_pts0: 32
  n_pts1: 8

  # line verification network parameters
  dim_loi: 128
  dim_fc: 1024

  # maximum junction and line outputs
  n_out_junc: 250
  n_out_line: 2500

  # additional ablation study parameters
  use_cood: 0
  use_slop: 0
  use_conv: 0

  # junction threshold for evaluation (See #5)
  eval_junc_thres: 0.008

optim:
  name: Adam
  lr: 4.0e-4
  amsgrad: True
  weight_decay: 1.0e-4
  max_epoch: 24
  lr_decay_epoch: 10
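
Aside: before restarting from scratch, one low-effort sanity check is to diff an edited config against the repo's reference config/wireframe.yaml. A minimal illustration with Python's difflib (the two snippets are made-up stand-ins for the real files; the 24000 default comes from the maintainer's comment above):

```python
import difflib

# Stand-ins for the reference and edited config files.
reference = ["io:\n", "  validation_interval: 24000\n"]
modified = ["io:\n", "  validation_interval: 3000\n"]

# unified_diff yields "-"/"+" lines for every change between the two files.
diff = list(difflib.unified_diff(reference, modified,
                                 fromfile="config/wireframe.yaml",
                                 tofile="my_config.yaml"))
print("".join(diff))
```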

www322 commented 4 years ago

> Your setting is very different from the one in this repo. At least, your lcnn validates every 3000 images, while it should validate every 24k images by default. So this is not the default config.

Hello, I have pasted the config above. Is there anything wrong with it? I need your help, thank you.

zhou13 commented 4 years ago

Sorry, but the only suggestion I can give you is to start over clean and use the original config and code without ANY modification, following the instructions strictly. People have reproduced the results without issues. With the information you have provided, I don't have any clues.

www322 commented 4 years ago

> Sorry, but the only suggestion I can give you is to start over clean and use the original config and code without ANY modification, following the instructions strictly. People have reproduced the results without issues. With the information you have provided, I don't have any clues.

OK, thank you, I will check it carefully. Also, did you train from scratch or fine-tune from a pretrained model?

zhou13 commented 4 years ago

I don't understand what finetune means here, as we don't have a second dataset. I think the README is already very clear. Let me copy it here:

> You can download our reference pre-trained models from Google Drive. Those models were trained with config/wireframe.yaml for 312k iterations. Use demo.py, process.py, and eval-*.py to evaluate the pre-trained models.

www322 commented 4 years ago

> I don't understand what finetune means here, as we don't have a second dataset. I think the README is already very clear. Let me copy it here:
>
> You can download our reference pre-trained models from Google Drive. Those models were trained with config/wireframe.yaml for 312k iterations. Use demo.py, process.py, and eval-*.py to evaluate the pre-trained models.

Sorry, by finetune I meant training on another dataset before training on wireframe. Now I understand, thanks very much.