jnhwkim / MulLowBiVQA

Hadamard Product for Low-rank Bilinear Pooling

Unable to reproduce performance #14

Closed: a7b23 closed this issue 6 years ago

a7b23 commented 6 years ago

Hi, I downloaded data_prepro.h5, data_prepro.json, and seconds.json from the Google Drive link you shared, and I generated the data_res.h5 file by running prepro_res.lua. However, after retraining the model by running train.lua (with the default parameters) and submitting the resulting JSON file to the challenge server, I am getting an accuracy of only 50%. The JSON file generated from the pretrained model achieves the expected accuracy of 65%.

Did you train the model with a different set of hyperparameters, or am I making a mistake in training?

jnhwkim commented 6 years ago

Can you share your training log? Also, please check that you're using this rnn package: https://github.com/Element-Research/rnn.
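
A minimal sketch of how one might verify that the Element-Research rnn package is the one being loaded (the reliance on nn.GRU is an assumption here, based on the rnn_model : "GRU" option in the log below):

-- Illustrative check, not part of the repo's code: confirm the rnn
-- package loads and provides the GRU module selected via -rnn_model GRU.
require 'nn'
require 'rnn'  -- should load without errors
assert(nn.GRU ~= nil, 'nn.GRU not found: a different rnn package may be installed')
assert(nn.Sequencer ~= nil, 'nn.Sequencer not found: rnn package incomplete')
print('Element-Research rnn modules available')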

a7b23 commented 6 years ago

{
  batch_size : 2
  common_embedding_size : 1200
  input_encoding_size : 620
  learning_rate_decay_start : 0
  input_seconds : "data_train-val_test-dev_2k/seconds.json"
  load_checkpoint_path : ""
  rnn_model : "GRU"
  vg_img_h5 : ""
  dropout : 0.5
  previous_iters : 0
  input_skip : "skipthoughts_model"
  label : ""
  gpuid : 0
  optimizer : "rmsprop"
  input_img_h5 : "data_train-val_test-dev_2k/data_res.h5"
  rnn_size : 2400
  vg : false
  model_name : "MLB"
  input_ques_h5 : "data_train-val_test-dev_2k/data_prepro.h5"
  kick_interval : 50000
  seconds : true
  glimpse : 2
  vg_ques_h5 : ""
  input_json : "data_train-val_test-dev_2k/data_prepro.json"
  num_layers : 1
  num_output : 2000
  iterPerEpoch : 120000
  mhdf5_size : 10000
  max_iters : 250000
  checkpoint_path : "model/"
  save_checkpoint_every : 25000
  learning_rate : 0.0003
  clipping : 10
  backend : "cudnn"
  seed : 1231
  learning_rate_decay_every : 100
}
DataLoader loading h5 file: data_train-val_test-dev_2k/data_prepro.h5
DataLoader loading h5 file: data_train-val_test-dev_2k/data_res.h5
Building the model...
MLB: No Shortcut
nn.Sequential {
  [input -> (1) -> (2) -> (3) -> (4) -> (5) -> (6) -> (7) -> (8) -> output]
  (1): nn.ParallelTable {
    input
      |-> (1): nn.Identity
      -> (2): nn.Sequential {
           [input -> (1) -> (2) -> output]
           (1): nn.Transpose
           (2): nn.Reshape(196x2048)
         }
       ... -> output
  }
  (2): nn.ConcatTable {
    input
      |-> (1): nn.SelectTable(1)
      |-> (2): nn.SelectTable(2)
      -> (3): nn.Sequential {
           [input -> (1) -> (2) -> (3) -> (4) -> (5) -> (6) -> (7) -> (8) -> output]
           (1): nn.ParallelTable {
             input
               |-> (1): nn.Sequential {
               |     [input -> (1) -> (2) -> (3) -> (4) -> output]
               |     (1): nn.Dropout(0.5, busy)
               |     (2): nn.Linear(2400 -> 1200)
               |     (3): nn.Tanh
               |     (4): nn.Replicate
               |   }
               -> (2): nn.Sequential {
                    [input -> (1) -> (2) -> (3) -> (4) -> (5) -> output]
                    (1): nn.Reshape(392x2048)
                    (2): nn.Dropout(0.5, busy)
                    (3): nn.Linear(2048 -> 1200)
                    (4): nn.Tanh
                    (5): nn.Reshape(2x196x1200)
                  }
                ... -> output
           }
           (2): nn.CMulTable
           (3): nn.Reshape(2x14x14x1200)
           (4): nn.Transpose
           (5): nn.SpatialConvolution(1200 -> 2, 1x1)
           (6): nn.Reshape(2x2x196)
           (7): nn.SplitTable
           (8): nn.ParallelTable {
             input
               |-> (1): nn.SoftMax
               -> (2): nn.SoftMax
                ... -> output
           }
         }
       ... -> output
  }
  (3): nn.FlattenTable
  (4): nn.ConcatTable {
    input
      |-> (1): nn.Sequential {
      |     [input -> (1) -> (2) -> (3) -> (4) -> output]
      |     (1): nn.SelectTable(1)
      |     (2): nn.Dropout(0.5, busy)
      |     (3): nn.Linear(2400 -> 2400)
      |     (4): nn.Tanh
      |   }
      |-> (2): nn.Sequential {
      |     [input -> (1) -> (2) -> output]
      |     (1): nn.ConcatTable {
      |       input
      |         |-> (1): nn.Sequential {
      |         |     [input -> (1) -> (2) -> (3) -> (4) -> (5) -> (6) -> output]
      |         |     (1): nn.ConcatTable {
      |         |       input
      |         |         |-> (1): nn.SelectTable(3)
      |         |         -> (2): nn.SelectTable(2)
      |         |          ... -> output
      |         |     }
      |         |     (2): nn.ParallelTable {
      |         |       input
      |         |         |-> (1): nn.Identity
      |         |         -> (2): nn.SplitTable
      |         |          ... -> output
      |         |     }
      |         |     (3): nn.MixtureTable
      |         |     (4): nn.Dropout(0.5, busy)
      |         |     (5): nn.Linear(2048 -> 1200)
      |         |     (6): nn.Tanh
      |         |   }
      |         -> (2): nn.Sequential {
      |              [input -> (1) -> (2) -> (3) -> (4) -> (5) -> (6) -> output]
      |              (1): nn.ConcatTable {
      |                input
      |                  |-> (1): nn.SelectTable(4)
      |                  -> (2): nn.SelectTable(2)
      |                   ... -> output
      |              }
      |              (2): nn.ParallelTable {
      |                input
      |                  |-> (1): nn.Identity
      |                  -> (2): nn.SplitTable
      |                   ... -> output
      |              }
      |              (3): nn.MixtureTable
      |              (4): nn.Dropout(0.5, busy)
      |              (5): nn.Linear(2048 -> 1200)
      |              (6): nn.Tanh
      |            }
      |          ... -> output
      |     }
      |     (2): nn.JoinTable
      |   }
      -> (3): nn.SelectTable(2)
       ... -> output
  }
  (5): nn.ConcatTable {
    input
      |-> (1): nn.Sequential {
      |     [input -> (1) -> (2) -> output]
      |     (1): nn.NarrowTable
      |     (2): nn.CMulTable
      |   }
      -> (2): nn.SelectTable(3)
       ... -> output
  }
  (6): nn.SelectTable(1)
  (7): nn.Dropout(0.5, busy)
  (8): nn.Linear(2400 -> 2000)
}
shipped data function to cuda...
nParams= 51894822
decay_factor = 0.99997592083
learining rate: 0.0003
training loss: 4.8693726433563 on iter: 100/250000
training loss: 4.0755410186514 on iter: 200/250000
training loss: 4.7666955601846 on iter: 300/250000
training loss: 4.1725922354561 on iter: 400/250000
training loss: 3.7588104538761 on iter: 500/250000
training loss: 4.1201743807505 on iter: 600/250000
training loss: 3.95045189864 on iter: 700/250000
training loss: 3.8068482422712 on iter: 800/250000
training loss: 3.741670871492 on iter: 900/250000
training loss: 3.8267296812402 on iter: 1000/250000
training loss: 4.0479970796246 on iter: 1100/250000
training loss: 4.1455298619561 on iter: 1200/250000
training loss: 4.0713808883637 on iter: 1300/250000
training loss: 3.9584978528876 on iter: 1400/250000
training loss: 3.8829934939005 on iter: 1500/250000
training loss: 3.3697245319676 on iter: 1600/250000
training loss: 4.382460173037 on iter: 1700/250000
training loss: 3.5356175303366 on iter: 1800/250000
training loss: 4.1210829661098 on iter: 1900/250000
training loss: 3.2177009764106 on iter: 2000/250000
training loss: 3.9065231957854 on iter: 2100/250000
training loss: 4.1072556528939 on iter: 2200/250000
training loss: 4.0144710805711 on iter: 2300/250000
training loss: 3.7349430310903 on iter: 2400/250000
training loss: 3.1987197406927 on iter: 2500/250000
training loss: 3.8432964374996 on iter: 2600/250000
training loss: 3.7092187217176 on iter: 2700/250000
training loss: 3.5008688446065 on iter: 2800/250000
training loss: 4.6774473157799 on iter: 2900/250000
training loss: 4.1848083823068 on iter: 3000/250000
training loss: 3.5860899373431 on iter: 3100/250000
training loss: 2.9383213166334 on iter: 3200/250000
training loss: 3.0839858097203 on iter: 3300/250000
training loss: 3.7513172062108 on iter: 3400/250000
training loss: 3.1869710831699 on iter: 3500/250000
training loss: 3.624473524439 on iter: 3600/250000
training loss: 3.4050613941417 on iter: 3700/250000
training loss: 3.736372573513 on iter: 3800/250000
training loss: 3.2235076301945 on iter: 3900/250000
training loss: 3.5757938307821 on iter: 4000/250000
training loss: 3.3135292969846 on iter: 4100/250000
training loss: 2.6682984935174 on iter: 4200/250000
training loss: 3.7384815510971 on iter: 4300/250000
training loss: 3.4148914840276 on iter: 4400/250000
training loss: 3.6498642060847 on iter: 4500/250000
training loss: 3.6433645158231 on iter: 4600/250000
training loss: 3.7100133600744 on iter: 4700/250000
training loss: 2.8654981825852 on iter: 4800/250000
training loss: 3.217280009943 on iter: 4900/250000
training loss: 3.9207967604785 on iter: 5000/250000
training loss: 2.9555742654138 on iter: 5100/250000
training loss: 3.8598746884621 on iter: 5200/250000
training loss: 3.5847453037522 on iter: 5300/250000
training loss: 3.4657520898438 on iter: 5400/250000
training loss: 3.3916345924177 on iter: 5500/250000
training loss: 3.8579615059769 on iter: 5600/250000
training loss: 3.9038641790336 on iter: 5700/250000
training loss: 4.3864395398832 on iter: 5800/250000
training loss: 3.4928268161193 on iter: 5900/250000
training loss: 3.0450249645281 on iter: 6000/250000
training loss: 3.8842221132494 on iter: 6100/250000
training loss: 3.2880487401061 on iter: 6200/250000
training loss: 2.9185751451314 on iter: 6300/250000
training loss: 3.0740059996772 on iter: 6400/250000
training loss: 3.0148220994581 on iter: 6500/250000
training loss: 3.58458690508 on iter: 6600/250000
training loss: 3.2511303354174 on iter: 6700/250000
training loss: 2.971753350322 on iter: 6800/250000
training loss: 2.907227503617 on iter: 6900/250000
training loss: 2.7177727135474 on iter: 7000/250000
training loss: 3.4304250191041 on iter: 7100/250000
training loss: 3.9063394142783 on iter: 7200/250000
training loss: 3.5187849466658 on iter: 7300/250000
training loss: 3.2528221448543 on iter: 7400/250000
training loss: 2.7372836095832 on iter: 7500/250000
training loss: 3.6240937975306 on iter: 7600/250000
training loss: 3.1945240727705 on iter: 7700/250000
training loss: 2.9268224073938 on iter: 7800/250000
training loss: 3.5500002281539 on iter: 7900/250000
training loss: 2.7849560622056 on iter: 8000/250000
training loss: 2.8870666174859 on iter: 8100/250000
training loss: 3.5429703615977 on iter: 8200/250000
training loss: 3.3446548740357 on iter: 8300/250000
training loss: 3.7808576446444 on iter: 8400/250000
training loss: 2.8902860535067 on iter: 8500/250000
training loss: 3.9734762724497 on iter: 8600/250000
training loss: 3.4776270912936 on iter: 8700/250000
training loss: 2.9827175335349 on iter: 8800/250000
training loss: 3.699993023318 on iter: 8900/250000
training loss: 3.4213773127606 on iter: 9000/250000
training loss: 3.1046835439853 on iter: 9100/250000
training loss: 3.540458070569 on iter: 9200/250000
training loss: 3.2517778932949 on iter: 9300/250000
training loss: 3.2988247027024 on iter: 9400/250000
training loss: 2.710372349899 on iter: 9500/250000
training loss: 3.4521389221567 on iter: 9600/250000
training loss: 3.9458913853249 on iter: 9700/250000
training loss: 3.1495192185632 on iter: 9800/250000
training loss: 3.7107258838663 on iter: 9900/250000
training loss: 3.2706948892924 on iter: 10000/250000
training loss: 2.9775978381441 on iter: 10100/250000
training loss: 3.3944526472107 on iter: 10200/250000
training loss: 3.9131680883825 on iter: 10300/250000
training loss: 3.2496365447769 on iter: 10400/250000
training loss: 2.9414709037481 on iter: 10500/250000
training loss: 3.0687932799818 on iter: 10600/250000
training loss: 3.528188690943 on iter: 10700/250000
training loss: 3.2771711537859 on iter: 10800/250000
training loss: 3.1210666952226 on iter: 10900/250000
training loss: 3.35382299998 on iter: 11000/250000
training loss: 3.1272502926084 on iter: 11100/250000
training loss: 3.0141314469646 on iter: 11200/250000
training loss: 2.1994255576717 on iter: 11300/250000
training loss: 3.4505864127043 on iter: 11400/250000
training loss: 3.0549549939623 on iter: 11500/250000
training loss: 3.8655790047206 on iter: 11600/250000
training loss: 3.6181578718816 on iter: 11700/250000
training loss: 3.1614834674546 on iter: 11800/250000
training loss: 3.0445825487965 on iter: 11900/250000
training loss: 3.1182394490669 on iter: 12000/250000
training loss: 2.8396849742938 on iter: 12100/250000
training loss: 3.0825098473937 on iter: 12200/250000
training loss: 2.8717748775381 on iter: 12300/250000
training loss: 3.2389117511133 on iter: 12400/250000
training loss: 3.1350613412558 on iter: 12500/250000
training loss: 3.1720919024382 on iter: 12600/250000
training loss: 2.843639413933 on iter: 12700/250000
training loss: 3.2418760803871 on iter: 12800/250000
training loss: 3.0901308583901 on iter: 12900/250000
training loss: 3.2590009009524 on iter: 13000/250000
training loss: 2.7563787665588 on iter: 13100/250000
training loss: 3.0304985252942 on iter: 13200/250000
training loss: 2.7065686378805 on iter: 13300/250000
training loss: 3.3563562597361 on iter: 13400/250000
training loss: 2.9093297888355 on iter: 13500/250000
training loss: 2.9764109331449 on iter: 13600/250000
training loss: 2.9167921855305 on iter: 13700/250000
training loss: 3.6432441644402 on iter: 13800/250000
training loss: 2.5046723125632 on iter: 13900/250000
training loss: 3.0008588441858 on iter: 14000/250000
training loss: 3.011089685378 on iter: 14100/250000
training loss: 3.7472023018877 on iter: 14200/250000
training loss: 2.5692966813257 on iter: 14300/250000
training loss: 3.1719729910237 on iter: 14400/250000
training loss: 3.0317613558285 on iter: 14500/250000
training loss: 2.7482626584263 on iter: 14600/250000
training loss: 2.9293070169308 on iter: 14700/250000
training loss: 3.3509049303604 on iter: 14800/250000
training loss: 3.4324509253248 on iter: 14900/250000
training loss: 2.7452546194139 on iter: 15000/250000
training loss: 3.2008311045955 on iter: 15100/250000
training loss: 3.219244336545 on iter: 15200/250000
training loss: 3.2872278345276 on iter: 15300/250000
training loss: 3.2195236941911 on iter: 15400/250000
training loss: 3.2281098853581 on iter: 15500/250000
training loss: 3.4251679423988 on iter: 15600/250000
training loss: 2.830743881681 on iter: 15700/250000
training loss: 3.4418592989672 on iter: 15800/250000
training loss: 2.9570411896901 on iter: 15900/250000
training loss: 2.9305634234001 on iter: 16000/250000
training loss: 3.2423435593587 on iter: 16100/250000
training loss: 2.6633830285309 on iter: 16200/250000
training loss: 3.3764463189316 on iter: 16300/250000
training loss: 3.1167515878543 on iter: 16400/250000
training loss: 2.9797131342018 on iter: 16500/250000
training loss: 3.3640001957979 on iter: 16600/250000
training loss: 2.9842229140313 on iter: 16700/250000
training loss: 2.9349141462539 on iter: 16800/250000
training loss: 2.9421552038215 on iter: 16900/250000
training loss: 3.1588174941052 on iter: 17000/250000
training loss: 3.1636286831675 on iter: 17100/250000
training loss: 3.3752112876663 on iter: 17200/250000
training loss: 2.7049184432757 on iter: 17300/250000
training loss: 3.3542719029892 on iter: 17400/250000
training loss: 3.2876799176369 on iter: 17500/250000
training loss: 3.1450917912788 on iter: 17600/250000
training loss: 3.2181256066894 on iter: 17700/250000
training loss: 3.0196555049743 on iter: 17800/250000
training loss: 3.5895030486964 on iter: 17900/250000
training loss: 3.1119935808024 on iter: 18000/250000
training loss: 3.6554815055515 on iter: 18100/250000
training loss: 3.0559010261442 on iter: 18200/250000
training loss: 3.1836163580029 on iter: 18300/250000
training loss: 2.8963457020751 on iter: 18400/250000
training loss: 3.1694013043862 on iter: 18500/250000
training loss: 3.5502405222109 on iter: 18600/250000
training loss: 2.6263777880011 on iter: 18700/250000
training loss: 2.6189622432943 on iter: 18800/250000
training loss: 3.0212086331832 on iter: 18900/250000
training loss: 3.1764373664321 on iter: 19000/250000
training loss: 3.3414628431327 on iter: 19100/250000
training loss: 3.2404355441707 on iter: 19200/250000
training loss: 3.2293816091282 on iter: 19300/250000
training loss: 3.2193897891727 on iter: 19400/250000
training loss: 3.0912890248628 on iter: 19500/250000
training loss: 3.4781617757879 on iter: 19600/250000
training loss: 3.3243259346857 on iter: 19700/250000
training loss: 2.5715496291489 on iter: 19800/250000
training loss: 3.4017144645094 on iter: 19900/250000
training loss: 2.8807517725605 on iter: 20000/250000
training loss: 3.3015500497401 on iter: 20100/250000
training loss: 2.6950072619071 on iter: 20200/250000
training loss: 2.8330123886692 on iter: 20300/250000
training loss: 3.4885240178146 on iter: 20400/250000
training loss: 3.2769468335815 on iter: 20500/250000
training loss: 3.1374565907301 on iter: 20600/250000
training loss: 3.7980738535804 on iter: 20700/250000
training loss: 3.3538743955902 on iter: 20800/250000
training loss: 3.2575119784081 on iter: 20900/250000
training loss: 2.8834165686189 on iter: 21000/250000
training loss: 3.4384198095179 on iter: 21100/250000
training loss: 3.2579521933942 on iter: 21200/250000
training loss: 3.2417392824251 on iter: 21300/250000
training loss: 3.3312448621918 on iter: 21400/250000
training loss: 2.990379253637 on iter: 21500/250000
training loss: 3.4543052471929 on iter: 21600/250000
training loss: 2.6592379707315 on iter: 21700/250000
training loss: 3.1453652803984 on iter: 21800/250000
training loss: 2.7971247188701 on iter: 21900/250000
training loss: 2.8413442890739 on iter: 22000/250000
training loss: 2.5744051844803 on iter: 22100/250000
training loss: 2.5189899570808 on iter: 22200/250000
training loss: 2.6232365677646 on iter: 22300/250000
training loss: 3.6171616964174 on iter: 22400/250000
training loss: 2.6497101692725 on iter: 22500/250000
training loss: 3.2365120220182 on iter: 22600/250000
training loss: 3.6583070155128 on iter: 22700/250000
training loss: 3.1400776131364 on iter: 22800/250000
training loss: 3.4458716561465 on iter: 22900/250000
training loss: 2.9104094067861 on iter: 23000/250000
training loss: 2.9104067385251 on iter: 23100/250000
training loss: 3.2661380485108 on iter: 23200/250000
training loss: 2.6968539340268 on iter: 23300/250000
training loss: 3.2607974878456 on iter: 23400/250000
training loss: 3.2548830323751 on iter: 23500/250000
training loss: 2.6086769734632 on iter: 23600/250000
training loss: 2.6429898930076 on iter: 23700/250000
training loss: 3.533971672395 on iter: 23800/250000
training loss: 2.5116789161084 on iter: 23900/250000
training loss: 3.0228113237291 on iter: 24000/250000
training loss: 2.8585392572753 on iter: 24100/250000
training loss: 3.0625396872499 on iter: 24200/250000
training loss: 2.6879629439801 on iter: 24300/250000
training loss: 2.5624721268646 on iter: 24400/250000
training loss: 3.0780126949095 on iter: 24500/250000
training loss: 2.8161702155686 on iter: 24600/250000
training loss: 2.7655067865491 on iter: 24700/250000
training loss: 3.1447250298949 on iter: 24800/250000
training loss: 2.6643519266675 on iter: 24900/250000
training loss: 3.1405527531231 on iter: 25000/250000
training loss: 2.8240664630491 on iter: 25100/250000
training loss: 3.2591791792974 on iter: 25200/250000
training loss: 3.0373734209679 on iter: 25300/250000
training loss: 3.2993201249172 on iter: 25400/250000
training loss: 2.9241576340796 on iter: 25500/250000
training loss: 3.4208655103621 on iter: 25600/250000
training loss: 2.6986180118537 on iter: 25700/250000
training loss: 3.1083747618008 on iter: 25800/250000
training loss: 3.0968884944888 on iter: 25900/250000
training loss: 2.5730109123527 on iter: 26000/250000
training loss: 2.9652333260898 on iter: 26100/250000
training loss: 3.106467940841 on iter: 26200/250000
training loss: 2.889385309368 on iter: 26300/250000
training loss: 2.8754751483936 on iter: 26400/250000
training loss: 3.4690856744276 on iter: 26500/250000
training loss: 2.6626066537817 on iter: 26600/250000
training loss: 3.693138713736 on iter: 26700/250000
training loss: 3.3655539838607 on iter: 26800/250000
training loss: 2.4135486124885 on iter: 26900/250000
training loss: 2.7263093759357 on iter: 27000/250000
training loss: 2.7154404199844 on iter: 27100/250000
training loss: 2.551790252989 on iter: 27200/250000
training loss: 2.7817980669283 on iter: 27300/250000
training loss: 3.032612377084 on iter: 27400/250000
training loss: 2.8645624813285 on iter: 27500/250000
training loss: 2.3777953999209 on iter: 27600/250000
training loss: 2.9842303171256 on iter: 27700/250000
training loss: 2.3908186542124 on iter: 27800/250000
training loss: 2.6519880274652 on iter: 27900/250000
training loss: 3.2047579907545 on iter: 28000/250000
training loss: 2.9475471151639 on iter: 28100/250000
training loss: 2.5323053093292 on iter: 28200/250000
training loss: 2.7037444801731 on iter: 28300/250000
training loss: 3.7067378552588 on iter: 28400/250000
training loss: 3.0086043325106 on iter: 28500/250000
training loss: 2.8878485085798 on iter: 28600/250000
training loss: 2.9867025578358 on iter: 28700/250000
training loss: 2.8323197593272 on iter: 28800/250000
training loss: 2.5924800263649 on iter: 28900/250000
training loss: 3.1707279862244 on iter: 29000/250000
training loss: 2.8470747701881 on iter: 29100/250000
training loss: 2.9111539048573 on iter: 29200/250000
training loss: 2.2627088660232 on iter: 29300/250000
training loss: 2.5571346068465 on iter: 29400/250000
training loss: 2.6449880883746 on iter: 29500/250000
training loss: 3.1094094366272 on iter: 29600/250000
training loss: 3.0538276190098 on iter: 29700/250000
training loss: 3.2142187911395 on iter: 29800/250000
training loss: 2.6269572943713 on iter: 29900/250000
training loss: 2.7888394134094 on iter: 30000/250000
training loss: 3.3249958810677 on iter: 30100/250000
training loss: 2.3240902474375 on iter: 30200/250000
training loss: 3.3579108847261 on iter: 30300/250000
training loss: 3.2161314850033 on iter: 30400/250000
training loss: 3.3103784601952 on iter: 30500/250000
training loss: 3.2584265875949 on iter: 30600/250000
training loss: 3.0295679705164 on iter: 30700/250000
training loss: 2.5405019778261 on iter: 30800/250000
training loss: 2.8883628275649 on iter: 30900/250000
training loss: 3.156979906038 on iter: 31000/250000
training loss: 3.3243593159314 on iter: 31100/250000
training loss: 2.9221167958726 on iter: 31200/250000
training loss: 2.4022412857158 on iter: 31300/250000
training loss: 2.4823977543017 on iter: 31400/250000
training loss: 3.1635561593958 on iter: 31500/250000
training loss: 2.7867395250907 on iter: 31600/250000
training loss: 2.8212026335174 on iter: 31700/250000
training loss: 2.5491165642515 on iter: 31800/250000
training loss: 1.9745845448584 on iter: 31900/250000
training loss: 2.5993409826297 on iter: 32000/250000
training loss: 3.002586099949 on iter: 32100/250000
training loss: 2.4849100482274 on iter: 32200/250000
training loss: 2.9782041320016 on iter: 32300/250000
training loss: 2.8248138720943 on iter: 32400/250000
training loss: 3.3294764278592 on iter: 32500/250000
training loss: 3.3369331493405 on iter: 32600/250000
training loss: 2.7598539820267 on iter: 32700/250000
training loss: 3.3208098564582 on iter: 32800/250000
training loss: 3.2498726176156 on iter: 32900/250000
training loss: 2.8864562423817 on iter: 33000/250000
training loss: 2.5959523809067 on iter: 33100/250000
training loss: 2.9920279189924 on iter: 33200/250000
training loss: 3.4556683073173 on iter: 33300/250000
training loss: 3.028779018419 on iter: 33400/250000
training loss: 2.7479328880365 on iter: 33500/250000
training loss: 3.407427632183 on iter: 33600/250000
training loss: 3.0322574660951 on iter: 33700/250000
training loss: 2.8570670392355 on iter: 33800/250000
training loss: 3.2030911648047 on iter: 33900/250000
training loss: 3.0942246064616 on iter: 34000/250000
training loss: 3.1912913184401 on iter: 34100/250000
training loss: 2.8534122352894 on iter: 34200/250000
training loss: 2.7972144633762 on iter: 34300/250000
training loss: 3.6274471583914 on iter: 34400/250000
training loss: 3.0430058132834 on iter: 34500/250000
training loss: 3.2025162238992 on iter: 34600/250000
training loss: 2.7894221746726 on iter: 34700/250000
training loss: 3.1981873895599 on iter: 34800/250000
training loss: 2.6263389024415 on iter: 34900/250000
training loss: 3.346883854562 on iter: 35000/250000
training loss: 2.7505133972234 on iter: 35100/250000
training loss: 2.6575271302295 on iter: 35200/250000
training loss: 2.1933454271242 on iter: 35300/250000
training loss: 2.2448134552074 on iter: 35400/250000
training loss: 2.4995778288307 on iter: 35500/250000
training loss: 3.7240212729168 on iter: 35600/250000
training loss: 3.1065201740652 on iter: 35700/250000
training loss: 2.9089732985142 on iter: 35800/250000
training loss: 2.1044293994932 on iter: 35900/250000
training loss: 2.2188602078087 on iter: 36000/250000
training loss: 3.1015161834776 on iter: 36100/250000
training loss: 3.6836881263824 on iter: 36200/250000
training loss: 2.7493086099768 on iter: 36300/250000
training loss: 2.5318635495157 on iter: 36400/250000
training loss: 3.01264194735 on iter: 36500/250000
training loss: 2.951277122908 on iter: 36600/250000
training loss: 2.5653159018485 on iter: 36700/250000
training loss: 2.997954181701 on iter: 36800/250000
training loss: 3.3420850794446 on iter: 36900/250000
training loss: 2.7062056144825 on iter: 37000/250000
training loss: 2.6751346963219 on iter: 37100/250000
training loss: 3.4851508607656 on iter: 37200/250000
training loss: 3.324443849895 on iter: 37300/250000
training loss: 2.7772340143259 on iter: 37400/250000
training loss: 2.8627652276409 on iter: 37500/250000
training loss: 3.1176290939467 on iter: 37600/250000
training loss: 2.8095328415567 on iter: 37700/250000
training loss: 2.4254056645494 on iter: 37800/250000
training loss: 2.4594393509266 on iter: 37900/250000
training loss: 3.3197724617349 on iter: 38000/250000
training loss: 2.895093081814 on iter: 38100/250000
training loss: 2.7091126372623 on iter: 38200/250000
training loss: 2.9800662538717 on iter: 38300/250000
training loss: 3.1934758233596 on iter: 38400/250000
training loss: 2.9273412411394 on iter: 38500/250000
training loss: 2.7778064830738 on iter: 38600/250000
training loss: 2.9012858606713 on iter: 38700/250000
training loss: 3.166392873428 on iter: 38800/250000
training loss: 2.1570321095881 on iter: 38900/250000
training loss: 2.5526920155335 on iter: 39000/250000
training loss: 2.8208426708314 on iter: 39100/250000
training loss: 3.2052770069977 on iter: 39200/250000
training loss: 3.3753125780917 on iter: 39300/250000
training loss: 3.158969717612 on iter: 39400/250000
training loss: 2.6212700302726 on iter: 39500/250000
training loss: 3.1128000664981 on iter: 39600/250000
training loss: 3.10789489616 on iter: 39700/250000
training loss: 2.8534272416057 on iter: 39800/250000
training loss: 2.7241283894475 on iter: 39900/250000
training loss: 2.1668711832415 on iter: 40000/250000
training loss: 2.7931482708619 on iter: 40100/250000
training loss: 2.5422958466249 on iter: 40200/250000
training loss: 2.7831317845305 on iter: 40300/250000
training loss: 3.2608394385049 on iter: 40400/250000
training loss: 3.1551828620188 on iter: 40500/250000
training loss: 2.7071851100367 on iter: 40600/250000
training loss: 2.6449025099827 on iter: 40700/250000
training loss: 2.9257500981731 on iter: 40800/250000
training loss: 2.8167223081033 on iter: 40900/250000
training loss: 2.5398481681439 on iter: 41000/250000
training loss: 3.3109904565936 on iter: 41100/250000
training loss: 3.0244976672865 on iter: 41200/250000
training loss: 3.2261034304039 on iter: 41300/250000
training loss: 3.3421393852847 on iter: 41400/250000
training loss: 2.658968246898 on iter: 41500/250000
training loss: 2.368541228227 on iter: 41600/250000
training loss: 3.0218014445214 on iter: 41700/250000
training loss: 3.1160808140738 on iter: 41800/250000
training loss: 2.6360564566911 on iter: 41900/250000
training loss: 2.3966052225207 on iter: 42000/250000
training loss: 3.3291244522373 on iter: 42100/250000
training loss: 3.2903659136894 on iter: 42200/250000
training loss: 2.7074324931238 on iter: 42300/250000
training loss: 2.9263368820822 on iter: 42400/250000
training loss: 3.0633604169517 on iter: 42500/250000
training loss: 3.2471629710388 on iter: 42600/250000
training loss: 2.5533603983519 on iter: 42700/250000
training loss: 2.616530287249 on iter: 42800/250000
training loss: 3.6041385093687 on iter: 42900/250000
training loss: 3.3307375198414 on iter: 43000/250000
training loss: 3.4292122592299 on iter: 43100/250000
training loss: 3.4907922405168 on iter: 43200/250000
training loss: 3.0338559101893 on iter: 43300/250000
training loss: 3.0653447333173 on iter: 43400/250000
training loss: 3.1978451693024 on iter: 43500/250000
training loss: 3.0900992756311 on iter: 43600/250000
training loss: 3.3162743509432 on iter: 43700/250000
training loss: 3.5071853172012 on iter: 43800/250000
training loss: 2.4317395999511 on iter: 43900/250000
training loss: 3.2084076053427 on iter: 44000/250000
training loss: 3.3483859018456 on iter: 44100/250000
training loss: 2.5174978677158 on iter: 44200/250000
training loss: 2.9167060854518 on iter: 44300/250000
training loss: 2.7845944168198 on iter: 44400/250000
training loss: 3.1594244152385 on iter: 44500/250000
training loss: 2.983772731962 on iter: 44600/250000
training loss: 2.5752059344433 on iter: 44700/250000
training loss: 2.5584736965249 on iter: 44800/250000
training loss: 2.6382590652339 on iter: 44900/250000
training loss: 2.7907793728873 on iter: 45000/250000
training loss: 2.6300144369619 on iter: 45100/250000
training loss: 2.407163148518 on iter: 45200/250000
training loss: 2.8119192752524 on iter: 45300/250000
training loss: 3.1512034075037 on iter: 45400/250000
training loss: 3.1743895321618 on iter: 45500/250000
training loss: 3.5992599159344 on iter: 45600/250000
training loss: 2.8110358833605 on iter: 45700/250000
training loss: 2.5396199482666 on iter: 45800/250000
training loss: 3.2330021439869 on iter: 45900/250000
training loss: 3.1042458884371 on iter: 46000/250000
training loss: 3.2680115354797 on iter: 46100/250000
training loss: 3.0620634476616 on iter: 46200/250000
training loss: 2.5693489043521 on iter: 46300/250000
training loss: 3.3221335422228 on iter: 46400/250000
training loss: 2.8385716497321 on iter: 46500/250000
training loss: 3.3701882734479 on iter: 46600/250000
training loss: 1.8712476351257 on iter: 46700/250000
training loss: 3.3493705605856 on iter: 46800/250000
training loss: 2.4429897726369 on iter: 46900/250000
training loss: 2.6455201845129 on iter: 47000/250000
training loss: 3.2966350328845 on iter: 47100/250000
training loss: 2.9383092031553 on iter: 47200/250000
training loss: 2.5062020613761 on iter: 47300/250000
training loss: 3.0695962221334 on iter: 47400/250000
training loss: 3.3122460379691 on iter: 47500/250000
training loss: 2.9600613574456 on iter: 47600/250000
training loss: 3.3094171036278 on iter: 47700/250000
training loss: 2.7381019660127 on iter: 47800/250000
training loss: 3.0401055938782 on iter: 47900/250000
training loss: 2.3997073984432 on iter: 48000/250000
training loss: 2.6164732820399 on iter: 48100/250000
training loss: 3.2026317104463 on iter: 48200/250000
training loss: 2.5061495752986 on iter: 48300/250000
training loss: 2.7231252243719 on iter: 48400/250000
training loss: 3.2493738675573 on iter: 48500/250000
training loss: 2.4462431523864 on iter: 48600/250000
training loss: 2.9212372545115 on iter: 48700/250000
training loss: 2.9847969553022 on iter: 48800/250000
training loss: 2.9501151660417 on iter: 48900/250000
training loss: 2.7569659016263 on iter: 49000/250000
training loss: 2.3480913740908 on iter: 49100/250000
training loss: 3.0163933541744 on iter: 49200/250000
training loss: 2.7167207869726 on iter: 49300/250000
training loss: 3.2069934467285 on iter: 49400/250000
training loss: 2.7863701021643 on iter: 49500/250000
training loss: 2.7587296057581 on iter: 49600/250000
training loss: 2.6402283877969 on iter: 49700/250000
training loss: 3.156670467286 on iter: 49800/250000
training loss: 3.1014260299928 on iter: 49900/250000
training loss: 3.0707723669787 on iter: 50000/250000
learining rate: 0.00018000429996178
training loss: 2.9826951151979 on iter: 50100/250000
training loss: 2.8695361532492 on iter: 50200/250000
training loss: 3.3247373019382 on iter: 50300/250000
training loss: 3.1328238853215 on iter: 50400/250000
training loss: 2.5897246550552 on iter: 50500/250000
training loss: 3.7448340745144 on iter: 50600/250000
training loss: 2.8824503279683 on iter: 50700/250000
training loss: 2.6859176795303 on iter: 50800/250000
training loss: 2.6547332435929 on iter: 50900/250000
training loss: 3.0189399575269 on iter: 51000/250000
training loss: 3.1493902044079 on iter: 51100/250000
training loss: 3.0375422919537 on iter: 51200/250000
training loss: 2.6195081558781 on iter: 51300/250000
training loss: 2.8358699097512 on iter: 51400/250000
training loss: 2.7570889452629 on iter: 51500/250000
training loss: 3.478781002872 on iter: 51600/250000
training loss: 2.6494430457571 on iter: 51700/250000
training loss: 3.1594810498538 on iter: 51800/250000
training loss: 2.3492261993111 on iter: 51900/250000
training loss: 2.9185940559061 on iter: 52000/250000
training loss: 4.0521185699998 on iter: 52100/250000
training loss: 3.4516031220764 on iter: 52200/250000
training loss: 2.8505002632043 on iter: 52300/250000
training loss: 2.7739651482366 on iter: 52400/250000
training loss: 2.5244510753311 on iter: 52500/250000
training loss: 2.7624893480496 on iter: 52600/250000
training loss: 2.9913573142979 on iter: 52700/250000
training loss: 2.4793100606248 on iter: 52800/250000
training loss: 3.0336866261159 on iter: 52900/250000
training loss: 3.404714977709 on iter: 53000/250000
training loss: 3.0290627516123 on iter: 53100/250000
training loss: 3.1928037740917 on iter: 53200/250000
training loss: 3.2476352743402 on iter: 53300/250000
training loss: 2.6967785476726 on iter: 53400/250000
training loss: 2.4209349035244 on iter: 53500/250000
training loss: 2.8452198188297 on iter: 53600/250000
training loss: 2.4684724386732 on iter: 53700/250000
training loss: 3.5310061366924 on iter: 53800/250000
training loss: 2.7964644465727 on iter: 53900/250000
training loss: 3.4393585568651 on iter: 54000/250000
training loss: 3.2504298062017 on iter: 54100/250000
training loss: 3.0453870621538 on iter: 54200/250000
training loss: 2.5750210401611 on iter: 54300/250000
training loss: 3.1335563119142 on iter: 54400/250000
training loss: 3.2272489492582 on iter: 54500/250000
training loss: 2.6860665339941 on iter: 54600/250000
training loss: 2.8455795470557 on iter: 54700/250000
training loss: 2.8050024987435 on iter: 54800/250000
training loss: 2.4481127971162 on iter: 54900/250000
training loss: 3.1897213816638 on iter: 55000/250000
training loss: 2.6587693715 on iter: 55100/250000
training loss: 2.605828933971 on iter: 55200/250000
training loss: 3.1640614801221 on iter: 55300/250000
training loss: 2.8783736682168 on iter: 55400/250000
training loss: 2.6427198870244 on iter: 55500/250000
training loss: 3.1458731234634 on iter: 55600/250000
training loss: 3.2361596267457 on iter: 55700/250000
training loss: 2.6743929343205 on iter: 55800/250000
training loss: 3.2731817680049 on iter: 55900/250000
training loss: 2.8112355243631 on iter: 56000/250000
training loss: 3.568056621572 on iter: 56100/250000
training loss: 2.6502633687674 on iter: 56200/250000
training loss: 2.9544126035189 on iter: 56300/250000
training loss: 3.6956327694954 on iter: 56400/250000
training loss: 2.537273512442 on iter: 56500/250000
training loss: 2.3139102896056 on iter: 56600/250000
training loss: 2.8545538928872 on iter: 56700/250000
training loss: 2.5670895218462 on iter: 56800/250000
training loss: 2.8794426485755 on iter: 56900/250000
training loss: 3.2846840091768 on iter: 57000/250000
training loss: 3.0623465544199 on iter: 57100/250000
training loss: 3.67607296062 on iter: 57200/250000
training loss: 3.3622431219602 on iter: 57300/250000
training loss: 2.8135809325525 on iter: 57400/250000
training loss: 2.5007720046486 on iter: 57500/250000
training loss: 3.0946373751104 on iter: 57600/250000
training loss: 3.2585399239142 on iter: 57700/250000
training loss: 2.8307698715608 on iter: 57800/250000
training loss: 3.1701312908077 on iter: 57900/250000
training loss: 2.7599415519429 on iter: 58000/250000
training loss: 3.3744652260144 on iter: 58100/250000
training loss: 3.4872125350232 on iter: 58200/250000
training loss: 3.0166415978538 on iter: 58300/250000
training loss: 3.0133274318937 on iter: 58400/250000
training loss: 1.9453353745444 on iter: 58500/250000
training loss: 3.0910007167371 on iter: 58600/250000
training loss: 3.2239646667555 on iter: 58700/250000
training loss: 2.9830185482923 on iter: 58800/250000
training loss: 3.0780200110967 on iter: 58900/250000
training loss: 2.5085090241504 on iter: 59000/250000
training loss: 3.029117557161 on iter: 59100/250000
training loss: 2.8099250720308 on iter: 59200/250000
training loss: 2.740815904032 on iter: 59300/250000
training loss: 3.4930301800565 on iter: 59400/250000
training loss: 3.1724187769694 on iter: 59500/250000
training loss: 3.0695920405288 on iter: 59600/250000
training loss: 2.9850761357121 on iter: 59700/250000
training loss: 2.9848018521725 on iter: 59800/250000
training loss: 2.639988137982 on iter: 59900/250000
training loss: 3.0181104285322 on iter: 60000/250000
training loss: 2.7710719248999 on iter: 60100/250000
training loss: 3.3530960537512 on iter: 60200/250000
training loss: 3.3115733933445 on iter: 60300/250000
training loss: 2.7605678467803 on iter: 60400/250000
training loss: 2.6251820118973 on iter: 60500/250000
training loss: 2.8969516717712 on iter: 60600/250000
training loss: 3.5016405804133 on iter: 60700/250000
training loss: 2.5762276602576 on iter: 60800/250000
training loss: 2.5986684338719 on iter: 60900/250000
training loss: 3.3459954258255 on iter: 61000/250000
training loss: 3.0472167682526 on iter: 61100/250000
training loss: 3.3193574147296 on iter: 61200/250000
training loss: 3.0682299213191 on iter: 61300/250000
training loss: 2.9830966701049 on iter: 61400/250000
training loss: 3.1305542254637 on iter: 61500/250000
training loss: 2.7577775071449 on iter: 61600/250000
training loss: 2.6724094615035 on iter: 61700/250000
training loss: 2.4385089079234 on iter: 61800/250000
training loss: 2.5126660310357 on iter: 61900/250000
training loss: 2.543611411866 on iter: 62000/250000
training loss: 3.3100760090423 on iter: 62100/250000
training loss: 2.7157881223243 on iter: 62200/250000
training loss: 3.0054993503107 on iter: 62300/250000
training loss: 2.9120837234232 on iter: 62400/250000
training loss: 2.7678028996828 on iter: 62500/250000
training loss: 3.2582168669495 on iter: 62600/250000
training loss: 2.7085406638655 on iter: 62700/250000
training loss: 3.5781126167857 on iter: 62800/250000
training loss: 3.0884174232502 on iter: 62900/250000
training loss: 2.8362335311594 on iter: 63000/250000
training loss: 2.732094537556 on iter: 63100/250000
training loss: 3.181310105201 on iter: 63200/250000
training loss: 2.9871917627265 on iter: 63300/250000
training loss: 2.94408443431 on iter: 63400/250000
training loss: 3.0737079111913 on iter: 63500/250000
training loss: 2.6107184414519 on iter: 63600/250000
training loss: 3.1083229924476 on iter: 63700/250000
training loss: 3.1784012872175 on iter: 63800/250000
training loss: 3.533237610381 on iter: 63900/250000
training loss: 2.7514176690915 on iter: 64000/250000
training loss: 3.0975480749125 on iter: 64100/250000
training loss: 2.7239186309335 on iter: 64200/250000
training loss: 2.2746646783646 on iter: 64300/250000
training loss: 2.9048258886354 on iter: 64400/250000
training loss: 3.0920490435751 on iter: 64500/250000
training loss: 3.878257223996 on iter: 64600/250000
training loss: 3.2526426176853 on iter: 64700/250000
training loss: 3.1041054207025 on iter: 64800/250000
training loss: 3.5939834330549 on iter: 64900/250000
training loss: 3.4393936402193 on iter: 65000/250000
training loss: 2.6217946175344 on iter: 65100/250000
training loss: 2.9861897746874 on iter: 65200/250000
training loss: 3.0961898524645 on iter: 65300/250000
training loss: 3.1137210161858 on iter: 65400/250000
training loss: 3.5197989557816 on iter: 65500/250000
training loss: 3.4491556138872 on iter: 65600/250000
training loss: 2.656666503578 on iter: 65700/250000
training loss: 2.9061804827635 on iter: 65800/250000
training loss: 3.2065071756976 on iter: 65900/250000
training loss: 3.2079962530131 on iter: 66000/250000
training loss: 3.6546202090676 on iter: 66100/250000
training loss: 2.8584147946144 on iter: 66200/250000
training loss: 3.3446772832008 on iter: 66300/250000
training loss: 2.7453075451002 on iter: 66400/250000
training loss: 2.2535545342383 on iter: 66500/250000
training loss: 2.5837350295701 on iter: 66600/250000
training loss: 2.7692851896589 on iter: 66700/250000
training loss: 3.5189569248823 on iter: 66800/250000
training loss: 3.2954797731508 on iter: 66900/250000
training loss: 2.6930909740962 on iter: 67000/250000
training loss: 3.0156785090048 on iter: 67100/250000
training loss: 2.5555769150017 on iter: 67200/250000
training loss: 2.7921351404762 on iter: 67300/250000
training loss: 2.9198624847713 on iter: 67400/250000
training loss: 2.8635953785652 on iter: 67500/250000
training loss: 2.5908949100521 on iter: 67600/250000
training loss: 2.6632553410762 on iter: 67700/250000
training loss: 3.2292279717964 on iter: 67800/250000
training loss: 3.4049451697408 on iter: 67900/250000
training loss: 3.4313322335933 on iter: 68000/250000
training loss: 2.837024118665 on iter: 68100/250000
training loss: 3.7276361556984 on iter: 68200/250000
training loss: 3.1352561948687 on iter: 68300/250000
training loss: 3.4846995113593 on iter: 68400/250000
training loss: 3.1400407939217 on iter: 68500/250000
training loss: 2.814141848301 on iter: 68600/250000
training loss: 3.3923237151852 on iter: 68700/250000
training loss: 2.5814441601296 on iter: 68800/250000
training loss: 2.6218387099068 on iter: 68900/250000
training loss: 2.4842013652746 on iter: 69000/250000
training loss: 3.0806022973119 on iter: 69100/250000
training loss: 3.0246612504796 on iter: 69200/250000
training loss: 3.2521774912744 on iter: 69300/250000
training loss: 3.0504192904689 on iter: 69400/250000
training loss: 3.1450432059011 on iter: 69500/250000
training loss: 3.4340892953938 on iter: 69600/250000
training loss: 3.1643869403745 on iter: 69700/250000
training loss: 2.9443093896721 on iter: 69800/250000
training loss: 3.1639822257506 on iter: 69900/250000
training loss: 3.5239903307213 on iter: 70000/250000
training loss: 3.1683805187573 on iter: 70100/250000
training loss: 3.2738489613018 on iter: 70200/250000
training loss: 3.3061647799341 on iter: 70300/250000
training loss: 2.3301140115698 on iter: 70400/250000
training loss: 2.7353810700005 on iter: 70500/250000
training loss: 2.9502778672276 on iter: 70600/250000
training loss: 2.9377032426871 on iter: 70700/250000
training loss: 2.9268029428606 on iter: 70800/250000
training loss: 3.4432488077333 on iter: 70900/250000
training loss: 2.9698552179863 on iter: 71000/250000
training loss: 2.8019616871898 on iter: 71100/250000
training loss: 3.0236223960961 on iter: 71200/250000
training loss: 2.7803275072981 on iter: 71300/250000
training loss: 3.148320343629 on iter: 71400/250000
training loss: 3.1084766539006 on iter: 71500/250000
training loss: 3.15534455513 on iter: 71600/250000
training loss: 2.5522471391442 on iter: 71700/250000
training loss: 2.8775859991267 on iter: 71800/250000
training loss: 2.8590626964966 on iter: 71900/250000
training loss: 2.3074517110586 on iter: 72000/250000
training loss: 3.0126312298921 on iter: 72100/250000
training loss: 2.7601602534217 on iter: 72200/250000
training loss: 2.7035118035199 on iter: 72300/250000
training loss: 2.7017113795799 on iter: 72400/250000
training loss: 3.1857272839073 on iter: 72500/250000
training loss: 3.8198328028016 on iter: 72600/250000
training loss: 2.6088185608655 on iter: 72700/250000
training loss: 3.3609472960673 on iter: 72800/250000
training loss: 2.9430997383024 on iter: 72900/250000
training loss: 3.5883307679762 on iter: 73000/250000
training loss: 2.5461844835487 on iter: 73100/250000
training loss: 2.8102957137863 on iter: 73200/250000
training loss: 3.151489498988 on iter: 73300/250000
training loss: 2.9181542741557 on iter: 73400/250000
training loss: 2.7003927641265 on iter: 73500/250000
training loss: 3.0682985007124 on iter: 73600/250000
training loss: 3.8631146414241 on iter: 73700/250000
training loss: 3.0694221511307 on iter: 73800/250000
training loss: 3.7196090830349 on iter: 73900/250000
training loss: 3.5014825697785 on iter: 74000/250000
training loss: 3.586143173477 on iter: 74100/250000
training loss: 3.3340739721714 on iter: 74200/250000
training loss: 2.4551394150771 on iter: 74300/250000
training loss: 3.3608672482843 on iter: 74400/250000
training loss: 3.0450554453591 on iter: 74500/250000
training loss: 2.8018784310776 on iter: 74600/250000
training loss: 2.6398486372899 on iter: 74700/250000
training loss: 3.1180912152079 on iter: 74800/250000
training loss: 2.7976385471867 on iter: 74900/250000
training loss: 2.7986637907159 on iter: 75000/250000
training loss: 3.0670404750322 on iter: 75100/250000
training loss: 2.5600503187674 on iter: 75200/250000
training loss: 2.7130951573335 on iter: 75300/250000
training loss: 2.6623422271355 on iter: 75400/250000
training loss: 2.8298775052059 on iter: 75500/250000
training loss: 2.3036252644213 on iter: 75600/250000
training loss: 2.9201704123843 on iter: 75700/250000
training loss: 3.2203663624175 on iter: 75800/250000
training loss: 3.7863838160382 on iter: 75900/250000
training loss: 2.9113050473782 on iter: 76000/250000
training loss: 2.9761322221183 on iter: 76100/250000
training loss: 3.8269345676029 on iter: 76200/250000
training loss: 3.4922256275946 on iter: 76300/250000
training loss: 2.7214598481621 on iter: 76400/250000
training loss: 3.4959436263141 on iter: 76500/250000
training loss: 2.8614734132119 on iter: 76600/250000
training loss: 2.9568758802305 on iter: 76700/250000
training loss: 3.105182718802 on iter: 76800/250000
training loss: 2.7582200695045 on iter: 76900/250000
training loss: 2.9178767757736 on iter: 77000/250000
training loss: 3.1624818800075 on iter: 77100/250000
training loss: 3.3224185661786 on iter: 77200/250000
training loss: 3.0615292418194 on iter: 77300/250000
training loss: 2.9605002521179 on iter: 77400/250000
training loss: 3.300348118564 on iter: 77500/250000
training loss: 3.4672443304345 on iter: 77600/250000
training loss: 3.1136209081618 on iter: 77700/250000
training loss: 3.4035072276341 on iter: 77800/250000
training loss: 2.4240017831044 on iter: 77900/250000
training loss: 2.4582549347919 on iter: 78000/250000
training loss: 3.4908742560878 on iter: 78100/250000
training loss: 2.683143147409 on iter: 78200/250000
training loss: 2.6585905098064 on iter: 78300/250000
training loss: 2.6069290228319 on iter: 78400/250000
training loss: 4.0959419035624 on iter: 78500/250000
training loss: 3.5338250957031 on iter: 78600/250000
training loss: 3.3091526291158 on iter: 78700/250000
training loss: 2.9204612692259 on iter: 78800/250000
training loss: 3.1874143213861 on iter: 78900/250000
training loss: 3.2462258256699 on iter: 79000/250000
training loss: 3.3591181843077 on iter: 79100/250000
training loss: 2.6262738998995 on iter: 79200/250000
training loss: 2.2606989460131 on iter: 79300/250000
training loss: 3.2887546648892 on iter: 79400/250000
training loss: 3.1768169501087 on iter: 79500/250000
training loss: 3.198205647506 on iter: 79600/250000
training loss: 2.2238094488775 on iter: 79700/250000
training loss: 3.2334920867553 on iter: 79800/250000
training loss: 3.2942414178256 on iter: 79900/250000
training loss: 3.6058109288632 on iter: 80000/250000
training loss: 3.2230784054313 on iter: 80100/250000
training loss: 3.2027173630003 on iter: 80200/250000
training loss: 3.2214875897907 on iter: 80300/250000
training loss: 4.2101090875936 on iter: 80400/250000
training loss: 3.3548407887723 on iter: 80500/250000
training loss: 2.4418385572029 on iter: 80600/250000
training loss: 2.8578159210275 on iter: 80700/250000
training loss: 2.7935679163327 on iter: 80800/250000
training loss: 2.8220729725677 on iter: 80900/250000
training loss: 3.1391517692944 on iter: 81000/250000
training loss: 1.9498078732668 on iter: 81100/250000
training loss: 3.654250005333 on iter: 81200/250000
training loss: 2.9501578020433 on iter: 81300/250000
training loss: 2.7929608677126 on iter: 81400/250000
training loss: 3.2939290028892 on iter: 81500/250000
training loss: 3.4680070501235 on iter: 81600/250000
training loss: 3.3280540667712 on iter: 81700/250000
training loss: 3.0938602474298 on iter: 81800/250000
training loss: 3.5073672352356 on iter: 81900/250000
training loss: 2.7900153641872 on iter: 82000/250000
training loss: 3.4405861323533 on iter: 82100/250000
training loss: 3.1700688951535 on iter: 82200/250000
training loss: 3.4075003273354 on iter: 82300/250000
training loss: 2.5209620019539 on iter: 82400/250000
training loss: 3.4567301732889 on iter: 82500/250000
training loss: 2.4024020675504 on iter: 82600/250000
training loss: 3.3135311361647 on iter: 82700/250000
training loss: 2.5500747732108 on iter: 82800/250000
training loss: 2.6894272315724 on iter: 82900/250000
training loss: 3.3764500482183 on iter: 83000/250000
training loss: 2.8372154205917 on iter: 83100/250000
training loss: 2.3417127032018 on iter: 83200/250000
training loss: 2.8799256959584 on iter: 83300/250000
training loss: 2.9360904677274 on iter: 83400/250000
training loss: 3.4810211370366 on iter: 83500/250000
training loss: 2.8113289115831 on iter: 83600/250000
training loss: 3.4172887636923 on iter: 83700/250000
training loss: 3.2389820104909 on iter: 83800/250000
training loss: 3.1292901858822 on iter: 83900/250000
training loss: 3.9974648098504 on iter: 84000/250000
training loss: 2.6772020756195 on iter: 84100/250000
training loss: 3.129729830858 on iter: 84200/250000
training loss: 2.9587852440105 on iter: 84300/250000
training loss: 3.4128124443371 on iter: 84400/250000
training loss: 2.395707618862 on iter: 84500/250000
training loss: 3.1472676669072 on iter: 84600/250000
training loss: 3.1195219232048 on iter: 84700/250000
training loss: 2.511016136624 on iter: 84800/250000
training loss: 3.3961748996283 on iter: 84900/250000
training loss: 3.1927222401296 on iter: 85000/250000
training loss: 3.4691962321855 on iter: 85100/250000
training loss: 4.3291287482367 on iter: 85200/250000
training loss: 3.0518685633107 on iter: 85300/250000
training loss: 3.1125989369108 on iter: 85400/250000
training loss: 2.7204273772526 on iter: 85500/250000
training loss: 4.1785111407049 on iter: 85600/250000
training loss: 3.0994636893797 on iter: 85700/250000
training loss: 3.0482969639376 on iter: 85800/250000
training loss: 3.1148111293486 on iter: 85900/250000
training loss: 3.0078082013296 on iter: 86000/250000
training loss: 3.3719060015752 on iter: 86100/250000
training loss: 3.2692618160372 on iter: 86200/250000
training loss: 2.2311862267969 on iter: 86300/250000
training loss: 3.1589019288189 on iter: 86400/250000
training loss: 3.4220873491725 on iter: 86500/250000
training loss: 2.4769188498658 on iter: 86600/250000
training loss: 3.251647972133 on iter: 86700/250000
training loss: 2.8501730099248 on iter: 86800/250000
training loss: 2.9203825061127 on iter: 86900/250000
training loss: 2.7600926773579 on iter: 87000/250000
training loss: 3.0718034821993 on iter: 87100/250000
training loss: 3.7511493316866 on iter: 87200/250000
training loss: 2.9744798411961 on iter: 87300/250000
training loss: 3.7476226584925 on iter: 87400/250000
training loss: 2.425198630754 on iter: 87500/250000
training loss: 2.9974591421726 on iter: 87600/250000
training loss: 2.8944863264225 on iter: 87700/250000
training loss: 2.2629894952281 on iter: 87800/250000
training loss: 3.6786404998716 on iter: 87900/250000
training loss: 3.5475862504115 on iter: 88000/250000
training loss: 3.4926289871406 on iter: 88100/250000
training loss: 2.7232116427345 on iter: 88200/250000
training loss: 3.8909667089994 on iter: 88300/250000
training loss: 2.30766226088 on iter: 88400/250000
training loss: 3.0133614748243 on iter: 88500/250000
training loss: 3.2156360421549 on iter: 88600/250000
training loss: 3.5697917927919 on iter: 88700/250000
training loss: 2.3174387334497 on iter: 88800/250000
training loss: 3.2643335358689 on iter: 88900/250000
training loss: 2.9133758762182 on iter: 89000/250000
training loss: 2.8184668001381 on iter: 89100/250000
training loss: 3.1268242135431 on iter: 89200/250000
training loss: 3.5708802871002 on iter: 89300/250000
training loss: 3.5051706604978 on iter: 89400/250000
training loss: 3.2994268299171 on iter: 89500/250000
training loss: 3.1269773547202 on iter: 89600/250000
training loss: 3.2506228038989 on iter: 89700/250000
training loss: 3.8122969970805 on iter: 89800/250000
training loss: 2.9197335341476 on iter: 89900/250000
training loss: 3.5898873465625 on iter: 90000/250000
training loss: 2.9288423034565 on iter: 90100/250000
training loss: 3.1318015021499 on iter: 90200/250000
training loss: 3.3519926644496 on iter: 90300/250000
training loss: 3.0221232849678 on iter: 90400/250000
training loss: 2.9306712604931 on iter: 90500/250000
training loss: 3.5486007142779 on iter: 90600/250000
training loss: 3.2471867135975 on iter: 90700/250000
training loss: 3.1211265111915 on iter: 90800/250000
training loss: 3.0473616105829 on iter: 90900/250000
training loss: 2.5899780232682 on iter: 91000/250000
training loss: 3.1634816056443 on iter: 91100/250000
training loss: 3.5856914015124 on iter: 91200/250000
training loss: 3.2563798281326 on iter: 91300/250000
training loss: 2.7048318722128 on iter: 91400/250000
training loss: 3.1053316427279 on iter: 91500/250000
training loss: 3.0435794736599 on iter: 91600/250000
training loss: 3.1275282116733 on iter: 91700/250000
training loss: 3.1013320098796 on iter: 91800/250000
training loss: 3.650391512821 on iter: 91900/250000
training loss: 3.063035893383 on iter: 92000/250000
training loss: 2.3581747701905 on iter: 92100/250000
training loss: 3.7426702592304 on iter: 92200/250000
training loss: 3.6779278999341 on iter: 92300/250000
training loss: 3.4739542149877 on iter: 92400/250000
training loss: 3.0219578748349 on iter: 92500/250000
training loss: 4.0953094589712 on iter: 92600/250000
training loss: 3.6325675677517 on iter: 92700/250000
training loss: 2.9792486169724 on iter: 92800/250000
training loss: 3.4663189364315 on iter: 92900/250000
training loss: 3.1925643293477 on iter: 93000/250000
training loss: 2.8440017367545 on iter: 93100/250000
training loss: 3.5428949394661 on iter: 93200/250000
training loss: 2.2840789227632 on iter: 93300/250000
training loss: 3.2028367629199 on iter: 93400/250000
training loss: 2.8658498947016 on iter: 93500/250000
training loss: 3.2839366072242 on iter: 93600/250000
training loss: 3.2930808529797 on iter: 93700/250000
training loss: 2.9412860908112 on iter: 93800/250000
training loss: 3.7553009812439 on iter: 93900/250000
training loss: 3.4040483160218 on iter: 94000/250000
training loss: 3.2638199318515 on iter: 94100/250000
training loss: 2.9790121323035 on iter: 94200/250000
training loss: 3.2701793708365 on iter: 94300/250000
training loss: 3.1168219086549 on iter: 94400/250000
training loss: 3.1477984545009 on iter: 94500/250000
training loss: 2.6505683483284 on iter: 94600/250000
training loss: 2.8730924269438 on iter: 94700/250000
training loss: 3.7728040242791 on iter: 94800/250000
training loss: 3.4050009655241 on iter: 94900/250000
training loss: 3.3073012549963 on iter: 95000/250000
training loss: 2.8045395519037 on iter: 95100/250000
training loss: 3.5551399201337 on iter: 95200/250000
training loss: 3.356406596234 on iter: 95300/250000
training loss: 3.3134647733408 on iter: 95400/250000
training loss: 2.8284596124776 on iter: 95500/250000
training loss: 2.0983338922516 on iter: 95600/250000
training loss: 3.7266426529208 on iter: 95700/250000
training loss: 3.0584853765345 on iter: 95800/250000
training loss: 2.6709728068981 on iter: 95900/250000
training loss: 2.7576421956019 on iter: 96000/250000
training loss: 2.6872263011959 on iter: 96100/250000
training loss: 2.1441335647624 on iter: 96200/250000
training loss: 2.9772689995358 on iter: 96300/250000
training loss: 3.1263591854684 on iter: 96400/250000
training loss: 3.2558577746374 on iter: 96500/250000
training loss: 4.2056705094636 on iter: 96600/250000
training loss: 3.6340333024921 on iter: 96700/250000
training loss: 3.1833426867569 on iter: 96800/250000
training loss: 3.6951074246976 on iter: 96900/250000
training loss: 2.6953963010955 on iter: 97000/250000
training loss: 2.7063525933895 on iter: 97100/250000
training loss: 3.388374714191 on iter: 97200/250000
training loss: 2.7806212621649 on iter: 97300/250000
training loss: 3.0068240848142 on iter: 97400/250000
training loss: 3.1704782633865 on iter: 97500/250000
training loss: 3.1095438764923 on iter: 97600/250000
training loss: 3.0327824408005 on iter: 97700/250000
training loss: 2.6600096105185 on iter: 97800/250000
training loss: 2.8305482464535 on iter: 97900/250000
training loss: 3.501474315035 on iter: 98000/250000
training loss: 2.5059130930923 on iter: 98100/250000
training loss: 3.2790858865465 on iter: 98200/250000
training loss: 2.6877396411224 on iter: 98300/250000
training loss: 3.3024526478581 on iter: 98400/250000
training loss: 2.6391447694271 on iter: 98500/250000
training loss: 3.3747765738374 on iter: 98600/250000
training loss: 3.2434537123341 on iter: 98700/250000
training loss: 3.2348953412876 on iter: 98800/250000
training loss: 3.0172559775923 on iter: 98900/250000
training loss: 3.046895016979 on iter: 99000/250000
training loss: 4.1429749341762 on iter: 99100/250000
training loss: 2.9592374395662 on iter: 99200/250000
training loss: 3.053165618422 on iter: 99300/250000
training loss: 3.1380669949742 on iter: 99400/250000
training loss: 3.8284847755198 on iter: 99500/250000
training loss: 3.256313051897 on iter: 99600/250000
training loss: 3.0802980008282 on iter: 99700/250000
training loss: 2.7020287307575 on iter: 99800/250000
training loss: 2.5346986663035 on iter: 99900/250000
training loss: 3.5166636628935 on iter: 100000/250000
learning rate: 0.00010800255934116
training loss: 3.0327017420295 on iter: 100100/250000
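(Editor's aside on the decayed rate printed just above: I can't be sure this is exactly how train.lua computes it, but the printed value is consistent with an exponential decay of roughly 0.6 per kick_interval of 50,000 iterations, i.e. 3e-4 x 0.6^2 ~ 1.08e-4 around iter 100,000. A minimal sketch of that reading; the decay factor 0.6 is an inference from the log, not taken from the repo:

-- hypothetical sketch, not the repo's train.lua
local base_lr = 3e-4           -- learning_rate from the config above
local kick_interval = 50000    -- kick_interval from the config above
local decay_per_kick = 0.6     -- assumed factor, inferred from the logged value
local iter = 100000
local lr = base_lr * math.pow(decay_per_kick, iter / kick_interval)
print(string.format('approx. learning rate at iter %d: %.14g', iter, lr))
-- prints 0.000108, matching the logged 0.00010800255934116 up to rounding

If that reading is right, the schedule itself appears to be applied as configured, so the rate at this point looks expected.)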
training loss: 3.0043601261885 on iter: 100200/250000
training loss: 2.3483058381292 on iter: 100300/250000
training loss: 3.1369173237477 on iter: 100400/250000
training loss: 3.2270886778905 on iter: 100500/250000
training loss: 2.7546400152372 on iter: 100600/250000
training loss: 3.1854861316964 on iter: 100700/250000
training loss: 2.947400158307 on iter: 100800/250000
training loss: 2.4029443024843 on iter: 100900/250000
training loss: 2.8576742350921 on iter: 101000/250000
training loss: 2.76252542774 on iter: 101100/250000
training loss: 3.3523808885525 on iter: 101200/250000
training loss: 3.067249273005 on iter: 101300/250000
training loss: 3.0163809004738 on iter: 101400/250000
training loss: 3.155239220898 on iter: 101500/250000
training loss: 2.6385179335559 on iter: 101600/250000
training loss: 3.3343382314907 on iter: 101700/250000
training loss: 3.1990294595618 on iter: 101800/250000
training loss: 2.7049420329027 on iter: 101900/250000
training loss: 3.2461680463507 on iter: 102000/250000
training loss: 2.1099336758842 on iter: 102100/250000
training loss: 3.1788909131081 on iter: 102200/250000
training loss: 2.8964299482969 on iter: 102300/250000
training loss: 3.1377651536462 on iter: 102400/250000
training loss: 2.6943582893916 on iter: 102500/250000
training loss: 3.300301879228 on iter: 102600/250000
training loss: 3.4158104287315 on iter: 102700/250000
training loss: 3.104468719876 on iter: 102800/250000
training loss: 3.1090663011995 on iter: 102900/250000
training loss: 3.0552151874723 on iter: 103000/250000
training loss: 3.1430850909705 on iter: 103100/250000
training loss: 3.107005321394 on iter: 103200/250000
training loss: 2.4408520316678 on iter: 103300/250000
training loss: 2.8610589294825 on iter: 103400/250000
training loss: 2.9809905670188 on iter: 103500/250000
training loss: 2.6914294339634 on iter: 103600/250000
training loss: 3.3739967365406 on iter: 103700/250000
training loss: 2.6266585611307 on iter: 103800/250000
training loss: 3.2831438093502 on iter: 103900/250000
training loss: 2.8985822828502 on iter: 104000/250000
training loss: 3.0407285237174 on iter: 104100/250000
training loss: 2.7260768055986 on iter: 104200/250000
training loss: 3.5789948914445 on iter: 104300/250000
training loss: 3.9153331602912 on iter: 104400/250000
training loss: 2.630021062834 on iter: 104500/250000
training loss: 3.2687744056259 on iter: 104600/250000
training loss: 3.2658529999358 on iter: 104700/250000
training loss: 3.8057415731281 on iter: 104800/250000
training loss: 3.4557269457401 on iter: 104900/250000
training loss: 3.8667059853996 on iter: 105000/250000
training loss: 3.5945842063405 on iter: 105100/250000
training loss: 3.0547672694087 on iter: 105200/250000
training loss: 3.9481517606064 on iter: 105300/250000
training loss: 2.5511658920515 on iter: 105400/250000
training loss: 2.8418605460209 on iter: 105500/250000
training loss: 2.9229798378816 on iter: 105600/250000
training loss: 2.974029656887 on iter: 105700/250000
training loss: 2.98535423367 on iter: 105800/250000
training loss: 2.479477220839 on iter: 105900/250000
training loss: 3.4901579850471 on iter: 106000/250000
training loss: 2.579121955898 on iter: 106100/250000
training loss: 3.3435759833006 on iter: 106200/250000
training loss: 3.5600289379082 on iter: 106300/250000
training loss: 3.8907775081097 on iter: 106400/250000
training loss: 3.6427952302829 on iter: 106500/250000
training loss: 2.5540324270303 on iter: 106600/250000
training loss: 3.2173950364349 on iter: 106700/250000
training loss: 2.9645441634375 on iter: 106800/250000
training loss: 2.8741509793922 on iter: 106900/250000
training loss: 2.968563292478 on iter: 107000/250000
training loss: 3.5502837444237 on iter: 107100/250000
training loss: 3.6995200669572 on iter: 107200/250000
training loss: 2.6925393016283 on iter: 107300/250000
training loss: 2.8479379785926 on iter: 107400/250000
training loss: 3.9507601994338 on iter: 107500/250000
training loss: 2.8907833741665 on iter: 107600/250000
training loss: 3.1137478169957 on iter: 107700/250000
training loss: 3.3648006089113 on iter: 107800/250000
training loss: 3.3243716764326 on iter: 107900/250000
training loss: 3.3900200463524 on iter: 108000/250000
training loss: 2.6656432812315 on iter: 108100/250000
training loss: 2.8468445800312 on iter: 108200/250000
training loss: 2.234889786045 on iter: 108300/250000
training loss: 3.5025901886395 on iter: 108400/250000
training loss: 2.9095283700532 on iter: 108500/250000
training loss: 3.3467881966215 on iter: 108600/250000
training loss: 3.7923696172378 on iter: 108700/250000
training loss: 2.9379007433452 on iter: 108800/250000
training loss: 2.6508456317231 on iter: 108900/250000
training loss: 3.3571601232453 on iter: 109000/250000
training loss: 3.1177949598552 on iter: 109100/250000
training loss: 2.9646612498049 on iter: 109200/250000
training loss: 3.1464471644156 on iter: 109300/250000
training loss: 3.727860828138 on iter: 109400/250000
training loss: 2.7205805393364 on iter: 109500/250000
training loss: 4.0124013875446 on iter: 109600/250000
training loss: 2.7342922441869 on iter: 109700/250000
training loss: 2.9271065676206 on iter: 109800/250000
training loss: 3.3331696738018 on iter: 109900/250000
training loss: 3.2603957803673 on iter: 110000/250000
training loss: 3.2160846988682 on iter: 110100/250000
training loss: 3.9017083877573 on iter: 110200/250000
training loss: 3.0941670413231 on iter: 110300/250000
training loss: 3.6408876193108 on iter: 110400/250000
training loss: 2.6621053790959 on iter: 110500/250000
training loss: 3.7872072734986 on iter: 110600/250000
training loss: 2.5068279633685 on iter: 110700/250000
training loss: 3.4644566300707 on iter: 110800/250000
training loss: 2.8488684658305 on iter: 110900/250000
training loss: 3.0461562885322 on iter: 111000/250000
training loss: 3.0728875486956 on iter: 111100/250000
training loss: 3.3074915656514 on iter: 111200/250000
training loss: 3.3149377763177 on iter: 111300/250000
training loss: 2.3290811353618 on iter: 111400/250000
training loss: 2.6550002238273 on iter: 111500/250000
training loss: 2.5894566624914 on iter: 111600/250000
training loss: 2.7848947419213 on iter: 111700/250000
training loss: 2.710791002836 on iter: 111800/250000
training loss: 4.3098872676428 on iter: 111900/250000
training loss: 2.539215419408 on iter: 112000/250000
training loss: 3.3759674520408 on iter: 112100/250000
training loss: 3.552321277245 on iter: 112200/250000
training loss: 3.9893470558421 on iter: 112300/250000
training loss: 3.725465338495 on iter: 112400/250000
training loss: 2.5017534661636 on iter: 112500/250000
training loss: 2.831314149377 on iter: 112600/250000
training loss: 3.4464632786046 on iter: 112700/250000
training loss: 3.8647629961771 on iter: 112800/250000
training loss: 3.396715273239 on iter: 112900/250000
training loss: 2.7266483627859 on iter: 113000/250000
training loss: 3.7101811117513 on iter: 113100/250000
training loss: 2.7636866595254 on iter: 113200/250000
training loss: 2.7900540937878 on iter: 113300/250000
training loss: 3.4053493105334 on iter: 113400/250000
training loss: 2.8959985977739 on iter: 113500/250000
training loss: 3.7400952471001 on iter: 113600/250000
training loss: 3.1881377281127 on iter: 113700/250000
training loss: 2.6330891487682 on iter: 113800/250000
training loss: 3.3171659779071 on iter: 113900/250000
training loss: 2.7993992687263 on iter: 114000/250000
training loss: 3.5661972859175 on iter: 114100/250000
training loss: 2.9114864932412 on iter: 114200/250000
training loss: 3.2800801564738 on iter: 114300/250000
training loss: 3.5601675371408 on iter: 114400/250000
training loss: 4.2804892521342 on iter: 114500/250000
training loss: 3.2977155047946 on iter: 114600/250000
training loss: 3.4957948516788 on iter: 114700/250000
training loss: 2.5563933136423 on iter: 114800/250000
training loss: 2.9728177094041 on iter: 114900/250000
training loss: 3.1738382023377 on iter: 115000/250000
training loss: 3.175994371068 on iter: 115100/250000
training loss: 2.8979501048974 on iter: 115200/250000
training loss: 3.1710610605401 on iter: 115300/250000
training loss: 3.3544019353752 on iter: 115400/250000
training loss: 2.7309667200307 on iter: 115500/250000
training loss: 3.6437426537497 on iter: 115600/250000
training loss: 2.671868165741 on iter: 115700/250000
training loss: 3.6805832552689 on iter: 115800/250000
training loss: 3.0334910349853 on iter: 115900/250000
training loss: 2.9602760197548 on iter: 116000/250000
training loss: 2.8171066877077 on iter: 116100/250000
training loss: 3.4313771606891 on iter: 116200/250000
training loss: 3.0808029882413 on iter: 116300/250000
training loss: 4.1728280545765 on iter: 116400/250000
training loss: 2.7881171982372 on iter: 116500/250000
training loss: 4.3308932273722 on iter: 116600/250000
training loss: 3.1602761559935 on iter: 116700/250000
training loss: 2.7502681287373 on iter: 116800/250000
training loss: 3.4654715435911 on iter: 116900/250000
training loss: 3.209088821843 on iter: 117000/250000
training loss: 3.4075702876735 on iter: 117100/250000
training loss: 3.1278112512196 on iter: 117200/250000
training loss: 3.1547824844886 on iter: 117300/250000
training loss: 3.139485246576 on iter: 117400/250000
training loss: 2.6224858415888 on iter: 117500/250000
training loss: 2.2517658332248 on iter: 117600/250000
training loss: 3.2536973531695 on iter: 117700/250000
training loss: 3.2436183920314 on iter: 117800/250000
training loss: 3.471084366721 on iter: 117900/250000
training loss: 2.8443562802975 on iter: 118000/250000
training loss: 3.0285704528599 on iter: 118100/250000
training loss: 2.974299099301 on iter: 118200/250000
training loss: 3.4206898539348 on iter: 118300/250000
training loss: 3.4367176987295 on iter: 118400/250000
training loss: 2.4287144435125 on iter: 118500/250000
training loss: 3.5051093348717 on iter: 118600/250000
training loss: 3.0936920240242 on iter: 118700/250000
training loss: 3.6334337856062 on iter: 118800/250000
training loss: 3.0211692506861 on iter: 118900/250000
training loss: 3.7424445906015 on iter: 119000/250000
training loss: 2.7867023310022 on iter: 119100/250000
training loss: 3.5849450493849 on iter: 119200/250000
training loss: 3.5192056421308 on iter: 119300/250000
training loss: 3.6312604778398 on iter: 119400/250000
training loss: 3.5396104616116 on iter: 119500/250000
training loss: 3.3336648110385 on iter: 119600/250000
training loss: 3.8616491588769 on iter: 119700/250000
training loss: 3.0951531523498 on iter: 119800/250000
training loss: 3.3768835142452 on iter: 119900/250000
training loss: 3.3174370313643 on iter: 120000/250000
training loss: 3.2293691798106 on iter: 120100/250000
training loss: 3.0411614023024 on iter: 120200/250000
training loss: 3.189015453773 on iter: 120300/250000
training loss: 2.5612232206446 on iter: 120400/250000
training loss: 2.6045983757342 on iter: 120500/250000
training loss: 3.8470014734743 on iter: 120600/250000
training loss: 3.2870030646995 on iter: 120700/250000
training loss: 3.8434835293914 on iter: 120800/250000
training loss: 3.1579680845632 on iter: 120900/250000
training loss: 3.1842635354748 on iter: 121000/250000
training loss: 3.5211043013247 on iter: 121100/250000
training loss: 3.7980213227086 on iter: 121200/250000
training loss: 3.3544774642649 on iter: 121300/250000
training loss: 2.3993067332773 on iter: 121400/250000
training loss: 3.0105034575557 on iter: 121500/250000
training loss: 2.412372100314 on iter: 121600/250000
training loss: 3.245057538935 on iter: 121700/250000
training loss: 3.071393729731 on iter: 121800/250000
training loss: 3.1755162557313 on iter: 121900/250000
training loss: 3.0874250719724 on iter: 122000/250000
training loss: 3.0643962406869 on iter: 122100/250000
training loss: 3.7085187792358 on iter: 122200/250000
training loss: 3.6646783257726 on iter: 122300/250000
training loss: 3.4542047718727 on iter: 122400/250000
training loss: 3.2804198223328 on iter: 122500/250000
training loss: 3.8656796903498 on iter: 122600/250000
training loss: 2.7001151515891 on iter: 122700/250000
training loss: 2.8394461835667 on iter: 122800/250000
training loss: 2.0427604644739 on iter: 122900/250000
training loss: 3.3410999471955 on iter: 123000/250000
training loss: 3.7929102307236 on iter: 123100/250000
training loss: 4.3448799497018 on iter: 123200/250000
training loss: 3.8625554735273 on iter: 123300/250000
training loss: 2.9390969452974 on iter: 123400/250000
training loss: 3.0657122509523 on iter: 123500/250000
training loss: 3.0676532421766 on iter: 123600/250000
training loss: 3.3349648658899 on iter: 123700/250000
training loss: 3.2312185321344 on iter: 123800/250000
training loss: 3.4640070599703 on iter: 123900/250000
training loss: 2.8967203470916 on iter: 124000/250000
training loss: 2.8835648978897 on iter: 124100/250000
training loss: 4.0374745514961 on iter: 124200/250000
training loss: 3.0657412780713 on iter: 124300/250000
training loss: 3.3077707283838 on iter: 124400/250000
training loss: 2.9806886392577 on iter: 124500/250000
training loss: 3.5263765798188 on iter: 124600/250000
training loss: 2.854285103727 on iter: 124700/250000
training loss: 3.3138572839175 on iter: 124800/250000
training loss: 2.8026803573502 on iter: 124900/250000
training loss: 2.6206723035756 on iter: 125000/250000
training loss: 3.2435108520089 on iter: 125100/250000
training loss: 3.1093118093974 on iter: 125200/250000
training loss: 2.9005788224828 on iter: 125300/250000
training loss: 3.656490588891 on iter: 125400/250000
training loss: 3.7016895276885 on iter: 125500/250000
training loss: 3.9865265586336 on iter: 125600/250000
training loss: 3.2651060585369 on iter: 125700/250000
training loss: 3.0308002324835 on iter: 125800/250000
training loss: 3.5403102052032 on iter: 125900/250000
training loss: 2.7821402850176 on iter: 126000/250000
training loss: 3.4626473096686 on iter: 126100/250000
training loss: 4.0625707351992 on iter: 126200/250000
training loss: 3.7021121813473 on iter: 126300/250000
training loss: 3.5333637125547 on iter: 126400/250000
training loss: 3.7265484082772 on iter: 126500/250000
training loss: 3.0229574140024 on iter: 126600/250000
training loss: 3.0197777952738 on iter: 126700/250000
training loss: 3.1367627609358 on iter: 126800/250000
training loss: 3.0096065505111 on iter: 126900/250000
training loss: 3.3737551371075 on iter: 127000/250000
training loss: 2.9019613417664 on iter: 127100/250000
training loss: 3.6494029929292 on iter: 127200/250000
training loss: 2.7846891664753 on iter: 127300/250000
training loss: 2.7026986416239 on iter: 127400/250000
training loss: 2.2921263862411 on iter: 127500/250000
training loss: 4.5822965019221 on iter: 127600/250000
training loss: 3.3914127375068 on iter: 127700/250000
training loss: 4.3274187931269 on iter: 127800/250000
training loss: 2.7970406287871 on iter: 127900/250000
training loss: 3.3927563432748 on iter: 128000/250000
training loss: 2.9279718493193 on iter: 128100/250000
training loss: 2.8524965411727 on iter: 128200/250000
training loss: 2.9311737382056 on iter: 128300/250000
training loss: 2.7557931046529 on iter: 128400/250000
training loss: 2.6373191133639 on iter: 128500/250000
training loss: 3.7355548711346 on iter: 128600/250000
training loss: 3.1868085651649 on iter: 128700/250000
training loss: 3.7110740265008 on iter: 128800/250000
training loss: 3.0261776113164 on iter: 128900/250000
training loss: 3.0346132039891 on iter: 129000/250000
training loss: 3.7432630073981 on iter: 129100/250000
training loss: 3.5060368308344 on iter: 129200/250000
training loss: 3.2192135990116 on iter: 129300/250000
training loss: 3.7075491474819 on iter: 129400/250000
training loss: 3.572578373616 on iter: 129500/250000
training loss: 2.5093222556805 on iter: 129600/250000
training loss: 3.6901205130724 on iter: 129700/250000
training loss: 3.4463394520436 on iter: 129800/250000
training loss: 3.3619943797611 on iter: 129900/250000
training loss: 3.251638675758 on iter: 130000/250000
training loss: 3.3855235267567 on iter: 130100/250000
training loss: 3.1340644647847 on iter: 130200/250000
training loss: 2.984781031019 on iter: 130300/250000
training loss: 3.1363475816719 on iter: 130400/250000
training loss: 3.2370922427589 on iter: 130500/250000
training loss: 3.0089355262775 on iter: 130600/250000
training loss: 2.76332219534 on iter: 130700/250000
training loss: 3.5962168305545 on iter: 130800/250000
training loss: 3.1655568831983 on iter: 130900/250000
training loss: 3.2595483582315 on iter: 131000/250000
training loss: 3.0061282343243 on iter: 131100/250000
training loss: 3.4982263922065 on iter: 131200/250000
training loss: 2.938896876162 on iter: 131300/250000
training loss: 2.7406563713496 on iter: 131400/250000
training loss: 3.0690331538141 on iter: 131500/250000
training loss: 2.9989883301511 on iter: 131600/250000
training loss: 3.5227650981593 on iter: 131700/250000
training loss: 3.3092560558557 on iter: 131800/250000
training loss: 2.7812300760032 on iter: 131900/250000
training loss: 3.382802555913 on iter: 132000/250000
training loss: 2.4863394125521 on iter: 132100/250000
training loss: 3.3825332669598 on iter: 132200/250000
training loss: 3.3384780585014 on iter: 132300/250000
training loss: 3.225951805511 on iter: 132400/250000
training loss: 3.8456689669259 on iter: 132500/250000
training loss: 2.7006159980568 on iter: 132600/250000
training loss: 2.9030357203861 on iter: 132700/250000
training loss: 2.8348710446756 on iter: 132800/250000
training loss: 3.7420715627668 on iter: 132900/250000
training loss: 2.6818271641324 on iter: 133000/250000
training loss: 2.7335176612945 on iter: 133100/250000
training loss: 3.4760274661864 on iter: 133200/250000
training loss: 4.297030414639 on iter: 133300/250000
training loss: 3.9040222039156 on iter: 133400/250000
training loss: 3.457167754976 on iter: 133500/250000
training loss: 2.273808055569 on iter: 133600/250000
training loss: 2.929923878328 on iter: 133700/250000
training loss: 3.2843511994527 on iter: 133800/250000
training loss: 3.0787945221585 on iter: 133900/250000
training loss: 2.3951106293523 on iter: 134000/250000
training loss: 3.1332972037319 on iter: 134100/250000
training loss: 3.7740280135339 on iter: 134200/250000
training loss: 2.8767530583582 on iter: 134300/250000
training loss: 2.8986970117176 on iter: 134400/250000
training loss: 3.7603853900843 on iter: 134500/250000
training loss: 3.0503293855577 on iter: 134600/250000
training loss: 4.0747587621205 on iter: 134700/250000
training loss: 2.1033811302324 on iter: 134800/250000
training loss: 3.9670318102385 on iter: 134900/250000
training loss: 2.898377254116 on iter: 135000/250000
training loss: 4.0834643885448 on iter: 135100/250000
training loss: 2.6499546521104 on iter: 135200/250000
training loss: 3.4699485890192 on iter: 135300/250000
training loss: 3.7685536807408 on iter: 135400/250000
training loss: 3.7817317951462 on iter: 135500/250000
training loss: 2.7216370499902 on iter: 135600/250000
training loss: 3.7877884003728 on iter: 135700/250000
training loss: 2.644875574328 on iter: 135800/250000
training loss: 2.9473181592156 on iter: 135900/250000
training loss: 3.1952640872438 on iter: 136000/250000
training loss: 3.0441817140406 on iter: 136100/250000
training loss: 2.8990257963323 on iter: 136200/250000
training loss: 3.2789071562923 on iter: 136300/250000
training loss: 3.2170621701291 on iter: 136400/250000
training loss: 3.2093931097987 on iter: 136500/250000
training loss: 3.4642936145097 on iter: 136600/250000
training loss: 3.8289550616868 on iter: 136700/250000
training loss: 3.1675846216891 on iter: 136800/250000
training loss: 2.6621239981858 on iter: 136900/250000
training loss: 2.9253457830618 on iter: 137000/250000
training loss: 2.7749435582343 on iter: 137100/250000
training loss: 3.1133955179094 on iter: 137200/250000
training loss: 2.701843110242 on iter: 137300/250000
training loss: 3.18229135304 on iter: 137400/250000
training loss: 3.192168657988 on iter: 137500/250000
training loss: 3.4169345237924 on iter: 137600/250000
training loss: 3.1308363997876 on iter: 137700/250000
training loss: 4.3349108975562 on iter: 137800/250000
training loss: 3.2250164964621 on iter: 137900/250000
training loss: 3.0405955423015 on iter: 138000/250000
training loss: 3.4677698572577 on iter: 138100/250000
training loss: 4.0276753233242 on iter: 138200/250000
training loss: 2.8954248101193 on iter: 138300/250000
training loss: 3.2068763445184 on iter: 138400/250000
training loss: 2.6180740622832 on iter: 138500/250000
training loss: 4.0280012324957 on iter: 138600/250000
training loss: 3.2864066604615 on iter: 138700/250000
training loss: 2.7350475138255 on iter: 138800/250000
training loss: 3.7413219220919 on iter: 138900/250000
training loss: 3.3954306605652 on iter: 139000/250000
training loss: 4.3871186006096 on iter: 139100/250000
training loss: 2.8035299312335 on iter: 139200/250000
training loss: 3.4227835012821 on iter: 139300/250000
training loss: 3.4392435118843 on iter: 139400/250000
training loss: 3.1000948792405 on iter: 139500/250000
training loss: 2.4153497145153 on iter: 139600/250000
training loss: 3.30224922308 on iter: 139700/250000
training loss: 3.6278577711431 on iter: 139800/250000
training loss: 2.8267309169494 on iter: 139900/250000
training loss: 3.4863776650989 on iter: 140000/250000
training loss: 3.38037487229 on iter: 140100/250000
training loss: 3.2481930674116 on iter: 140200/250000
training loss: 3.2655751046564 on iter: 140300/250000
training loss: 2.758453999924 on iter: 140400/250000
training loss: 2.9488923947822 on iter: 140500/250000
training loss: 2.9926604973965 on iter: 140600/250000
training loss: 3.1427456631559 on iter: 140700/250000
training loss: 3.3804450225822 on iter: 140800/250000
training loss: 3.4185912309376 on iter: 140900/250000
training loss: 3.2000879983553 on iter: 141000/250000
training loss: 3.2863742893095 on iter: 141100/250000
training loss: 2.9837140325776 on iter: 141200/250000
training loss: 2.8296475621114 on iter: 141300/250000
training loss: 3.1688849808197 on iter: 141400/250000
training loss: 3.05112531855 on iter: 141500/250000
training loss: 2.3742748877413 on iter: 141600/250000
training loss: 3.85954299821 on iter: 141700/250000
training loss: 4.0238458793738 on iter: 141800/250000
training loss: 3.9343234528301 on iter: 141900/250000
training loss: 3.3632356430867 on iter: 142000/250000
training loss: 3.2940474546221 on iter: 142100/250000
training loss: 3.6632600346174 on iter: 142200/250000
training loss: 3.9853771233897 on iter: 142300/250000
training loss: 2.7545035466454 on iter: 142400/250000
training loss: 3.3179722584262 on iter: 142500/250000
training loss: 3.7943516924842 on iter: 142600/250000
training loss: 3.1179983973593 on iter: 142700/250000
training loss: 2.4076765076489 on iter: 142800/250000
training loss: 3.3274338017246 on iter: 142900/250000
training loss: 3.7232831610194 on iter: 143000/250000
training loss: 4.5197103583005 on iter: 143100/250000
training loss: 2.8408868125225 on iter: 143200/250000
training loss: 3.185109842104 on iter: 143300/250000
training loss: 2.887862338748 on iter: 143400/250000
training loss: 3.5252361691948 on iter: 143500/250000
training loss: 3.0373416908365 on iter: 143600/250000
training loss: 3.2159435761685 on iter: 143700/250000
training loss: 3.1737491643951 on iter: 143800/250000
training loss: 3.5612016443001 on iter: 143900/250000
training loss: 3.5853935102736 on iter: 144000/250000
training loss: 3.8828512139394 on iter: 144100/250000
training loss: 2.8973377244791 on iter: 144200/250000
training loss: 3.2840904231728 on iter: 144300/250000
training loss: 3.0248808256738 on iter: 144400/250000
training loss: 3.1876154626248 on iter: 144500/250000
training loss: 3.6200120597968 on iter: 144600/250000
training loss: 2.7942218257205 on iter: 144700/250000
training loss: 2.5064167097631 on iter: 144800/250000
training loss: 4.3462181661319 on iter: 144900/250000
training loss: 3.2641078910647 on iter: 145000/250000
training loss: 3.6386033968739 on iter: 145100/250000
training loss: 4.1853365435125 on iter: 145200/250000
training loss: 3.639925872898 on iter: 145300/250000
training loss: 3.2540036523072 on iter: 145400/250000
training loss: 3.9543448940209 on iter: 145500/250000
training loss: 2.9510704586321 on iter: 145600/250000
training loss: 3.1482658748846 on iter: 145700/250000
training loss: 3.1477700058503 on iter: 145800/250000
training loss: 3.9217535966786 on iter: 145900/250000
training loss: 2.6267538402145 on iter: 146000/250000
training loss: 3.8646165105989 on iter: 146100/250000
training loss: 3.6350959210137 on iter: 146200/250000
training loss: 3.364238211062 on iter: 146300/250000
training loss: 3.040697528937 on iter: 146400/250000
training loss: 3.3298473690851 on iter: 146500/250000
training loss: 2.8269002760308 on iter: 146600/250000
training loss: 3.7106552496781 on iter: 146700/250000
training loss: 3.1129169994885 on iter: 146800/250000
training loss: 3.2478350334756 on iter: 146900/250000
training loss: 2.2820623665378 on iter: 147000/250000
training loss: 3.7007701020221 on iter: 147100/250000
training loss: 3.4116295911355 on iter: 147200/250000
training loss: 2.893104343732 on iter: 147300/250000
training loss: 2.3701725670259 on iter: 147400/250000
training loss: 3.7256456226062 on iter: 147500/250000
training loss: 3.6196005172158 on iter: 147600/250000
training loss: 3.5455702707073 on iter: 147700/250000
training loss: 3.3196209329242 on iter: 147800/250000
training loss: 2.8261463214824 on iter: 147900/250000
training loss: 3.9695398131325 on iter: 148000/250000
training loss: 2.998646989314 on iter: 148100/250000
training loss: 3.6450998444261 on iter: 148200/250000
training loss: 3.9049331628748 on iter: 148300/250000
training loss: 3.2677483419368 on iter: 148400/250000
training loss: 3.7488585607414 on iter: 148500/250000
training loss: 2.9829777390947 on iter: 148600/250000
training loss: 3.1770194663953 on iter: 148700/250000
training loss: 3.7670830653738 on iter: 148800/250000
training loss: 2.8787791407409 on iter: 148900/250000
training loss: 2.627138036501 on iter: 149000/250000
training loss: 3.104836554779 on iter: 149100/250000
training loss: 3.0977577440407 on iter: 149200/250000
training loss: 3.3098799940108 on iter: 149300/250000
training loss: 3.2018525480899 on iter: 149400/250000
training loss: 3.116799331606 on iter: 149500/250000
training loss: 3.8153986554027 on iter: 149600/250000
training loss: 2.3688675078778 on iter: 149700/250000
training loss: 2.4481639711781 on iter: 149800/250000
training loss: 3.3289611649502 on iter: 149900/250000
training loss: 2.6820942528295 on iter: 150000/250000
training loss: 3.0607831468952 on iter: 150100/250000
training loss: 3.4245134213491 on iter: 150200/250000
training loss: 3.070733898796 on iter: 150300/250000
training loss: 3.0742896637608 on iter: 150400/250000
training loss: 4.348213477912 on iter: 150500/250000
training loss: 3.3022368311199 on iter: 150600/250000
training loss: 3.1886762518295 on iter: 150700/250000
training loss: 2.4466074601156 on iter: 150800/250000
training loss: 3.3768505314082 on iter: 150900/250000
training loss: 2.9133181946238 on iter: 151000/250000
training loss: 3.7082773489831 on iter: 151100/250000
training loss: 3.5814487056636 on iter: 151200/250000
training loss: 3.9250163931697 on iter: 151300/250000
training loss: 3.3814166842502 on iter: 151400/250000
training loss: 2.4153511387573 on iter: 151500/250000
training loss: 3.5124228944053 on iter: 151600/250000
training loss: 3.5509894414346 on iter: 151700/250000
training loss: 3.4243654932208 on iter: 151800/250000
training loss: 3.7824901449955 on iter: 151900/250000
training loss: 2.70460061095 on iter: 152000/250000
training loss: 3.7908020300417 on iter: 152100/250000
training loss: 2.545228071424 on iter: 152200/250000
training loss: 3.3593872247947 on iter: 152300/250000
training loss: 3.8141430770299 on iter: 152400/250000
training loss: 3.8705829449174 on iter: 152500/250000
training loss: 3.3059693379346 on iter: 152600/250000
training loss: 4.0987307360371 on iter: 152700/250000
training loss: 2.7219546917264 on iter: 152800/250000
training loss: 4.0061671290968 on iter: 152900/250000
training loss: 3.6345666990862 on iter: 153000/250000
training loss: 2.8139562268209 on iter: 153100/250000
training loss: 2.9054604251365 on iter: 153200/250000
training loss: 3.620965411465 on iter: 153300/250000
training loss: 3.6024873657349 on iter: 153400/250000
training loss: 2.4954608864773 on iter: 153500/250000
training loss: 3.0707411933274 on iter: 153600/250000
training loss: 2.4842435453268 on iter: 153700/250000
training loss: 2.7877206537208 on iter: 153800/250000
training loss: 4.1190213688942 on iter: 153900/250000
training loss: 3.2347349583776 on iter: 154000/250000
training loss: 3.8436815221285 on iter: 154100/250000
training loss: 2.9875341582531 on iter: 154200/250000
training loss: 3.3442520441665 on iter: 154300/250000
training loss: 3.9186023322323 on iter: 154400/250000
training loss: 3.0082523587586 on iter: 154500/250000
training loss: 3.2199461629209 on iter: 154600/250000
training loss: 3.2310908078338 on iter: 154700/250000
training loss: 3.4747050016139 on iter: 154800/250000
training loss: 2.8526512927841 on iter: 154900/250000
training loss: 2.8032243269509 on iter: 155000/250000
training loss: 3.0119246179228 on iter: 155100/250000
training loss: 3.7752784206578 on iter: 155200/250000
training loss: 3.4192405194535 on iter: 155300/250000
training loss: 3.4904720629103 on iter: 155400/250000
training loss: 3.0949976228346 on iter: 155500/250000
training loss: 2.8044013361141 on iter: 155600/250000
training loss: 3.0586006634914 on iter: 155700/250000
training loss: 3.9997278538708 on iter: 155800/250000
training loss: 3.6276117177468 on iter: 155900/250000
training loss: 3.2426290639576 on iter: 156000/250000
training loss: 3.0068004835744 on iter: 156100/250000
training loss: 4.163821400467 on iter: 156200/250000
training loss: 3.0775560833438 on iter: 156300/250000
training loss: 2.6607663557621 on iter: 156400/250000
training loss: 3.2381422520835 on iter: 156500/250000
training loss: 2.816412820393 on iter: 156600/250000
training loss: 2.8189127209158 on iter: 156700/250000
training loss: 2.985648725152 on iter: 156800/250000
training loss: 3.4849730574128 on iter: 156900/250000
training loss: 3.9749392862194 on iter: 157000/250000
training loss: 3.2401466853137 on iter: 157100/250000
training loss: 3.1725075481082 on iter: 157200/250000
training loss: 3.9853007932539 on iter: 157300/250000
training loss: 3.517997745513 on iter: 157400/250000
training loss: 4.0305093833841 on iter: 157500/250000
training loss: 2.7911196392414 on iter: 157600/250000
training loss: 2.9834341193546 on iter: 157700/250000
training loss: 2.5888619570617 on iter: 157800/250000
training loss: 3.6533723786989 on iter: 157900/250000
training loss: 3.1743237581732 on iter: 158000/250000
training loss: 3.6049578552151 on iter: 158100/250000
training loss: 3.250541007491 on iter: 158200/250000
training loss: 3.333781408373 on iter: 158300/250000
training loss: 3.3622503416261 on iter: 158400/250000
training loss: 3.4893663443477 on iter: 158500/250000
training loss: 3.5309156857349 on iter: 158600/250000
training loss: 2.2732091134497 on iter: 158700/250000
training loss: 2.7563147808289 on iter: 158800/250000
training loss: 3.490922276109 on iter: 158900/250000
training loss: 4.3368294970252 on iter: 159000/250000
training loss: 2.8346021675447 on iter: 159100/250000
training loss: 3.3447018562428 on iter: 159200/250000
training loss: 2.6847476924183 on iter: 159300/250000
training loss: 3.3012538687305 on iter: 159400/250000
training loss: 3.3309668321693 on iter: 159500/250000
training loss: 2.9772996591127 on iter: 159600/250000
training loss: 3.2962380582373 on iter: 159700/250000
training loss: 3.1194597618529 on iter: 159800/250000
training loss: 3.6618129802579 on iter: 159900/250000
training loss: 3.8225467138743 on iter: 160000/250000
training loss: 4.0985198057219 on iter: 160100/250000
training loss: 2.5350789183284 on iter: 160200/250000
training loss: 3.0533641969889 on iter: 160300/250000
training loss: 3.3009849188772 on iter: 160400/250000
training loss: 3.2646745612391 on iter: 160500/250000
training loss: 3.3856301700798 on iter: 160600/250000
training loss: 2.998268409486 on iter: 160700/250000
training loss: 4.2683353310906 on iter: 160800/250000
training loss: 2.9729091651401 on iter: 160900/250000
training loss: 3.5172062394297 on iter: 161000/250000
training loss: 2.6384827150942 on iter: 161100/250000
training loss: 3.3035328488424 on iter: 161200/250000
training loss: 3.7958870083835 on iter: 161300/250000
training loss: 3.191240660161 on iter: 161400/250000
training loss: 2.4005274480983 on iter: 161500/250000
training loss: 4.1488975312781 on iter: 161600/250000
training loss: 3.7946943817939 on iter: 161700/250000
training loss: 3.1830626854993 on iter: 161800/250000
training loss: 3.8508233094907 on iter: 161900/250000
training loss: 3.6254783239133 on iter: 162000/250000
training loss: 3.1602406190305 on iter: 162100/250000
training loss: 3.0620143808168 on iter: 162200/250000
training loss: 2.5085226316309 on iter: 162300/250000
training loss: 3.0063563873249 on iter: 162400/250000
training loss: 3.5135029658332 on iter: 162500/250000
training loss: 2.5423774991698 on iter: 162600/250000
training loss: 3.079401041362 on iter: 162700/250000
training loss: 3.4978784504222 on iter: 162800/250000
training loss: 2.6464151076187 on iter: 162900/250000
training loss: 3.1943084930769 on iter: 163000/250000
training loss: 2.492030760821 on iter: 163100/250000
training loss: 4.0659781837459 on iter: 163200/250000
training loss: 3.0007462549705 on iter: 163300/250000
training loss: 3.2024904233061 on iter: 163400/250000
training loss: 3.3256251274194 on iter: 163500/250000
training loss: 3.7234361036211 on iter: 163600/250000
training loss: 3.0087493730408 on iter: 163700/250000
training loss: 3.2873184054482 on iter: 163800/250000
training loss: 3.6798941581929 on iter: 163900/250000
training loss: 3.2799748850845 on iter: 164000/250000
training loss: 3.0901756290036 on iter: 164100/250000
training loss: 2.7977905779821 on iter: 164200/250000
training loss: 3.8320074443035 on iter: 164300/250000
training loss: 2.877100461522 on iter: 164400/250000
training loss: 2.8595031117602 on iter: 164500/250000
training loss: 3.5470190719298 on iter: 164600/250000
training loss: 3.1801239244905 on iter: 164700/250000
training loss: 3.0396888129822 on iter: 164800/250000
training loss: 3.6258320784952 on iter: 164900/250000
training loss: 3.2296592563881 on iter: 165000/250000
training loss: 3.9266565813146 on iter: 165100/250000
training loss: 4.5428740948527 on iter: 165200/250000
training loss: 3.8384737636263 on iter: 165300/250000
training loss: 3.9163358978196 on iter: 165400/250000
training loss: 3.7356552323153 on iter: 165500/250000
training loss: 3.3738775416571 on iter: 165600/250000
training loss: 3.7084871944261 on iter: 165700/250000
training loss: 2.326175564369 on iter: 165800/250000
training loss: 3.3668682677683 on iter: 165900/250000
training loss: 2.9829908073439 on iter: 166000/250000
training loss: 3.6697885373777 on iter: 166100/250000
training loss: 3.2274401419533 on iter: 166200/250000
training loss: 3.2893317918039 on iter: 166300/250000
training loss: 3.9494951471668 on iter: 166400/250000
training loss: 2.7244473452484 on iter: 166500/250000
training loss: 3.9406529573806 on iter: 166600/250000
training loss: 3.3278591584593 on iter: 166700/250000
training loss: 3.4810155079081 on iter: 166800/250000
training loss: 3.7670762219768 on iter: 166900/250000
training loss: 3.2529088007462 on iter: 167000/250000
training loss: 3.1554972788164 on iter: 167100/250000
training loss: 2.8530141493845 on iter: 167200/250000
training loss: 3.1980364841139 on iter: 167300/250000
training loss: 3.3957053744122 on iter: 167400/250000
training loss: 3.7906923267036 on iter: 167500/250000
training loss: 2.4758770581382 on iter: 167600/250000
training loss: 3.3095080966692 on iter: 167700/250000
training loss: 3.9847085145167 on iter: 167800/250000
training loss: 3.791669966005 on iter: 167900/250000
training loss: 3.631826249623 on iter: 168000/250000
training loss: 3.6477641487967 on iter: 168100/250000
training loss: 3.1967194430575 on iter: 168200/250000
training loss: 3.1208002331017 on iter: 168300/250000
training loss: 3.0556193963626 on iter: 168400/250000
training loss: 3.2603748227777 on iter: 168500/250000
training loss: 2.9447456448956 on iter: 168600/250000
training loss: 3.9116115261958 on iter: 168700/250000
training loss: 4.4160711240783 on iter: 168800/250000
training loss: 3.0820257902178 on iter: 168900/250000
training loss: 4.1800076906947 on iter: 169000/250000
training loss: 3.3766902196332 on iter: 169100/250000
training loss: 2.9622824085782 on iter: 169200/250000
training loss: 4.1416192118429 on iter: 169300/250000
training loss: 3.2855414204307 on iter: 169400/250000
training loss: 3.19794955935 on iter: 169500/250000
training loss: 3.0875521283135 on iter: 169600/250000
training loss: 3.5581603833111 on iter: 169700/250000
training loss: 3.356832237715 on iter: 169800/250000
training loss: 2.2607397837473 on iter: 169900/250000
training loss: 3.012138670622 on iter: 170000/250000
training loss: 3.0052869533487 on iter: 170100/250000
training loss: 2.9972818601628 on iter: 170200/250000
training loss: 3.30963587878 on iter: 170300/250000
training loss: 3.1256595452715 on iter: 170400/250000
training loss: 3.2675873882784 on iter: 170500/250000
training loss: 3.1640237275227 on iter: 170600/250000
training loss: 2.9378628659752 on iter: 170700/250000
training loss: 4.0803093690841 on iter: 170800/250000
training loss: 3.0350517954134 on iter: 170900/250000
training loss: 3.2087548741922 on iter: 171000/250000
training loss: 3.0197716760461 on iter: 171100/250000
training loss: 3.5763205492883 on iter: 171200/250000
training loss: 3.1744145119505 on iter: 171300/250000
training loss: 3.6512739709688 on iter: 171400/250000
training loss: 3.3487663372561 on iter: 171500/250000
training loss: 2.8485024193511 on iter: 171600/250000
training loss: 3.2916129883438 on iter: 171700/250000
training loss: 3.7544242177886 on iter: 171800/250000
training loss: 4.1423728694747 on iter: 171900/250000
training loss: 3.6597996174045 on iter: 172000/250000
training loss: 3.8330171511182 on iter: 172100/250000
training loss: 2.5409590120281 on iter: 172200/250000
training loss: 3.2394564642306 on iter: 172300/250000
training loss: 3.0073436112224 on iter: 172400/250000
training loss: 2.854917877588 on iter: 172500/250000
training loss: 2.9770712489505 on iter: 172600/250000
training loss: 3.7945737151333 on iter: 172700/250000
training loss: 3.4142991048277 on iter: 172800/250000
training loss: 3.6414206283548 on iter: 172900/250000
training loss: 2.7816283463621 on iter: 173000/250000
training loss: 3.2802634054239 on iter: 173100/250000
training loss: 2.6636813494736 on iter: 173200/250000
training loss: 2.9052731049064 on iter: 173300/250000
training loss: 2.6779268278943 on iter: 173400/250000
training loss: 2.7009283392038 on iter: 173500/250000
training loss: 3.7502681539577 on iter: 173600/250000
training loss: 3.6068506376066 on iter: 173700/250000
training loss: 3.0503216448276 on iter: 173800/250000
training loss: 3.5082201373714 on iter: 173900/250000
training loss: 2.7168241854334 on iter: 174000/250000
training loss: 3.9206609523658 on iter: 174100/250000
training loss: 3.2335726234015 on iter: 174200/250000
training loss: 3.4875214064293 on iter: 174300/250000
training loss: 3.7899906856703 on iter: 174400/250000
training loss: 3.4675463259705 on iter: 174500/250000
training loss: 3.2900486288846 on iter: 174600/250000
training loss: 3.7326905466079 on iter: 174700/250000
training loss: 4.0806913904512 on iter: 174800/250000
training loss: 2.8029150934777 on iter: 174900/250000
training loss: 4.1766157929898 on iter: 175000/250000
training loss: 3.5279769160077 on iter: 175100/250000
training loss: 2.8458204941031 on iter: 175200/250000
training loss: 4.657495226402 on iter: 175300/250000
training loss: 3.9607424387489 on iter: 175400/250000
training loss: 2.639569396188 on iter: 175500/250000
training loss: 2.725496540991 on iter: 175600/250000
training loss: 2.8874309130692 on iter: 175700/250000
training loss: 2.2406833067123 on iter: 175800/250000
training loss: 3.1331204676939 on iter: 175900/250000
training loss: 3.5064818352308 on iter: 176000/250000
training loss: 3.2323635663145 on iter: 176100/250000
training loss: 3.529775020824 on iter: 176200/250000
training loss: 3.1816688832826 on iter: 176300/250000
training loss: 4.4604013776001 on iter: 176400/250000
training loss: 3.4622607391277 on iter: 176500/250000
training loss: 2.6517923252925 on iter: 176600/250000
training loss: 2.9450801028843 on iter: 176700/250000
training loss: 3.3150310129003 on iter: 176800/250000
training loss: 2.8698416827085 on iter: 176900/250000
training loss: 2.8527610608405 on iter: 177000/250000
training loss: 2.9467016061158 on iter: 177100/250000
training loss: 3.9926597720799 on iter: 177200/250000
training loss: 3.5236885389842 on iter: 177300/250000
training loss: 2.8328734750151 on iter: 177400/250000
training loss: 2.8339074168963 on iter: 177500/250000
training loss: 2.7964308640468 on iter: 177600/250000
training loss: 3.0842754680359 on iter: 177700/250000
training loss: 3.0858119929374 on iter: 177800/250000
training loss: 3.2440029579086 on iter: 177900/250000
training loss: 3.3668127869295 on iter: 178000/250000
training loss: 4.5369000710414 on iter: 178100/250000
training loss: 3.2098186230172 on iter: 178200/250000
training loss: 3.0155959987727 on iter: 178300/250000
training loss: 3.9480312096388 on iter: 178400/250000
training loss: 2.6478225857652 on iter: 178500/250000
training loss: 2.6870524524874 on iter: 178600/250000
training loss: 3.2446098667354 on iter: 178700/250000
training loss: 3.0942768252838 on iter: 178800/250000
training loss: 3.5349708294526 on iter: 178900/250000
training loss: 4.1046860529614 on iter: 179000/250000
training loss: 3.1975469362817 on iter: 179100/250000
training loss: 3.9773017448813 on iter: 179200/250000
training loss: 2.8578554575507 on iter: 179300/250000
training loss: 3.0631132937809 on iter: 179400/250000
training loss: 4.4499042711499 on iter: 179500/250000
training loss: 2.7196180541186 on iter: 179600/250000
training loss: 4.1486635303348 on iter: 179700/250000
training loss: 3.4523280993576 on iter: 179800/250000
training loss: 3.8501329371178 on iter: 179900/250000
training loss: 3.6094786023012 on iter: 180000/250000
training loss: 2.8132121628816 on iter: 180100/250000
training loss: 3.3951695238425 on iter: 180200/250000
training loss: 3.9786175152053 on iter: 180300/250000
training loss: 3.2832153599932 on iter: 180400/250000
training loss: 2.7363903040423 on iter: 180500/250000
training loss: 2.9722434718491 on iter: 180600/250000
training loss: 3.97066869845 on iter: 180700/250000
training loss: 3.1680701677896 on iter: 180800/250000
training loss: 3.3530159216419 on iter: 180900/250000
training loss: 2.742875514431 on iter: 181000/250000
training loss: 3.6049398416167 on iter: 181100/250000
training loss: 2.4815385842645 on iter: 181200/250000
training loss: 4.1660899621742 on iter: 181300/250000
training loss: 3.3710964750502 on iter: 181400/250000
training loss: 3.1074729863665 on iter: 181500/250000
training loss: 3.5542525072855 on iter: 181600/250000
training loss: 2.5227938642628 on iter: 181700/250000
training loss: 3.3142566638187 on iter: 181800/250000
training loss: 3.7891994373824 on iter: 181900/250000
training loss: 3.3776476313392 on iter: 182000/250000
training loss: 3.2372757722908 on iter: 182100/250000
training loss: 3.4306129653224 on iter: 182200/250000
training loss: 3.0434690584287 on iter: 182300/250000
training loss: 3.4935588564698 on iter: 182400/250000
training loss: 3.325774801044 on iter: 182500/250000
training loss: 3.4627885077786 on iter: 182600/250000
training loss: 3.1993755427557 on iter: 182700/250000
training loss: 3.5917627106992 on iter: 182800/250000
training loss: 3.8020505152604 on iter: 182900/250000
training loss: 3.249858175914 on iter: 183000/250000
training loss: 3.9869915329735 on iter: 183100/250000
training loss: 3.3847562258955 on iter: 183200/250000
training loss: 3.5059250671899 on iter: 183300/250000
training loss: 3.5921702067591 on iter: 183400/250000
training loss: 3.831356668824 on iter: 183500/250000
training loss: 4.270268980386 on iter: 183600/250000
training loss: 3.5694432468422 on iter: 183700/250000
training loss: 3.535141970246 on iter: 183800/250000
training loss: 3.5894423783631 on iter: 183900/250000
training loss: 3.4285501089487 on iter: 184000/250000
training loss: 2.6027282928234 on iter: 184100/250000
training loss: 3.0035990358577 on iter: 184200/250000
training loss: 3.5970740144532 on iter: 184300/250000
training loss: 3.1070394937466 on iter: 184400/250000
training loss: 3.8862870442421 on iter: 184500/250000
training loss: 3.9598951981425 on iter: 184600/250000
training loss: 4.3554684450926 on iter: 184700/250000
training loss: 2.8432400574828 on iter: 184800/250000
training loss: 3.090110544674 on iter: 184900/250000
training loss: 2.1767477429804 on iter: 185000/250000
training loss: 3.2489897950407 on iter: 185100/250000
training loss: 4.0155852393084 on iter: 185200/250000
training loss: 3.7186724243371 on iter: 185300/250000
training loss: 3.2018411314786 on iter: 185400/250000
training loss: 3.7734615501648 on iter: 185500/250000
training loss: 3.5515765813853 on iter: 185600/250000
training loss: 2.8843863950744 on iter: 185700/250000
training loss: 4.4683158216292 on iter: 185800/250000
training loss: 3.5147084822821 on iter: 185900/250000
training loss: 3.2077882199152 on iter: 186000/250000
training loss: 3.1507280260406 on iter: 186100/250000
training loss: 3.2913279292982 on iter: 186200/250000
training loss: 2.774737798163 on iter: 186300/250000
training loss: 3.3798584811003 on iter: 186400/250000
training loss: 4.5493890190981 on iter: 186500/250000
training loss: 3.1446530607928 on iter: 186600/250000
training loss: 3.7042757095056 on iter: 186700/250000
training loss: 3.6091796806887 on iter: 186800/250000
training loss: 3.2117202664722 on iter: 186900/250000
training loss: 3.3629551182687 on iter: 187000/250000
training loss: 3.890511569745 on iter: 187100/250000
training loss: 3.5534612320308 on iter: 187200/250000
training loss: 3.7414096726285 on iter: 187300/250000
training loss: 3.1913204439832 on iter: 187400/250000
training loss: 3.577087688938 on iter: 187500/250000
training loss: 3.764693343594 on iter: 187600/250000
training loss: 3.027174171654 on iter: 187700/250000
training loss: 3.4209680641656 on iter: 187800/250000
training loss: 4.1318910750848 on iter: 187900/250000
training loss: 2.6760001996168 on iter: 188000/250000
training loss: 4.0416199530484 on iter: 188100/250000
training loss: 3.7187834112101 on iter: 188200/250000
training loss: 2.9420892931181 on iter: 188300/250000
training loss: 3.8980671724927 on iter: 188400/250000
training loss: 2.9705569007458 on iter: 188500/250000
training loss: 3.3465413557369 on iter: 188600/250000
training loss: 3.5059940748237 on iter: 188700/250000
training loss: 3.0960621544739 on iter: 188800/250000
training loss: 3.8515676299955 on iter: 188900/250000
training loss: 3.9243032422231 on iter: 189000/250000
training loss: 2.9268862489172 on iter: 189100/250000
training loss: 3.6832015292702 on iter: 189200/250000
training loss: 3.6100792085948 on iter: 189300/250000
training loss: 3.3750878825255 on iter: 189400/250000
training loss: 2.984078175535 on iter: 189500/250000
training loss: 2.5185274118705 on iter: 189600/250000
training loss: 3.2501511997224 on iter: 189700/250000
training loss: 3.1595284744119 on iter: 189800/250000
training loss: 3.3899814154784 on iter: 189900/250000
training loss: 3.2801970242773 on iter: 190000/250000
training loss: 3.2721200978757 on iter: 190100/250000
training loss: 3.634613987388 on iter: 190200/250000
training loss: 2.2128118609204 on iter: 190300/250000
training loss: 2.9541574453495 on iter: 190400/250000
training loss: 3.2195240000326 on iter: 190500/250000
training loss: 3.0495938270636 on iter: 190600/250000
training loss: 2.8472899624661 on iter: 190700/250000
training loss: 2.7708336129242 on iter: 190800/250000
training loss: 3.5072151456306 on iter: 190900/250000
training loss: 3.1158744870388 on iter: 191000/250000
training loss: 3.0393850600012 on iter: 191100/250000
training loss: 3.6690639231667 on iter: 191200/250000
training loss: 3.4168339047725 on iter: 191300/250000
training loss: 3.7733985597606 on iter: 191400/250000
training loss: 3.583808390611 on iter: 191500/250000
training loss: 3.2150354173204 on iter: 191600/250000
training loss: 3.7095507556186 on iter: 191700/250000
training loss: 3.7423333222687 on iter: 191800/250000
training loss: 4.3524495373718 on iter: 191900/250000
training loss: 2.9298455535417 on iter: 192000/250000
training loss: 2.6727009261834 on iter: 192100/250000
training loss: 3.4970012032011 on iter: 192200/250000
training loss: 3.0835584415709 on iter: 192300/250000
training loss: 3.8224519146004 on iter: 192400/250000
training loss: 3.5229836726403 on iter: 192500/250000
training loss: 3.6529231183439 on iter: 192600/250000
training loss: 2.0804514463896 on iter: 192700/250000
training loss: 3.2388531369815 on iter: 192800/250000
training loss: 3.2457003573215 on iter: 192900/250000
training loss: 3.6300271943397 on iter: 193000/250000
training loss: 2.8738532315635 on iter: 193100/250000
training loss: 3.6250352151961 on iter: 193200/250000
training loss: 2.5943285112184 on iter: 193300/250000
training loss: 3.1983324442881 on iter: 193400/250000
training loss: 3.4394147747909 on iter: 193500/250000
training loss: 3.8370366845108 on iter: 193600/250000
training loss: 3.3140125642031 on iter: 193700/250000
training loss: 3.5292269223728 on iter: 193800/250000
training loss: 2.8979827759652 on iter: 193900/250000
training loss: 4.3946051535823 on iter: 194000/250000
training loss: 3.1423815378344 on iter: 194100/250000
training loss: 2.9360367303536 on iter: 194200/250000
training loss: 3.6264965200168 on iter: 194300/250000
training loss: 2.6906737282023 on iter: 194400/250000
training loss: 3.0727200472449 on iter: 194500/250000
training loss: 5.0882565890174 on iter: 194600/250000
training loss: 3.0859958731539 on iter: 194700/250000
training loss: 2.8725108041677 on iter: 194800/250000
training loss: 3.568971643687 on iter: 194900/250000
training loss: 2.6593899357719 on iter: 195000/250000
training loss: 3.0492586741029 on iter: 195100/250000
training loss: 2.9650012282307 on iter: 195200/250000
training loss: 2.9788860462511 on iter: 195300/250000
training loss: 3.1361982522137 on iter: 195400/250000
training loss: 3.2364943347137 on iter: 195500/250000
training loss: 2.8235650220695 on iter: 195600/250000
training loss: 3.0616675571789 on iter: 195700/250000
training loss: 3.5136865286039 on iter: 195800/250000
training loss: 2.8860581063603 on iter: 195900/250000
training loss: 2.8440187504655 on iter: 196000/250000
training loss: 4.1033366118675 on iter: 196100/250000
training loss: 2.5589651902623 on iter: 196200/250000
training loss: 4.0509887807897 on iter: 196300/250000
training loss: 2.6688309068091 on iter: 196400/250000
training loss: 3.4156312374844 on iter: 196500/250000
training loss: 3.2277542245668 on iter: 196600/250000
training loss: 3.5142702775067 on iter: 196700/250000
training loss: 3.0632344132347 on iter: 196800/250000
training loss: 3.205910187204 on iter: 196900/250000
training loss: 3.9556479448592 on iter: 197000/250000
training loss: 2.7308215369008 on iter: 197100/250000
training loss: 4.2521735586197 on iter: 197200/250000
training loss: 2.146871686188 on iter: 197300/250000
training loss: 3.0431597164748 on iter: 197400/250000
training loss: 2.5083544666906 on iter: 197500/250000
training loss: 3.0375575812679 on iter: 197600/250000
training loss: 4.1960659814092 on iter: 197700/250000
training loss: 3.3974248656745 on iter: 197800/250000
training loss: 2.6915639229319 on iter: 197900/250000
training loss: 3.7355879648571 on iter: 198000/250000
training loss: 2.6893548496009 on iter: 198100/250000
training loss: 3.7951106546295 on iter: 198200/250000
training loss: 3.7886224547856 on iter: 198300/250000
training loss: 2.5845880819178 on iter: 198400/250000
training loss: 3.6002165937323 on iter: 198500/250000
training loss: 4.0323907510152 on iter: 198600/250000
training loss: 2.7429553439837 on iter: 198700/250000
training loss: 3.0240744950675 on iter: 198800/250000
training loss: 3.0878260071197 on iter: 198900/250000
training loss: 3.4168752605147 on iter: 199000/250000
training loss: 3.6896652133397 on iter: 199100/250000
training loss: 3.7967471595426 on iter: 199200/250000
training loss: 2.7729516573546 on iter: 199300/250000
training loss: 4.2997581535026 on iter: 199400/250000
training loss: 4.6309462571105 on iter: 199500/250000
training loss: 3.5261020331198 on iter: 199600/250000
training loss: 4.0680027052183 on iter: 199700/250000
training loss: 3.0566835260618 on iter: 199800/250000
training loss: 4.4284821975396 on iter: 199900/250000
training loss: 2.9126309686831 on iter: 200000/250000
training loss: 3.351641394324 on iter: 200100/250000
training loss: 3.3054212086528 on iter: 200200/250000
training loss: 2.3929050444014 on iter: 200300/250000
training loss: 3.4648891579699 on iter: 200400/250000
training loss: 3.3569449122955 on iter: 200500/250000
training loss: 4.1744726714015 on iter: 200600/250000
training loss: 3.0970493306862 on iter: 200700/250000
training loss: 3.000000337409 on iter: 200800/250000
training loss: 2.6247448805613 on iter: 200900/250000
training loss: 2.8764553193177 on iter: 201000/250000
training loss: 3.0649521065505 on iter: 201100/250000
training loss: 4.0312409536202 on iter: 201200/250000
training loss: 2.9623110423785 on iter: 201300/250000
training loss: 3.4866195226666 on iter: 201400/250000
training loss: 3.7082204493991 on iter: 201500/250000
training loss: 3.0733613578741 on iter: 201600/250000
training loss: 3.8991338211026 on iter: 201700/250000
training loss: 4.2002073066306 on iter: 201800/250000
training loss: 3.2801772392552 on iter: 201900/250000
training loss: 3.9594825350505 on iter: 202000/250000
training loss: 3.2485890381056 on iter: 202100/250000
training loss: 4.2556018498545 on iter: 202200/250000
training loss: 3.5243606675121 on iter: 202300/250000
training loss: 3.2222084632249 on iter: 202400/250000
training loss: 2.9771933139995 on iter: 202500/250000
training loss: 3.5370324173047 on iter: 202600/250000
training loss: 2.75729111514 on iter: 202700/250000
training loss: 2.5616705042401 on iter: 202800/250000
training loss: 2.7533123180122 on iter: 202900/250000
training loss: 2.9146491275574 on iter: 203000/250000
training loss: 3.5738552339389 on iter: 203100/250000
training loss: 3.0633965005068 on iter: 203200/250000
training loss: 3.8956071874906 on iter: 203300/250000
training loss: 3.0987018115638 on iter: 203400/250000
training loss: 2.879664350378 on iter: 203500/250000
training loss: 2.516196224322 on iter: 203600/250000
training loss: 3.1965457333379 on iter: 203700/250000
training loss: 4.2719721252 on iter: 203800/250000
training loss: 3.092979355228 on iter: 203900/250000
training loss: 3.3474333530443 on iter: 204000/250000
training loss: 2.8867592294165 on iter: 204100/250000
training loss: 2.7508175702944 on iter: 204200/250000
training loss: 2.6684566612715 on iter: 204300/250000
training loss: 3.5202730445125 on iter: 204400/250000
training loss: 3.6204192770763 on iter: 204500/250000
training loss: 3.256013060085 on iter: 204600/250000
training loss: 3.120967985452 on iter: 204700/250000
training loss: 3.1899345853165 on iter: 204800/250000
training loss: 2.697922520606 on iter: 204900/250000
training loss: 3.3943842113098 on iter: 205000/250000
training loss: 3.1227835583482 on iter: 205100/250000
training loss: 3.156265533015 on iter: 205200/250000
training loss: 3.6096825861942 on iter: 205300/250000
training loss: 3.708707083639 on iter: 205400/250000
training loss: 3.0356206271694 on iter: 205500/250000
training loss: 3.4962143783591 on iter: 205600/250000
training loss: 2.7437945225649 on iter: 205700/250000
training loss: 3.2797737710193 on iter: 205800/250000
training loss: 3.3470197263849 on iter: 205900/250000
training loss: 3.3013066397889 on iter: 206000/250000
training loss: 2.6434702474693 on iter: 206100/250000
training loss: 3.2301745125776 on iter: 206200/250000
training loss: 3.1322480510125 on iter: 206300/250000
training loss: 3.1441778707504 on iter: 206400/250000
training loss: 3.1030859863157 on iter: 206500/250000
training loss: 3.3041997565538 on iter: 206600/250000
training loss: 3.8846121672078 on iter: 206700/250000
training loss: 2.6529793066419 on iter: 206800/250000
training loss: 2.7940156554096 on iter: 206900/250000
training loss: 3.883012762981 on iter: 207000/250000
training loss: 3.8133797191914 on iter: 207100/250000
training loss: 3.2134104536808 on iter: 207200/250000
training loss: 2.6035322454719 on iter: 207300/250000
training loss: 4.1970773150279 on iter: 207400/250000
training loss: 3.5925860597322 on iter: 207500/250000
training loss: 2.8784071293897 on iter: 207600/250000
training loss: 2.9748073342708 on iter: 207700/250000
training loss: 3.0926378106156 on iter: 207800/250000
training loss: 3.2794433888497 on iter: 207900/250000
training loss: 3.796577161224 on iter: 208000/250000
training loss: 3.7102427563432 on iter: 208100/250000
training loss: 3.4687503541161 on iter: 208200/250000
training loss: 2.8137530450802 on iter: 208300/250000
training loss: 5.0068906004882 on iter: 208400/250000
training loss: 4.3736321585625 on iter: 208500/250000
training loss: 2.7158244032623 on iter: 208600/250000
training loss: 2.814417637705 on iter: 208700/250000
training loss: 2.779987679188 on iter: 208800/250000
training loss: 3.4255362236851 on iter: 208900/250000
training loss: 3.5369139974966 on iter: 209000/250000
training loss: 3.4414253994293 on iter: 209100/250000
training loss: 2.641281492927 on iter: 209200/250000
training loss: 3.3758986444434 on iter: 209300/250000
training loss: 3.0226607286229 on iter: 209400/250000
training loss: 4.593721587884 on iter: 209500/250000
training loss: 3.0680501137223 on iter: 209600/250000
training loss: 2.9885168952455 on iter: 209700/250000
training loss: 4.2841778529475 on iter: 209800/250000
training loss: 4.3540262040401 on iter: 209900/250000
training loss: 3.6254452785746 on iter: 210000/250000
training loss: 2.6809128896008 on iter: 210100/250000
training loss: 3.6041844968411 on iter: 210200/250000
training loss: 2.8189605841977 on iter: 210300/250000
training loss: 2.948852268713 on iter: 210400/250000
training loss: 4.4853512016123 on iter: 210500/250000
training loss: 3.2458737635985 on iter: 210600/250000
training loss: 2.7435595322724 on iter: 210700/250000
training loss: 4.6657076798456 on iter: 210800/250000
training loss: 3.3064939360191 on iter: 210900/250000
training loss: 3.1474221241033 on iter: 211000/250000
training loss: 3.7298125449449 on iter: 211100/250000
training loss: 3.5112845629393 on iter: 211200/250000
training loss: 3.4565726379227 on iter: 211300/250000
training loss: 4.3954558092315 on iter: 211400/250000
training loss: 3.0779085979703 on iter: 211500/250000
training loss: 3.0605861504257 on iter: 211600/250000
training loss: 2.5332512256714 on iter: 211700/250000
training loss: 3.52518711163 on iter: 211800/250000
training loss: 3.635743266739 on iter: 211900/250000
training loss: 3.1572965575785 on iter: 212000/250000
training loss: 3.2037947877165 on iter: 212100/250000
training loss: 3.3827679976458 on iter: 212200/250000
training loss: 3.6383198953587 on iter: 212300/250000
training loss: 3.0592376927578 on iter: 212400/250000
training loss: 3.4922691442832 on iter: 212500/250000
training loss: 2.5299495525318 on iter: 212600/250000
training loss: 2.7872933544963 on iter: 212700/250000
training loss: 3.1332149311342 on iter: 212800/250000
training loss: 2.4013076351832 on iter: 212900/250000
training loss: 3.6926803023591 on iter: 213000/250000
training loss: 3.241899661021 on iter: 213100/250000
training loss: 2.9590252015405 on iter: 213200/250000
training loss: 3.7422842685932 on iter: 213300/250000
training loss: 3.6614987553137 on iter: 213400/250000
training loss: 3.1842950733473 on iter: 213500/250000
training loss: 3.9943871748408 on iter: 213600/250000
training loss: 2.8259809487528 on iter: 213700/250000
training loss: 3.5620814390849 on iter: 213800/250000
training loss: 3.074873192661 on iter: 213900/250000
training loss: 3.7754133851657 on iter: 214000/250000
training loss: 3.5226575734516 on iter: 214100/250000
training loss: 3.512267209835 on iter: 214200/250000
training loss: 2.9905809555751 on iter: 214300/250000
training loss: 2.7491155172616 on iter: 214400/250000
training loss: 3.1796475847584 on iter: 214500/250000
training loss: 3.6126656716861 on iter: 214600/250000
training loss: 3.2373759507066 on iter: 214700/250000
training loss: 2.814663950393 on iter: 214800/250000
training loss: 2.5253820276964 on iter: 214900/250000
training loss: 3.9956130076467 on iter: 215000/250000
training loss: 3.3744751360677 on iter: 215100/250000
training loss: 3.3037867589759 on iter: 215200/250000
training loss: 3.4443564046137 on iter: 215300/250000
training loss: 3.2638110997102 on iter: 215400/250000
training loss: 2.807528032694 on iter: 215500/250000
training loss: 3.0949010124679 on iter: 215600/250000
training loss: 3.842152613962 on iter: 215700/250000
training loss: 3.0678621132929 on iter: 215800/250000
training loss: 4.1266916872136 on iter: 215900/250000
training loss: 2.879065661241 on iter: 216000/250000
training loss: 3.5817074173526 on iter: 216100/250000
training loss: 3.5515093432905 on iter: 216200/250000
training loss: 4.8637714155316 on iter: 216300/250000
training loss: 2.8314491077641 on iter: 216400/250000
training loss: 2.9582269859409 on iter: 216500/250000
training loss: 2.9111766295417 on iter: 216600/250000
training loss: 2.9880033001092 on iter: 216700/250000
training loss: 2.597067957825 on iter: 216800/250000
training loss: 3.4660566767691 on iter: 216900/250000
training loss: 3.2157060090685 on iter: 217000/250000
training loss: 3.5371842208427 on iter: 217100/250000
training loss: 3.3636287382076 on iter: 217200/250000
training loss: 3.1207191423389 on iter: 217300/250000
training loss: 3.294382756039 on iter: 217400/250000
training loss: 3.0790578834551 on iter: 217500/250000
training loss: 3.0425369379921 on iter: 217600/250000
training loss: 3.5591209035656 on iter: 217700/250000
training loss: 2.9041111553089 on iter: 217800/250000
training loss: 2.7738214426491 on iter: 217900/250000
training loss: 3.3482408962377 on iter: 218000/250000
training loss: 2.9569447984296 on iter: 218100/250000
training loss: 3.8216991850676 on iter: 218200/250000
training loss: 2.6235074558824 on iter: 218300/250000
training loss: 3.57485228128 on iter: 218400/250000
training loss: 3.6831684965608 on iter: 218500/250000
training loss: 4.1405506017543 on iter: 218600/250000
training loss: 2.6017984112244 on iter: 218700/250000
training loss: 3.5454014143638 on iter: 218800/250000
training loss: 2.931231927469 on iter: 218900/250000
training loss: 2.6875929379737 on iter: 219000/250000
training loss: 2.5844932372896 on iter: 219100/250000
training loss: 2.955589432662 on iter: 219200/250000
training loss: 4.3630518483686 on iter: 219300/250000
training loss: 4.289869096856 on iter: 219400/250000
training loss: 3.7261543459759 on iter: 219500/250000
training loss: 2.5109145161321 on iter: 219600/250000
training loss: 3.4663756662169 on iter: 219700/250000
training loss: 4.0655941473606 on iter: 219800/250000
training loss: 4.2542882980936 on iter: 219900/250000
training loss: 3.0070757935457 on iter: 220000/250000
training loss: 3.1886255738637 on iter: 220100/250000
training loss: 3.1511766644659 on iter: 220200/250000
training loss: 3.0523966257189 on iter: 220300/250000
training loss: 2.8613804084147 on iter: 220400/250000
training loss: 3.216568740588 on iter: 220500/250000
training loss: 3.9544976088451 on iter: 220600/250000
training loss: 3.0742079724179 on iter: 220700/250000
training loss: 2.8354512968181 on iter: 220800/250000
training loss: 3.9907652554911 on iter: 220900/250000
training loss: 3.8050289807468 on iter: 221000/250000
training loss: 3.6777420162529 on iter: 221100/250000
training loss: 2.8283736719971 on iter: 221200/250000
training loss: 3.4954236145294 on iter: 221300/250000
training loss: 3.581932319003 on iter: 221400/250000
training loss: 3.0489711462645 on iter: 221500/250000
training loss: 2.9265023072035 on iter: 221600/250000
training loss: 3.3436137484663 on iter: 221700/250000
training loss: 3.5895875830308 on iter: 221800/250000
training loss: 3.3839665123859 on iter: 221900/250000
training loss: 3.4691574605435 on iter: 222000/250000
training loss: 3.3174586117528 on iter: 222100/250000
training loss: 2.8246656383313 on iter: 222200/250000
training loss: 3.1925586941112 on iter: 222300/250000
training loss: 3.6582393763772 on iter: 222400/250000
training loss: 3.7972615623702 on iter: 222500/250000
training loss: 3.2381564975489 on iter: 222600/250000
training loss: 3.8989031738233 on iter: 222700/250000
training loss: 3.5191563989311 on iter: 222800/250000
training loss: 2.5815223399919 on iter: 222900/250000
training loss: 3.7172135003291 on iter: 223000/250000
training loss: 2.7821113596417 on iter: 223100/250000
training loss: 2.9979454704795 on iter: 223200/250000
training loss: 3.765663913583 on iter: 223300/250000
training loss: 2.5135384219555 on iter: 223400/250000
training loss: 2.8082504966493 on iter: 223500/250000
training loss: 3.4098591001589 on iter: 223600/250000
training loss: 3.2159528340958 on iter: 223700/250000
training loss: 3.4136044515675 on iter: 223800/250000
training loss: 2.9928283587134 on iter: 223900/250000
training loss: 3.6579947074854 on iter: 224000/250000
training loss: 2.6873840253849 on iter: 224100/250000
training loss: 2.880751792511 on iter: 224200/250000
training loss: 3.1706758928553 on iter: 224300/250000
training loss: 3.0682432823849 on iter: 224400/250000
training loss: 2.4947320641459 on iter: 224500/250000
training loss: 3.4208626747147 on iter: 224600/250000
training loss: 3.1247349091988 on iter: 224700/250000
training loss: 3.6815280757223 on iter: 224800/250000
training loss: 3.3348127158578 on iter: 224900/250000
training loss: 3.1505004567567 on iter: 225000/250000
training loss: 2.9225184516139 on iter: 225100/250000
training loss: 3.2068526098038 on iter: 225200/250000
training loss: 2.9418847600635 on iter: 225300/250000
training loss: 3.3607732053323 on iter: 225400/250000
training loss: 3.372569974706 on iter: 225500/250000
training loss: 3.6629385493336 on iter: 225600/250000
training loss: 3.3420996130371 on iter: 225700/250000
training loss: 4.2094248151562 on iter: 225800/250000
training loss: 3.2119100965526 on iter: 225900/250000
training loss: 4.1322824917841 on iter: 226000/250000
training loss: 3.5741321489846 on iter: 226100/250000
training loss: 3.2189066597623 on iter: 226200/250000
training loss: 3.4872401137293 on iter: 226300/250000
training loss: 4.012950868165 on iter: 226400/250000
training loss: 3.5865417636435 on iter: 226500/250000
training loss: 3.8532376557863 on iter: 226600/250000
training loss: 3.4314967772185 on iter: 226700/250000
training loss: 3.5951985000759 on iter: 226800/250000
training loss: 3.2831555659313 on iter: 226900/250000
training loss: 3.6745685092604 on iter: 227000/250000
training loss: 2.7358940222734 on iter: 227100/250000
training loss: 2.4664015872146 on iter: 227200/250000
training loss: 3.8491360099566 on iter: 227300/250000
training loss: 3.2391225559859 on iter: 227400/250000
training loss: 3.3945432521114 on iter: 227500/250000
training loss: 3.7368247678035 on iter: 227600/250000
training loss: 3.3697466497696 on iter: 227700/250000
training loss: 3.0387944546217 on iter: 227800/250000
training loss: 3.3810351326111 on iter: 227900/250000
training loss: 2.8376828012934 on iter: 228000/250000
training loss: 3.6390891052394 on iter: 228100/250000
training loss: 2.9516130710053 on iter: 228200/250000
training loss: 3.6456545871958 on iter: 228300/250000
training loss: 3.9499589654162 on iter: 228400/250000
training loss: 3.5189815808959 on iter: 228500/250000
training loss: 4.4339432034949 on iter: 228600/250000
training loss: 2.297319741252 on iter: 228700/250000
training loss: 2.7826563128914 on iter: 228800/250000
training loss: 3.4437348084735 on iter: 228900/250000
training loss: 3.6128828556407 on iter: 229000/250000
training loss: 2.9108622581562 on iter: 229100/250000
training loss: 2.8785969192383 on iter: 229200/250000
training loss: 3.7133206820574 on iter: 229300/250000
training loss: 3.4580622546339 on iter: 229400/250000
training loss: 3.9072923281895 on iter: 229500/250000
training loss: 3.5659227195912 on iter: 229600/250000
training loss: 4.0292480142218 on iter: 229700/250000
training loss: 4.1581309199301 on iter: 229800/250000
training loss: 3.3826501675195 on iter: 229900/250000
training loss: 3.0806833538143 on iter: 230000/250000
training loss: 2.8901011168001 on iter: 230100/250000
training loss: 3.3667216195878 on iter: 230200/250000
training loss: 3.1666434136145 on iter: 230300/250000
training loss: 3.978565824571 on iter: 230400/250000
training loss: 3.1341873542199 on iter: 230500/250000
training loss: 3.176825291996 on iter: 230600/250000
training loss: 2.9680040427514 on iter: 230700/250000
training loss: 2.6503080650749 on iter: 230800/250000
training loss: 4.3726788810639 on iter: 230900/250000
training loss: 3.4915072827225 on iter: 231000/250000
training loss: 3.2226580752709 on iter: 231100/250000
training loss: 2.6147277752367 on iter: 231200/250000
training loss: 2.9207842726833 on iter: 231300/250000
training loss: 3.2695706764066 on iter: 231400/250000
training loss: 3.3827275637753 on iter: 231500/250000
training loss: 3.5233924240502 on iter: 231600/250000
training loss: 2.8501660024453 on iter: 231700/250000
training loss: 3.4673975070218 on iter: 231800/250000
training loss: 3.3414797968176 on iter: 231900/250000
training loss: 2.8149688008611 on iter: 232000/250000
training loss: 3.002707112906 on iter: 232100/250000
training loss: 2.5289504120017 on iter: 232200/250000
training loss: 3.619335642313 on iter: 232300/250000
training loss: 2.8671842770037 on iter: 232400/250000
training loss: 3.4106930815215 on iter: 232500/250000
training loss: 2.7828853545754 on iter: 232600/250000
training loss: 3.3791878899697 on iter: 232700/250000
training loss: 3.4135990554901 on iter: 232800/250000
training loss: 3.7615660899152 on iter: 232900/250000
training loss: 3.5780240326261 on iter: 233000/250000
training loss: 3.4550931951008 on iter: 233100/250000
training loss: 2.9831703189695 on iter: 233200/250000
training loss: 2.79262090438 on iter: 233300/250000
training loss: 3.1644804539251 on iter: 233400/250000
training loss: 2.9000526017791 on iter: 233500/250000
training loss: 4.7649635484205 on iter: 233600/250000
training loss: 2.8428451949131 on iter: 233700/250000
training loss: 3.0148724566879 on iter: 233800/250000
training loss: 2.8972015395283 on iter: 233900/250000
training loss: 2.928774301016 on iter: 234000/250000
training loss: 3.0296405682198 on iter: 234100/250000
training loss: 3.6426623827765 on iter: 234200/250000
training loss: 2.8711173515687 on iter: 234300/250000
training loss: 2.6064220378639 on iter: 234400/250000
training loss: 4.7338543316311 on iter: 234500/250000
training loss: 3.1066039066028 on iter: 234600/250000
training loss: 3.92324208577 on iter: 234700/250000
training loss: 2.6346543970682 on iter: 234800/250000
training loss: 2.7444490404421 on iter: 234900/250000
training loss: 3.3968233480404 on iter: 235000/250000
training loss: 3.8141486141069 on iter: 235100/250000
training loss: 3.2119260878367 on iter: 235200/250000
training loss: 2.9417580083167 on iter: 235300/250000
training loss: 3.0133136230627 on iter: 235400/250000
training loss: 2.8252717735656 on iter: 235500/250000
training loss: 3.5679855400966 on iter: 235600/250000
training loss: 3.3930188022491 on iter: 235700/250000
training loss: 3.4367335640756 on iter: 235800/250000
training loss: 2.6398262744272 on iter: 235900/250000
training loss: 4.0151133505136 on iter: 236000/250000
training loss: 3.7133061356954 on iter: 236100/250000
training loss: 3.005443831697 on iter: 236200/250000
training loss: 4.0458972046587 on iter: 236300/250000
training loss: 3.4784401605404 on iter: 236400/250000
training loss: 3.1938674183633 on iter: 236500/250000
training loss: 3.5501222511153 on iter: 236600/250000
training loss: 3.2035639186824 on iter: 236700/250000
training loss: 3.3158741225031 on iter: 236800/250000
training loss: 3.3470069989547 on iter: 236900/250000
training loss: 3.4281109723304 on iter: 237000/250000
training loss: 3.0539004436204 on iter: 237100/250000
training loss: 3.3954331016312 on iter: 237200/250000
training loss: 2.5774881487465 on iter: 237300/250000
training loss: 4.2606322342452 on iter: 237400/250000
training loss: 2.5696639974217 on iter: 237500/250000
training loss: 3.3480157371124 on iter: 237600/250000
training loss: 3.1549735440941 on iter: 237700/250000
training loss: 2.847922325064 on iter: 237800/250000
training loss: 3.9185762101662 on iter: 237900/250000
training loss: 3.6677903842627 on iter: 238000/250000
training loss: 3.3569961030461 on iter: 238100/250000
training loss: 3.504187315386 on iter: 238200/250000
training loss: 4.1127445799724 on iter: 238300/250000
training loss: 4.1683341346539 on iter: 238400/250000
training loss: 3.4977589346153 on iter: 238500/250000
training loss: 2.066816525935 on iter: 238600/250000
training loss: 3.0200398406999 on iter: 238700/250000
training loss: 3.31946004064 on iter: 238800/250000
training loss: 2.7271496894511 on iter: 238900/250000
training loss: 3.644592804465 on iter: 239000/250000
training loss: 3.7864456073965 on iter: 239100/250000
training loss: 3.2694869096411 on iter: 239200/250000
training loss: 3.1547023358138 on iter: 239300/250000
training loss: 3.4092435239767 on iter: 239400/250000
training loss: 3.5643791225801 on iter: 239500/250000
training loss: 3.2816797286429 on iter: 239600/250000
training loss: 3.6524804594816 on iter: 239700/250000
training loss: 3.6359692395707 on iter: 239800/250000
training loss: 4.3029315089855 on iter: 239900/250000
training loss: 3.8505905040258 on iter: 240000/250000
training loss: 3.913101922253 on iter: 240100/250000
training loss: 3.4441911540155 on iter: 240200/250000
training loss: 3.9639558071553 on iter: 240300/250000
training loss: 3.5427701351568 on iter: 240400/250000
training loss: 3.2397479088726 on iter: 240500/250000
training loss: 3.0127816062856 on iter: 240600/250000
training loss: 2.5289957929476 on iter: 240700/250000
training loss: 3.4292539206763 on iter: 240800/250000
training loss: 4.4374044042335 on iter: 240900/250000
training loss: 2.7807989508533 on iter: 241000/250000
training loss: 3.4889871151552 on iter: 241100/250000
training loss: 3.2264789508683 on iter: 241200/250000
training loss: 3.0630989170483 on iter: 241300/250000
training loss: 3.6901301482085 on iter: 241400/250000
training loss: 3.0741457568179 on iter: 241500/250000
training loss: 3.0200766861603 on iter: 241600/250000
training loss: 3.8183715785058 on iter: 241700/250000
training loss: 3.3579664285078 on iter: 241800/250000
training loss: 3.75875088453 on iter: 241900/250000
training loss: 4.2320832592263 on iter: 242000/250000
training loss: 3.5946558698624 on iter: 242100/250000
training loss: 2.1520302476826 on iter: 242200/250000
training loss: 3.2340493319224 on iter: 242300/250000
training loss: 3.2682406584414 on iter: 242400/250000
training loss: 2.9202101669918 on iter: 242500/250000
training loss: 2.9014664838839 on iter: 242600/250000
training loss: 2.7860538567122 on iter: 242700/250000
training loss: 2.9239221764488 on iter: 242800/250000
training loss: 2.9841851012133 on iter: 242900/250000
training loss: 3.1794594715864 on iter: 243000/250000
training loss: 3.3090687841943 on iter: 243100/250000
training loss: 3.4707521715026 on iter: 243200/250000
training loss: 3.4965538866112 on iter: 243300/250000
training loss: 3.4880536256653 on iter: 243400/250000
training loss: 2.7626875595416 on iter: 243500/250000
training loss: 3.2016050781809 on iter: 243600/250000
training loss: 3.847247433851 on iter: 243700/250000
training loss: 4.8453397560799 on iter: 243800/250000
training loss: 2.9864613019735 on iter: 243900/250000
training loss: 3.4826318064936 on iter: 244000/250000
training loss: 2.8077353152455 on iter: 244100/250000
training loss: 2.8236055844252 on iter: 244200/250000
training loss: 3.410147984499 on iter: 244300/250000
training loss: 2.3915119336661 on iter: 244400/250000
training loss: 3.4814422508165 on iter: 244500/250000
training loss: 3.0197128440989 on iter: 244600/250000
training loss: 3.5821954886704 on iter: 244700/250000
training loss: 3.1203239137506 on iter: 244800/250000
training loss: 3.3705882787881 on iter: 244900/250000
training loss: 3.0012074157539 on iter: 245000/250000
training loss: 3.1514461517094 on iter: 245100/250000
training loss: 3.3106291655752 on iter: 245200/250000
training loss: 2.6659367467982 on iter: 245300/250000
training loss: 3.1771432166825 on iter: 245400/250000
training loss: 2.6471931753356 on iter: 245500/250000
training loss: 3.6896210392306 on iter: 245600/250000
training loss: 3.5159365275502 on iter: 245700/250000
training loss: 3.3200454935297 on iter: 245800/250000
training loss: 3.3206008027831 on iter: 245900/250000
training loss: 2.9130790677 on iter: 246000/250000
training loss: 4.5603665940302 on iter: 246100/250000
training loss: 3.4700720774599 on iter: 246200/250000
training loss: 4.8854830732629 on iter: 246300/250000
training loss: 3.2980180678669 on iter: 246400/250000
training loss: 3.5913865760301 on iter: 246500/250000
training loss: 3.0790624585774 on iter: 246600/250000
training loss: 3.476289795025 on iter: 246700/250000
training loss: 2.7150108514144 on iter: 246800/250000
training loss: 3.9087440249076 on iter: 246900/250000
training loss: 3.1690978684945 on iter: 247000/250000
training loss: 2.8988086403879 on iter: 247100/250000
training loss: 3.2467076672374 on iter: 247200/250000
training loss: 4.2971069881409 on iter: 247300/250000
training loss: 3.4615159558087 on iter: 247400/250000
training loss: 3.4777209936989 on iter: 247500/250000
training loss: 2.9076608074442 on iter: 247600/250000
training loss: 3.0046127153319 on iter: 247700/250000
training loss: 3.6478911857521 on iter: 247800/250000
training loss: 3.7600461822598 on iter: 247900/250000
training loss: 3.8081851454663 on iter: 248000/250000
training loss: 3.0288262209269 on iter: 248100/250000
training loss: 3.8045254712336 on iter: 248200/250000
training loss: 3.0014096101222 on iter: 248300/250000
training loss: 2.6648702391464 on iter: 248400/250000
training loss: 3.0973710238868 on iter: 248500/250000
training loss: 3.21178833147 on iter: 248600/250000
training loss: 3.3652397308744 on iter: 248700/250000
training loss: 3.6136089057546 on iter: 248800/250000
training loss: 3.0909798858796 on iter: 248900/250000
training loss: 2.5578616976569 on iter: 249000/250000
training loss: 3.5662768480493 on iter: 249100/250000
training loss: 3.3371438164301 on iter: 249200/250000
training loss: 3.0698375165063 on iter: 249300/250000
training loss: 3.2172591927209 on iter: 249400/250000
training loss: 3.5046515325146 on iter: 249500/250000
training loss: 3.8954496279085 on iter: 249600/250000
training loss: 3.1951476890746 on iter: 249700/250000
training loss: 3.1147774654283 on iter: 249800/250000
training loss: 3.7507720261221 on iter: 249900/250000
training loss: 3.1113258678018 on iter: 250000/250000

a7b23 commented 6 years ago

I believe I was not using the rnn package you mentioned, since luarocks install rnn probably installs a different package.
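
For reference, a quick way to verify which rnn package is active (a minimal sketch, assuming a working Torch install; the Element-Research package registers recurrent modules such as nn.GRU on the nn namespace):

-- Sanity check (sketch): the Element-Research rnn package exposes
-- recurrent modules like nn.GRU; a different or missing rnn package
-- will fail this assertion.
require 'nn'
require 'rnn'
assert(nn.GRU ~= nil, 'nn.GRU not found -- wrong rnn package?')
print('rnn package looks OK')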

jnhwkim commented 6 years ago

The training loss should be below 1.0, around 0.8, at the end of training. If the problem persists, please reopen this issue.
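
For comparison against that target, a small helper to average the last reported losses in a training log (a sketch, not part of the repository; the log file name train.log is an assumption):

-- Parse "training loss: X on iter: Y/Z" lines and report the mean of
-- the last (up to) 100 losses; at the end of a healthy run this
-- should be near 0.8.
local losses = {}
for line in io.lines('train.log') do
  local loss = line:match('training loss: ([%d%.]+)')
  if loss then table.insert(losses, tonumber(loss)) end
end
assert(#losses > 0, 'no training-loss lines found in log')
local n, sum = math.min(100, #losses), 0
for i = #losses - n + 1, #losses do sum = sum + losses[i] end
print(('mean of last %d losses: %.4f'):format(n, sum / n))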