a7b23 closed this issue 6 years ago
Can you share your training log? Also, please check that you're using this rnn package: https://github.com/Element-Research/rnn.
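A quick sanity check from the Torch REPL might help here (a minimal sketch; it only confirms that `rnn` loads and provides the `nn.GRU` that `rnn_model : "GRU"` relies on):

```lua
-- Minimal check that the Element-Research rnn package is the one being loaded.
-- nn.GRU is defined by that package, so nil here means the wrong rnn is installed.
require 'nn'
local ok = pcall(require, 'rnn')
print('rnn loaded:', ok)
print('nn.GRU available:', nn.GRU ~= nil)
```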
{
batch_size : 2
common_embedding_size : 1200
input_encoding_size : 620
learning_rate_decay_start : 0
input_seconds : "data_train-val_test-dev_2k/seconds.json"
load_checkpoint_path : ""
rnn_model : "GRU"
vg_img_h5 : ""
dropout : 0.5
previous_iters : 0
input_skip : "skipthoughts_model"
label : ""
gpuid : 0
optimizer : "rmsprop"
input_img_h5 : "data_train-val_test-dev_2k/data_res.h5"
rnn_size : 2400
vg : false
model_name : "MLB"
input_ques_h5 : "data_train-val_test-dev_2k/data_prepro.h5"
kick_interval : 50000
seconds : true
glimpse : 2
vg_ques_h5 : ""
input_json : "data_train-val_test-dev_2k/data_prepro.json"
num_layers : 1
num_output : 2000
iterPerEpoch : 120000
mhdf5_size : 10000
max_iters : 250000
checkpoint_path : "model/"
save_checkpoint_every : 25000
learning_rate : 0.0003
clipping : 10
backend : "cudnn"
seed : 1231
learning_rate_decay_every : 100
}
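For readers unfamiliar with dumps like the one above: this is the printed options table of a torch `CmdLine` parser. A sketch of the usual pattern that produces it, using a few of the flags shown (defaults and the full flag list belong to the repo's train script, not this sketch):

```lua
-- The usual torch.CmdLine pattern behind an options dump like the one above.
require 'torch'

local cmd = torch.CmdLine()
cmd:option('-batch_size', 2, 'minibatch size')
cmd:option('-rnn_model', 'GRU', 'question encoder cell (from the rnn package)')
cmd:option('-learning_rate', 0.0003, 'rmsprop learning rate')
cmd:option('-max_iters', 250000, 'total training iterations')

local opt = cmd:parse(arg)
print(opt)  -- prints a key/value table dump like the one above
```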
DataLoader loading h5 file: data_train-val_test-dev_2k/data_prepro.h5
DataLoader loading h5 file: data_train-val_test-dev_2k/data_res.h5
Building the model...
MLB: No Shortcut
nn.Sequential {
  [input -> (1) -> (2) -> (3) -> (4) -> (5) -> (6) -> (7) -> (8) -> output]
  (1): nn.ParallelTable {
    input
      |`-> (1): nn.Identity
       `-> (2): nn.Sequential {
             [input -> (1) -> (2) -> output]
             (1): nn.Transpose
             (2): nn.Reshape(196x2048)
           }
       ... -> output
  }
  (2): nn.ConcatTable {
    input
      |`-> (1): nn.SelectTable(1)
      |`-> (2): nn.SelectTable(2)
       `-> (3): nn.Sequential {
             [input -> (1) -> (2) -> (3) -> (4) -> (5) -> (6) -> (7) -> (8) -> output]
             (1): nn.ParallelTable {
               input
                 |`-> (1): nn.Sequential {
                 |      [input -> (1) -> (2) -> (3) -> (4) -> output]
                 |      (1): nn.Dropout(0.5, busy)
                 |      (2): nn.Linear(2400 -> 1200)
                 |      (3): nn.Tanh
                 |      (4): nn.Replicate
                 |    }
                  `-> (2): nn.Sequential {
                        [input -> (1) -> (2) -> (3) -> (4) -> (5) -> output]
                        (1): nn.Reshape(392x2048)
                        (2): nn.Dropout(0.5, busy)
                        (3): nn.Linear(2048 -> 1200)
                        (4): nn.Tanh
                        (5): nn.Reshape(2x196x1200)
                      }
                  ... -> output
             }
             (2): nn.CMulTable
             (3): nn.Reshape(2x14x14x1200)
             (4): nn.Transpose
             (5): nn.SpatialConvolution(1200 -> 2, 1x1)
             (6): nn.Reshape(2x2x196)
             (7): nn.SplitTable
             (8): nn.ParallelTable {
               input
                 |`-> (1): nn.SoftMax
                  `-> (2): nn.SoftMax
                  ... -> output
             }
           }
       ... -> output
  }
  (3): nn.FlattenTable
  (4): nn.ConcatTable {
    input
      |`-> (1): nn.Sequential {
      |      [input -> (1) -> (2) -> (3) -> (4) -> output]
      |      (1): nn.SelectTable(1)
      |      (2): nn.Dropout(0.5, busy)
      |      (3): nn.Linear(2400 -> 2400)
      |      (4): nn.Tanh
      |    }
      |`-> (2): nn.Sequential {
      |      [input -> (1) -> (2) -> output]
      |      (1): nn.ConcatTable {
      |        input
      |          |`-> (1): nn.Sequential {
      |          |      [input -> (1) -> (2) -> (3) -> (4) -> (5) -> (6) -> output]
      |          |      (1): nn.ConcatTable {
      |          |        input
      |          |          |`-> (1): nn.SelectTable(3)
      |          |           `-> (2): nn.SelectTable(2)
      |          |           ... -> output
      |          |      }
      |          |      (2): nn.ParallelTable {
      |          |        input
      |          |          |`-> (1): nn.Identity
      |          |           `-> (2): nn.SplitTable
      |          |           ... -> output
      |          |      }
      |          |      (3): nn.MixtureTable
      |          |      (4): nn.Dropout(0.5, busy)
      |          |      (5): nn.Linear(2048 -> 1200)
      |          |      (6): nn.Tanh
      |          |    }
      |           `-> (2): nn.Sequential {
      |                 [input -> (1) -> (2) -> (3) -> (4) -> (5) -> (6) -> output]
      |                 (1): nn.ConcatTable {
      |                   input
      |                     |`-> (1): nn.SelectTable(4)
      |                      `-> (2): nn.SelectTable(2)
      |                      ... -> output
      |                 }
      |                 (2): nn.ParallelTable {
      |                   input
      |                     |`-> (1): nn.Identity
      |                      `-> (2): nn.SplitTable
      |                      ... -> output
      |                 }
      |                 (3): nn.MixtureTable
      |                 (4): nn.Dropout(0.5, busy)
      |                 (5): nn.Linear(2048 -> 1200)
      |                 (6): nn.Tanh
      |               }
      |           ... -> output
      |      }
      |      (2): nn.JoinTable
      |    }
       `-> (3): nn.SelectTable(2)
       ... -> output
  }
  (5): nn.ConcatTable {
    input
      |`-> (1): nn.Sequential {
      |      [input -> (1) -> (2) -> output]
      |      (1): nn.NarrowTable
      |      (2): nn.CMulTable
      |    }
       `-> (2): nn.SelectTable(3)
       ... -> output
  }
  (6): nn.SelectTable(1)
  (7): nn.Dropout(0.5, busy)
  (8): nn.Linear(2400 -> 2000)
}
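For anyone parsing the dump: the recurring Dropout -> Linear -> Tanh pairs merged by nn.CMulTable are the low-rank bilinear pooling at the heart of MLB. A minimal sketch of that core (not the repo's exact builder; dimensions taken from the config above: rnn_size 2400, 2048-d image features on a 14x14 grid, common_embedding_size 1200):

```lua
-- Low-rank bilinear pooling: project both modalities into a common space
-- with Tanh, then fuse by element-wise (Hadamard) product.
require 'nn'

local common = 1200               -- common_embedding_size
local q_dim, v_dim = 2400, 2048   -- question (skip-thought GRU) and image feature sizes

local q_proj = nn.Sequential()
  :add(nn.Dropout(0.5))
  :add(nn.Linear(q_dim, common))
  :add(nn.Tanh())
local v_proj = nn.Sequential()
  :add(nn.Dropout(0.5))
  :add(nn.Linear(v_dim, common))
  :add(nn.Tanh())

local mlb_core = nn.Sequential()
  :add(nn.ParallelTable():add(q_proj):add(v_proj))
  :add(nn.CMulTable())   -- the nn.CMulTable visible in the dump

-- {question(2400), image(2048)} -> fused(1200)
local fused = mlb_core:forward({torch.randn(q_dim), torch.randn(v_dim)})
```

The full model repeats this fusion per attention glimpse (glimpse = 2 in the config), which is where the 2x196 attention maps and the two nn.SoftMax branches in the dump come from.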
shipped data function to cuda...
nParams= 51894822
decay_factor = 0.99997592083
learning rate: 0.0003
training loss: 4.8693726433563 on iter: 100/250000
training loss: 4.0755410186514 on iter: 200/250000
training loss: 4.7666955601846 on iter: 300/250000
training loss: 4.1725922354561 on iter: 400/250000
training loss: 3.7588104538761 on iter: 500/250000
training loss: 4.1201743807505 on iter: 600/250000
training loss: 3.95045189864 on iter: 700/250000
training loss: 3.8068482422712 on iter: 800/250000
training loss: 3.741670871492 on iter: 900/250000
training loss: 3.8267296812402 on iter: 1000/250000
training loss: 4.0479970796246 on iter: 1100/250000
training loss: 4.1455298619561 on iter: 1200/250000
training loss: 4.0713808883637 on iter: 1300/250000
training loss: 3.9584978528876 on iter: 1400/250000
training loss: 3.8829934939005 on iter: 1500/250000
training loss: 3.3697245319676 on iter: 1600/250000
training loss: 4.382460173037 on iter: 1700/250000
training loss: 3.5356175303366 on iter: 1800/250000
training loss: 4.1210829661098 on iter: 1900/250000
training loss: 3.2177009764106 on iter: 2000/250000
training loss: 3.9065231957854 on iter: 2100/250000
training loss: 4.1072556528939 on iter: 2200/250000
training loss: 4.0144710805711 on iter: 2300/250000
training loss: 3.7349430310903 on iter: 2400/250000
training loss: 3.1987197406927 on iter: 2500/250000
training loss: 3.8432964374996 on iter: 2600/250000
training loss: 3.7092187217176 on iter: 2700/250000
training loss: 3.5008688446065 on iter: 2800/250000
training loss: 4.6774473157799 on iter: 2900/250000
training loss: 4.1848083823068 on iter: 3000/250000
training loss: 3.5860899373431 on iter: 3100/250000
training loss: 2.9383213166334 on iter: 3200/250000
training loss: 3.0839858097203 on iter: 3300/250000
training loss: 3.7513172062108 on iter: 3400/250000
training loss: 3.1869710831699 on iter: 3500/250000
training loss: 3.624473524439 on iter: 3600/250000
training loss: 3.4050613941417 on iter: 3700/250000
training loss: 3.736372573513 on iter: 3800/250000
training loss: 3.2235076301945 on iter: 3900/250000
training loss: 3.5757938307821 on iter: 4000/250000
training loss: 3.3135292969846 on iter: 4100/250000
training loss: 2.6682984935174 on iter: 4200/250000
training loss: 3.7384815510971 on iter: 4300/250000
training loss: 3.4148914840276 on iter: 4400/250000
training loss: 3.6498642060847 on iter: 4500/250000
training loss: 3.6433645158231 on iter: 4600/250000
training loss: 3.7100133600744 on iter: 4700/250000
training loss: 2.8654981825852 on iter: 4800/250000
training loss: 3.217280009943 on iter: 4900/250000
training loss: 3.9207967604785 on iter: 5000/250000
training loss: 2.9555742654138 on iter: 5100/250000
training loss: 3.8598746884621 on iter: 5200/250000
training loss: 3.5847453037522 on iter: 5300/250000
training loss: 3.4657520898438 on iter: 5400/250000
training loss: 3.3916345924177 on iter: 5500/250000
training loss: 3.8579615059769 on iter: 5600/250000
training loss: 3.9038641790336 on iter: 5700/250000
training loss: 4.3864395398832 on iter: 5800/250000
training loss: 3.4928268161193 on iter: 5900/250000
training loss: 3.0450249645281 on iter: 6000/250000
training loss: 3.8842221132494 on iter: 6100/250000
training loss: 3.2880487401061 on iter: 6200/250000
training loss: 2.9185751451314 on iter: 6300/250000
training loss: 3.0740059996772 on iter: 6400/250000
training loss: 3.0148220994581 on iter: 6500/250000
training loss: 3.58458690508 on iter: 6600/250000
training loss: 3.2511303354174 on iter: 6700/250000
training loss: 2.971753350322 on iter: 6800/250000
training loss: 2.907227503617 on iter: 6900/250000
training loss: 2.7177727135474 on iter: 7000/250000
training loss: 3.4304250191041 on iter: 7100/250000
training loss: 3.9063394142783 on iter: 7200/250000
training loss: 3.5187849466658 on iter: 7300/250000
training loss: 3.2528221448543 on iter: 7400/250000
training loss: 2.7372836095832 on iter: 7500/250000
training loss: 3.6240937975306 on iter: 7600/250000
training loss: 3.1945240727705 on iter: 7700/250000
training loss: 2.9268224073938 on iter: 7800/250000
training loss: 3.5500002281539 on iter: 7900/250000
training loss: 2.7849560622056 on iter: 8000/250000
training loss: 2.8870666174859 on iter: 8100/250000
training loss: 3.5429703615977 on iter: 8200/250000
training loss: 3.3446548740357 on iter: 8300/250000
training loss: 3.7808576446444 on iter: 8400/250000
training loss: 2.8902860535067 on iter: 8500/250000
training loss: 3.9734762724497 on iter: 8600/250000
training loss: 3.4776270912936 on iter: 8700/250000
training loss: 2.9827175335349 on iter: 8800/250000
training loss: 3.699993023318 on iter: 8900/250000
training loss: 3.4213773127606 on iter: 9000/250000
training loss: 3.1046835439853 on iter: 9100/250000
training loss: 3.540458070569 on iter: 9200/250000
training loss: 3.2517778932949 on iter: 9300/250000
training loss: 3.2988247027024 on iter: 9400/250000
training loss: 2.710372349899 on iter: 9500/250000
training loss: 3.4521389221567 on iter: 9600/250000
training loss: 3.9458913853249 on iter: 9700/250000
training loss: 3.1495192185632 on iter: 9800/250000
training loss: 3.7107258838663 on iter: 9900/250000
training loss: 3.2706948892924 on iter: 10000/250000
training loss: 2.9775978381441 on iter: 10100/250000
training loss: 3.3944526472107 on iter: 10200/250000
training loss: 3.9131680883825 on iter: 10300/250000
training loss: 3.2496365447769 on iter: 10400/250000
training loss: 2.9414709037481 on iter: 10500/250000
training loss: 3.0687932799818 on iter: 10600/250000
training loss: 3.528188690943 on iter: 10700/250000
training loss: 3.2771711537859 on iter: 10800/250000
training loss: 3.1210666952226 on iter: 10900/250000
training loss: 3.35382299998 on iter: 11000/250000
training loss: 3.1272502926084 on iter: 11100/250000
training loss: 3.0141314469646 on iter: 11200/250000
training loss: 2.1994255576717 on iter: 11300/250000
training loss: 3.4505864127043 on iter: 11400/250000
training loss: 3.0549549939623 on iter: 11500/250000
training loss: 3.8655790047206 on iter: 11600/250000
training loss: 3.6181578718816 on iter: 11700/250000
training loss: 3.1614834674546 on iter: 11800/250000
training loss: 3.0445825487965 on iter: 11900/250000
training loss: 3.1182394490669 on iter: 12000/250000
training loss: 2.8396849742938 on iter: 12100/250000
training loss: 3.0825098473937 on iter: 12200/250000
training loss: 2.8717748775381 on iter: 12300/250000
training loss: 3.2389117511133 on iter: 12400/250000
training loss: 3.1350613412558 on iter: 12500/250000
training loss: 3.1720919024382 on iter: 12600/250000
training loss: 2.843639413933 on iter: 12700/250000
training loss: 3.2418760803871 on iter: 12800/250000
training loss: 3.0901308583901 on iter: 12900/250000
training loss: 3.2590009009524 on iter: 13000/250000
training loss: 2.7563787665588 on iter: 13100/250000
training loss: 3.0304985252942 on iter: 13200/250000
training loss: 2.7065686378805 on iter: 13300/250000
training loss: 3.3563562597361 on iter: 13400/250000
training loss: 2.9093297888355 on iter: 13500/250000
training loss: 2.9764109331449 on iter: 13600/250000
training loss: 2.9167921855305 on iter: 13700/250000
training loss: 3.6432441644402 on iter: 13800/250000
training loss: 2.5046723125632 on iter: 13900/250000
training loss: 3.0008588441858 on iter: 14000/250000
training loss: 3.011089685378 on iter: 14100/250000
training loss: 3.7472023018877 on iter: 14200/250000
training loss: 2.5692966813257 on iter: 14300/250000
training loss: 3.1719729910237 on iter: 14400/250000
training loss: 3.0317613558285 on iter: 14500/250000
training loss: 2.7482626584263 on iter: 14600/250000
training loss: 2.9293070169308 on iter: 14700/250000
training loss: 3.3509049303604 on iter: 14800/250000
training loss: 3.4324509253248 on iter: 14900/250000
training loss: 2.7452546194139 on iter: 15000/250000
training loss: 3.2008311045955 on iter: 15100/250000
training loss: 3.219244336545 on iter: 15200/250000
training loss: 3.2872278345276 on iter: 15300/250000
training loss: 3.2195236941911 on iter: 15400/250000
training loss: 3.2281098853581 on iter: 15500/250000
training loss: 3.4251679423988 on iter: 15600/250000
training loss: 2.830743881681 on iter: 15700/250000
training loss: 3.4418592989672 on iter: 15800/250000
training loss: 2.9570411896901 on iter: 15900/250000
training loss: 2.9305634234001 on iter: 16000/250000
training loss: 3.2423435593587 on iter: 16100/250000
training loss: 2.6633830285309 on iter: 16200/250000
training loss: 3.3764463189316 on iter: 16300/250000
training loss: 3.1167515878543 on iter: 16400/250000
training loss: 2.9797131342018 on iter: 16500/250000
training loss: 3.3640001957979 on iter: 16600/250000
training loss: 2.9842229140313 on iter: 16700/250000
training loss: 2.9349141462539 on iter: 16800/250000
training loss: 2.9421552038215 on iter: 16900/250000
training loss: 3.1588174941052 on iter: 17000/250000
training loss: 3.1636286831675 on iter: 17100/250000
training loss: 3.3752112876663 on iter: 17200/250000
training loss: 2.7049184432757 on iter: 17300/250000
training loss: 3.3542719029892 on iter: 17400/250000
training loss: 3.2876799176369 on iter: 17500/250000
training loss: 3.1450917912788 on iter: 17600/250000
training loss: 3.2181256066894 on iter: 17700/250000
training loss: 3.0196555049743 on iter: 17800/250000
training loss: 3.5895030486964 on iter: 17900/250000
training loss: 3.1119935808024 on iter: 18000/250000
training loss: 3.6554815055515 on iter: 18100/250000
training loss: 3.0559010261442 on iter: 18200/250000
training loss: 3.1836163580029 on iter: 18300/250000
training loss: 2.8963457020751 on iter: 18400/250000
training loss: 3.1694013043862 on iter: 18500/250000
training loss: 3.5502405222109 on iter: 18600/250000
training loss: 2.6263777880011 on iter: 18700/250000
training loss: 2.6189622432943 on iter: 18800/250000
training loss: 3.0212086331832 on iter: 18900/250000
training loss: 3.1764373664321 on iter: 19000/250000
training loss: 3.3414628431327 on iter: 19100/250000
training loss: 3.2404355441707 on iter: 19200/250000
training loss: 3.2293816091282 on iter: 19300/250000
training loss: 3.2193897891727 on iter: 19400/250000
training loss: 3.0912890248628 on iter: 19500/250000
training loss: 3.4781617757879 on iter: 19600/250000
training loss: 3.3243259346857 on iter: 19700/250000
training loss: 2.5715496291489 on iter: 19800/250000
training loss: 3.4017144645094 on iter: 19900/250000
training loss: 2.8807517725605 on iter: 20000/250000
training loss: 3.3015500497401 on iter: 20100/250000
training loss: 2.6950072619071 on iter: 20200/250000
training loss: 2.8330123886692 on iter: 20300/250000
training loss: 3.4885240178146 on iter: 20400/250000
training loss: 3.2769468335815 on iter: 20500/250000
training loss: 3.1374565907301 on iter: 20600/250000
training loss: 3.7980738535804 on iter: 20700/250000
training loss: 3.3538743955902 on iter: 20800/250000
training loss: 3.2575119784081 on iter: 20900/250000
training loss: 2.8834165686189 on iter: 21000/250000
training loss: 3.4384198095179 on iter: 21100/250000
training loss: 3.2579521933942 on iter: 21200/250000
training loss: 3.2417392824251 on iter: 21300/250000
training loss: 3.3312448621918 on iter: 21400/250000
training loss: 2.990379253637 on iter: 21500/250000
training loss: 3.4543052471929 on iter: 21600/250000
training loss: 2.6592379707315 on iter: 21700/250000
training loss: 3.1453652803984 on iter: 21800/250000
training loss: 2.7971247188701 on iter: 21900/250000
training loss: 2.8413442890739 on iter: 22000/250000
training loss: 2.5744051844803 on iter: 22100/250000
training loss: 2.5189899570808 on iter: 22200/250000
training loss: 2.6232365677646 on iter: 22300/250000
training loss: 3.6171616964174 on iter: 22400/250000
training loss: 2.6497101692725 on iter: 22500/250000
training loss: 3.2365120220182 on iter: 22600/250000
training loss: 3.6583070155128 on iter: 22700/250000
training loss: 3.1400776131364 on iter: 22800/250000
training loss: 3.4458716561465 on iter: 22900/250000
training loss: 2.9104094067861 on iter: 23000/250000
training loss: 2.9104067385251 on iter: 23100/250000
training loss: 3.2661380485108 on iter: 23200/250000
training loss: 2.6968539340268 on iter: 23300/250000
training loss: 3.2607974878456 on iter: 23400/250000
training loss: 3.2548830323751 on iter: 23500/250000
training loss: 2.6086769734632 on iter: 23600/250000
training loss: 2.6429898930076 on iter: 23700/250000
training loss: 3.533971672395 on iter: 23800/250000
training loss: 2.5116789161084 on iter: 23900/250000
training loss: 3.0228113237291 on iter: 24000/250000
training loss: 2.8585392572753 on iter: 24100/250000
training loss: 3.0625396872499 on iter: 24200/250000
training loss: 2.6879629439801 on iter: 24300/250000
training loss: 2.5624721268646 on iter: 24400/250000
training loss: 3.0780126949095 on iter: 24500/250000
training loss: 2.8161702155686 on iter: 24600/250000
training loss: 2.7655067865491 on iter: 24700/250000
training loss: 3.1447250298949 on iter: 24800/250000
training loss: 2.6643519266675 on iter: 24900/250000
training loss: 3.1405527531231 on iter: 25000/250000
training loss: 2.8240664630491 on iter: 25100/250000
training loss: 3.2591791792974 on iter: 25200/250000
training loss: 3.0373734209679 on iter: 25300/250000
training loss: 3.2993201249172 on iter: 25400/250000
training loss: 2.9241576340796 on iter: 25500/250000
training loss: 3.4208655103621 on iter: 25600/250000
training loss: 2.6986180118537 on iter: 25700/250000
training loss: 3.1083747618008 on iter: 25800/250000
training loss: 3.0968884944888 on iter: 25900/250000
training loss: 2.5730109123527 on iter: 26000/250000
training loss: 2.9652333260898 on iter: 26100/250000
training loss: 3.106467940841 on iter: 26200/250000
training loss: 2.889385309368 on iter: 26300/250000
training loss: 2.8754751483936 on iter: 26400/250000
training loss: 3.4690856744276 on iter: 26500/250000
training loss: 2.6626066537817 on iter: 26600/250000
training loss: 3.693138713736 on iter: 26700/250000
training loss: 3.3655539838607 on iter: 26800/250000
training loss: 2.4135486124885 on iter: 26900/250000
training loss: 2.7263093759357 on iter: 27000/250000
training loss: 2.7154404199844 on iter: 27100/250000
training loss: 2.551790252989 on iter: 27200/250000
training loss: 2.7817980669283 on iter: 27300/250000
training loss: 3.032612377084 on iter: 27400/250000
training loss: 2.8645624813285 on iter: 27500/250000
training loss: 2.3777953999209 on iter: 27600/250000
training loss: 2.9842303171256 on iter: 27700/250000
training loss: 2.3908186542124 on iter: 27800/250000
training loss: 2.6519880274652 on iter: 27900/250000
training loss: 3.2047579907545 on iter: 28000/250000
training loss: 2.9475471151639 on iter: 28100/250000
training loss: 2.5323053093292 on iter: 28200/250000
training loss: 2.7037444801731 on iter: 28300/250000
training loss: 3.7067378552588 on iter: 28400/250000
training loss: 3.0086043325106 on iter: 28500/250000
training loss: 2.8878485085798 on iter: 28600/250000
training loss: 2.9867025578358 on iter: 28700/250000
training loss: 2.8323197593272 on iter: 28800/250000
training loss: 2.5924800263649 on iter: 28900/250000
training loss: 3.1707279862244 on iter: 29000/250000
training loss: 2.8470747701881 on iter: 29100/250000
training loss: 2.9111539048573 on iter: 29200/250000
training loss: 2.2627088660232 on iter: 29300/250000
training loss: 2.5571346068465 on iter: 29400/250000
training loss: 2.6449880883746 on iter: 29500/250000
training loss: 3.1094094366272 on iter: 29600/250000
training loss: 3.0538276190098 on iter: 29700/250000
training loss: 3.2142187911395 on iter: 29800/250000
training loss: 2.6269572943713 on iter: 29900/250000
training loss: 2.7888394134094 on iter: 30000/250000
training loss: 3.3249958810677 on iter: 30100/250000
training loss: 2.3240902474375 on iter: 30200/250000
training loss: 3.3579108847261 on iter: 30300/250000
training loss: 3.2161314850033 on iter: 30400/250000
training loss: 3.3103784601952 on iter: 30500/250000
training loss: 3.2584265875949 on iter: 30600/250000
training loss: 3.0295679705164 on iter: 30700/250000
training loss: 2.5405019778261 on iter: 30800/250000
training loss: 2.8883628275649 on iter: 30900/250000
training loss: 3.156979906038 on iter: 31000/250000
training loss: 3.3243593159314 on iter: 31100/250000
training loss: 2.9221167958726 on iter: 31200/250000
training loss: 2.4022412857158 on iter: 31300/250000
training loss: 2.4823977543017 on iter: 31400/250000
training loss: 3.1635561593958 on iter: 31500/250000
training loss: 2.7867395250907 on iter: 31600/250000
training loss: 2.8212026335174 on iter: 31700/250000
training loss: 2.5491165642515 on iter: 31800/250000
training loss: 1.9745845448584 on iter: 31900/250000
training loss: 2.5993409826297 on iter: 32000/250000
training loss: 3.002586099949 on iter: 32100/250000
training loss: 2.4849100482274 on iter: 32200/250000
training loss: 2.9782041320016 on iter: 32300/250000
training loss: 2.8248138720943 on iter: 32400/250000
training loss: 3.3294764278592 on iter: 32500/250000
training loss: 3.3369331493405 on iter: 32600/250000
training loss: 2.7598539820267 on iter: 32700/250000
training loss: 3.3208098564582 on iter: 32800/250000
training loss: 3.2498726176156 on iter: 32900/250000
training loss: 2.8864562423817 on iter: 33000/250000
training loss: 2.5959523809067 on iter: 33100/250000
training loss: 2.9920279189924 on iter: 33200/250000
training loss: 3.4556683073173 on iter: 33300/250000
training loss: 3.028779018419 on iter: 33400/250000
training loss: 2.7479328880365 on iter: 33500/250000
training loss: 3.407427632183 on iter: 33600/250000
training loss: 3.0322574660951 on iter: 33700/250000
training loss: 2.8570670392355 on iter: 33800/250000
training loss: 3.2030911648047 on iter: 33900/250000
training loss: 3.0942246064616 on iter: 34000/250000
training loss: 3.1912913184401 on iter: 34100/250000
training loss: 2.8534122352894 on iter: 34200/250000
training loss: 2.7972144633762 on iter: 34300/250000
training loss: 3.6274471583914 on iter: 34400/250000
training loss: 3.0430058132834 on iter: 34500/250000
training loss: 3.2025162238992 on iter: 34600/250000
training loss: 2.7894221746726 on iter: 34700/250000
training loss: 3.1981873895599 on iter: 34800/250000
training loss: 2.6263389024415 on iter: 34900/250000
training loss: 3.346883854562 on iter: 35000/250000
training loss: 2.7505133972234 on iter: 35100/250000
training loss: 2.6575271302295 on iter: 35200/250000
training loss: 2.1933454271242 on iter: 35300/250000
training loss: 2.2448134552074 on iter: 35400/250000
training loss: 2.4995778288307 on iter: 35500/250000
training loss: 3.7240212729168 on iter: 35600/250000
training loss: 3.1065201740652 on iter: 35700/250000
training loss: 2.9089732985142 on iter: 35800/250000
training loss: 2.1044293994932 on iter: 35900/250000
training loss: 2.2188602078087 on iter: 36000/250000
training loss: 3.1015161834776 on iter: 36100/250000
training loss: 3.6836881263824 on iter: 36200/250000
training loss: 2.7493086099768 on iter: 36300/250000
training loss: 2.5318635495157 on iter: 36400/250000
training loss: 3.01264194735 on iter: 36500/250000
training loss: 2.951277122908 on iter: 36600/250000
training loss: 2.5653159018485 on iter: 36700/250000
training loss: 2.997954181701 on iter: 36800/250000
training loss: 3.3420850794446 on iter: 36900/250000
training loss: 2.7062056144825 on iter: 37000/250000
training loss: 2.6751346963219 on iter: 37100/250000
training loss: 3.4851508607656 on iter: 37200/250000
training loss: 3.324443849895 on iter: 37300/250000
training loss: 2.7772340143259 on iter: 37400/250000
training loss: 2.8627652276409 on iter: 37500/250000
training loss: 3.1176290939467 on iter: 37600/250000
training loss: 2.8095328415567 on iter: 37700/250000
training loss: 2.4254056645494 on iter: 37800/250000
training loss: 2.4594393509266 on iter: 37900/250000
training loss: 3.3197724617349 on iter: 38000/250000
training loss: 2.895093081814 on iter: 38100/250000
training loss: 2.7091126372623 on iter: 38200/250000
training loss: 2.9800662538717 on iter: 38300/250000
training loss: 3.1934758233596 on iter: 38400/250000
training loss: 2.9273412411394 on iter: 38500/250000
training loss: 2.7778064830738 on iter: 38600/250000
training loss: 2.9012858606713 on iter: 38700/250000
training loss: 3.166392873428 on iter: 38800/250000
training loss: 2.1570321095881 on iter: 38900/250000
training loss: 2.5526920155335 on iter: 39000/250000
training loss: 2.8208426708314 on iter: 39100/250000
training loss: 3.2052770069977 on iter: 39200/250000
training loss: 3.3753125780917 on iter: 39300/250000
training loss: 3.158969717612 on iter: 39400/250000
training loss: 2.6212700302726 on iter: 39500/250000
training loss: 3.1128000664981 on iter: 39600/250000
training loss: 3.10789489616 on iter: 39700/250000
training loss: 2.8534272416057 on iter: 39800/250000
training loss: 2.7241283894475 on iter: 39900/250000
training loss: 2.1668711832415 on iter: 40000/250000
training loss: 2.7931482708619 on iter: 40100/250000
training loss: 2.5422958466249 on iter: 40200/250000
training loss: 2.7831317845305 on iter: 40300/250000
training loss: 3.2608394385049 on iter: 40400/250000
training loss: 3.1551828620188 on iter: 40500/250000
training loss: 2.7071851100367 on iter: 40600/250000
training loss: 2.6449025099827 on iter: 40700/250000
training loss: 2.9257500981731 on iter: 40800/250000
training loss: 2.8167223081033 on iter: 40900/250000
training loss: 2.5398481681439 on iter: 41000/250000
training loss: 3.3109904565936 on iter: 41100/250000
training loss: 3.0244976672865 on iter: 41200/250000
training loss: 3.2261034304039 on iter: 41300/250000
training loss: 3.3421393852847 on iter: 41400/250000
training loss: 2.658968246898 on iter: 41500/250000
training loss: 2.368541228227 on iter: 41600/250000
training loss: 3.0218014445214 on iter: 41700/250000
training loss: 3.1160808140738 on iter: 41800/250000
training loss: 2.6360564566911 on iter: 41900/250000
training loss: 2.3966052225207 on iter: 42000/250000
training loss: 3.3291244522373 on iter: 42100/250000
training loss: 3.2903659136894 on iter: 42200/250000
training loss: 2.7074324931238 on iter: 42300/250000
training loss: 2.9263368820822 on iter: 42400/250000
training loss: 3.0633604169517 on iter: 42500/250000
training loss: 3.2471629710388 on iter: 42600/250000
training loss: 2.5533603983519 on iter: 42700/250000
training loss: 2.616530287249 on iter: 42800/250000
training loss: 3.6041385093687 on iter: 42900/250000
training loss: 3.3307375198414 on iter: 43000/250000
training loss: 3.4292122592299 on iter: 43100/250000
training loss: 3.4907922405168 on iter: 43200/250000
training loss: 3.0338559101893 on iter: 43300/250000
training loss: 3.0653447333173 on iter: 43400/250000
training loss: 3.1978451693024 on iter: 43500/250000
training loss: 3.0900992756311 on iter: 43600/250000
training loss: 3.3162743509432 on iter: 43700/250000
training loss: 3.5071853172012 on iter: 43800/250000
training loss: 2.4317395999511 on iter: 43900/250000
training loss: 3.2084076053427 on iter: 44000/250000
training loss: 3.3483859018456 on iter: 44100/250000
training loss: 2.5174978677158 on iter: 44200/250000
training loss: 2.9167060854518 on iter: 44300/250000
training loss: 2.7845944168198 on iter: 44400/250000
training loss: 3.1594244152385 on iter: 44500/250000
training loss: 2.983772731962 on iter: 44600/250000
training loss: 2.5752059344433 on iter: 44700/250000
training loss: 2.5584736965249 on iter: 44800/250000
training loss: 2.6382590652339 on iter: 44900/250000
training loss: 2.7907793728873 on iter: 45000/250000
training loss: 2.6300144369619 on iter: 45100/250000
training loss: 2.407163148518 on iter: 45200/250000
training loss: 2.8119192752524 on iter: 45300/250000
training loss: 3.1512034075037 on iter: 45400/250000
training loss: 3.1743895321618 on iter: 45500/250000
training loss: 3.5992599159344 on iter: 45600/250000
training loss: 2.8110358833605 on iter: 45700/250000
training loss: 2.5396199482666 on iter: 45800/250000
training loss: 3.2330021439869 on iter: 45900/250000
training loss: 3.1042458884371 on iter: 46000/250000
training loss: 3.2680115354797 on iter: 46100/250000
training loss: 3.0620634476616 on iter: 46200/250000
training loss: 2.5693489043521 on iter: 46300/250000
training loss: 3.3221335422228 on iter: 46400/250000
training loss: 2.8385716497321 on iter: 46500/250000
training loss: 3.3701882734479 on iter: 46600/250000
training loss: 1.8712476351257 on iter: 46700/250000
training loss: 3.3493705605856 on iter: 46800/250000
training loss: 2.4429897726369 on iter: 46900/250000
training loss: 2.6455201845129 on iter: 47000/250000
training loss: 3.2966350328845 on iter: 47100/250000
training loss: 2.9383092031553 on iter: 47200/250000
training loss: 2.5062020613761 on iter: 47300/250000
training loss: 3.0695962221334 on iter: 47400/250000
training loss: 3.3122460379691 on iter: 47500/250000
training loss: 2.9600613574456 on iter: 47600/250000
training loss: 3.3094171036278 on iter: 47700/250000
training loss: 2.7381019660127 on iter: 47800/250000
training loss: 3.0401055938782 on iter: 47900/250000
training loss: 2.3997073984432 on iter: 48000/250000
training loss: 2.6164732820399 on iter: 48100/250000
training loss: 3.2026317104463 on iter: 48200/250000
training loss: 2.5061495752986 on iter: 48300/250000
training loss: 2.7231252243719 on iter: 48400/250000
training loss: 3.2493738675573 on iter: 48500/250000
training loss: 2.4462431523864 on iter: 48600/250000
training loss: 2.9212372545115 on iter: 48700/250000
training loss: 2.9847969553022 on iter: 48800/250000
training loss: 2.9501151660417 on iter: 48900/250000
training loss: 2.7569659016263 on iter: 49000/250000
training loss: 2.3480913740908 on iter: 49100/250000
training loss: 3.0163933541744 on iter: 49200/250000
training loss: 2.7167207869726 on iter: 49300/250000
training loss: 3.2069934467285 on iter: 49400/250000
training loss: 2.7863701021643 on iter: 49500/250000
training loss: 2.7587296057581 on iter: 49600/250000
training loss: 2.6402283877969 on iter: 49700/250000
training loss: 3.156670467286 on iter: 49800/250000
training loss: 3.1014260299928 on iter: 49900/250000
training loss: 3.0707723669787 on iter: 50000/250000
learning rate: 0.00018000429996178
training loss: 2.9826951151979 on iter: 50100/250000
training loss: 2.8695361532492 on iter: 50200/250000
training loss: 3.3247373019382 on iter: 50300/250000
training loss: 3.1328238853215 on iter: 50400/250000
training loss: 2.5897246550552 on iter: 50500/250000
training loss: 3.7448340745144 on iter: 50600/250000
training loss: 2.8824503279683 on iter: 50700/250000
training loss: 2.6859176795303 on iter: 50800/250000
training loss: 2.6547332435929 on iter: 50900/250000
training loss: 3.0189399575269 on iter: 51000/250000
training loss: 3.1493902044079 on iter: 51100/250000
training loss: 3.0375422919537 on iter: 51200/250000
training loss: 2.6195081558781 on iter: 51300/250000
training loss: 2.8358699097512 on iter: 51400/250000
training loss: 2.7570889452629 on iter: 51500/250000
training loss: 3.478781002872 on iter: 51600/250000
training loss: 2.6494430457571 on iter: 51700/250000
training loss: 3.1594810498538 on iter: 51800/250000
training loss: 2.3492261993111 on iter: 51900/250000
training loss: 2.9185940559061 on iter: 52000/250000
training loss: 4.0521185699998 on iter: 52100/250000
training loss: 3.4516031220764 on iter: 52200/250000
training loss: 2.8505002632043 on iter: 52300/250000
training loss: 2.7739651482366 on iter: 52400/250000
training loss: 2.5244510753311 on iter: 52500/250000
training loss: 2.7624893480496 on iter: 52600/250000
training loss: 2.9913573142979 on iter: 52700/250000
training loss: 2.4793100606248 on iter: 52800/250000
training loss: 3.0336866261159 on iter: 52900/250000
training loss: 3.404714977709 on iter: 53000/250000
training loss: 3.0290627516123 on iter: 53100/250000
training loss: 3.1928037740917 on iter: 53200/250000
training loss: 3.2476352743402 on iter: 53300/250000
training loss: 2.6967785476726 on iter: 53400/250000
training loss: 2.4209349035244 on iter: 53500/250000
training loss: 2.8452198188297 on iter: 53600/250000
training loss: 2.4684724386732 on iter: 53700/250000
training loss: 3.5310061366924 on iter: 53800/250000
training loss: 2.7964644465727 on iter: 53900/250000
training loss: 3.4393585568651 on iter: 54000/250000
training loss: 3.2504298062017 on iter: 54100/250000
training loss: 3.0453870621538 on iter: 54200/250000
training loss: 2.5750210401611 on iter: 54300/250000
training loss: 3.1335563119142 on iter: 54400/250000
training loss: 3.2272489492582 on iter: 54500/250000
training loss: 2.6860665339941 on iter: 54600/250000
training loss: 2.8455795470557 on iter: 54700/250000
training loss: 2.8050024987435 on iter: 54800/250000
training loss: 2.4481127971162 on iter: 54900/250000
training loss: 3.1897213816638 on iter: 55000/250000
training loss: 2.6587693715 on iter: 55100/250000
training loss: 2.605828933971 on iter: 55200/250000
training loss: 3.1640614801221 on iter: 55300/250000
training loss: 2.8783736682168 on iter: 55400/250000
training loss: 2.6427198870244 on iter: 55500/250000
training loss: 3.1458731234634 on iter: 55600/250000
training loss: 3.2361596267457 on iter: 55700/250000
training loss: 2.6743929343205 on iter: 55800/250000
training loss: 3.2731817680049 on iter: 55900/250000
training loss: 2.8112355243631 on iter: 56000/250000
training loss: 3.568056621572 on iter: 56100/250000
training loss: 2.6502633687674 on iter: 56200/250000
training loss: 2.9544126035189 on iter: 56300/250000
training loss: 3.6956327694954 on iter: 56400/250000
training loss: 2.537273512442 on iter: 56500/250000
training loss: 2.3139102896056 on iter: 56600/250000
training loss: 2.8545538928872 on iter: 56700/250000
training loss: 2.5670895218462 on iter: 56800/250000
training loss: 2.8794426485755 on iter: 56900/250000
training loss: 3.2846840091768 on iter: 57000/250000
training loss: 3.0623465544199 on iter: 57100/250000
training loss: 3.67607296062 on iter: 57200/250000
training loss: 3.3622431219602 on iter: 57300/250000
training loss: 2.8135809325525 on iter: 57400/250000
training loss: 2.5007720046486 on iter: 57500/250000
training loss: 3.0946373751104 on iter: 57600/250000
training loss: 3.2585399239142 on iter: 57700/250000
training loss: 2.8307698715608 on iter: 57800/250000
training loss: 3.1701312908077 on iter: 57900/250000
training loss: 2.7599415519429 on iter: 58000/250000
training loss: 3.3744652260144 on iter: 58100/250000
training loss: 3.4872125350232 on iter: 58200/250000
training loss: 3.0166415978538 on iter: 58300/250000
training loss: 3.0133274318937 on iter: 58400/250000
training loss: 1.9453353745444 on iter: 58500/250000
training loss: 3.0910007167371 on iter: 58600/250000
training loss: 3.2239646667555 on iter: 58700/250000
training loss: 2.9830185482923 on iter: 58800/250000
training loss: 3.0780200110967 on iter: 58900/250000
training loss: 2.5085090241504 on iter: 59000/250000
training loss: 3.029117557161 on iter: 59100/250000
training loss: 2.8099250720308 on iter: 59200/250000
training loss: 2.740815904032 on iter: 59300/250000
training loss: 3.4930301800565 on iter: 59400/250000
training loss: 3.1724187769694 on iter: 59500/250000
training loss: 3.0695920405288 on iter: 59600/250000
training loss: 2.9850761357121 on iter: 59700/250000
training loss: 2.9848018521725 on iter: 59800/250000
training loss: 2.639988137982 on iter: 59900/250000
training loss: 3.0181104285322 on iter: 60000/250000
training loss: 2.7710719248999 on iter: 60100/250000
training loss: 3.3530960537512 on iter: 60200/250000
training loss: 3.3115733933445 on iter: 60300/250000
training loss: 2.7605678467803 on iter: 60400/250000
training loss: 2.6251820118973 on iter: 60500/250000
training loss: 2.8969516717712 on iter: 60600/250000
training loss: 3.5016405804133 on iter: 60700/250000
training loss: 2.5762276602576 on iter: 60800/250000
training loss: 2.5986684338719 on iter: 60900/250000
training loss: 3.3459954258255 on iter: 61000/250000
training loss: 3.0472167682526 on iter: 61100/250000
training loss: 3.3193574147296 on iter: 61200/250000
training loss: 3.0682299213191 on iter: 61300/250000
training loss: 2.9830966701049 on iter: 61400/250000
training loss: 3.1305542254637 on iter: 61500/250000
training loss: 2.7577775071449 on iter: 61600/250000
training loss: 2.6724094615035 on iter: 61700/250000
training loss: 2.4385089079234 on iter: 61800/250000
training loss: 2.5126660310357 on iter: 61900/250000
training loss: 2.543611411866 on iter: 62000/250000
training loss: 3.3100760090423 on iter: 62100/250000
training loss: 2.7157881223243 on iter: 62200/250000
training loss: 3.0054993503107 on iter: 62300/250000
training loss: 2.9120837234232 on iter: 62400/250000
training loss: 2.7678028996828 on iter: 62500/250000
training loss: 3.2582168669495 on iter: 62600/250000
training loss: 2.7085406638655 on iter: 62700/250000
training loss: 3.5781126167857 on iter: 62800/250000
training loss: 3.0884174232502 on iter: 62900/250000
training loss: 2.8362335311594 on iter: 63000/250000
training loss: 2.732094537556 on iter: 63100/250000
training loss: 3.181310105201 on iter: 63200/250000
training loss: 2.9871917627265 on iter: 63300/250000
training loss: 2.94408443431 on iter: 63400/250000
training loss: 3.0737079111913 on iter: 63500/250000
training loss: 2.6107184414519 on iter: 63600/250000
training loss: 3.1083229924476 on iter: 63700/250000
training loss: 3.1784012872175 on iter: 63800/250000
training loss: 3.533237610381 on iter: 63900/250000
training loss: 2.7514176690915 on iter: 64000/250000
training loss: 3.0975480749125 on iter: 64100/250000
training loss: 2.7239186309335 on iter: 64200/250000
training loss: 2.2746646783646 on iter: 64300/250000
training loss: 2.9048258886354 on iter: 64400/250000
training loss: 3.0920490435751 on iter: 64500/250000
training loss: 3.878257223996 on iter: 64600/250000
training loss: 3.2526426176853 on iter: 64700/250000
training loss: 3.1041054207025 on iter: 64800/250000
training loss: 3.5939834330549 on iter: 64900/250000
training loss: 3.4393936402193 on iter: 65000/250000
training loss: 2.6217946175344 on iter: 65100/250000
training loss: 2.9861897746874 on iter: 65200/250000
training loss: 3.0961898524645 on iter: 65300/250000
training loss: 3.1137210161858 on iter: 65400/250000
training loss: 3.5197989557816 on iter: 65500/250000
training loss: 3.4491556138872 on iter: 65600/250000
training loss: 2.656666503578 on iter: 65700/250000
training loss: 2.9061804827635 on iter: 65800/250000
training loss: 3.2065071756976 on iter: 65900/250000
training loss: 3.2079962530131 on iter: 66000/250000
training loss: 3.6546202090676 on iter: 66100/250000
training loss: 2.8584147946144 on iter: 66200/250000
training loss: 3.3446772832008 on iter: 66300/250000
training loss: 2.7453075451002 on iter: 66400/250000
training loss: 2.2535545342383 on iter: 66500/250000
training loss: 2.5837350295701 on iter: 66600/250000
training loss: 2.7692851896589 on iter: 66700/250000
training loss: 3.5189569248823 on iter: 66800/250000
training loss: 3.2954797731508 on iter: 66900/250000
training loss: 2.6930909740962 on iter: 67000/250000
training loss: 3.0156785090048 on iter: 67100/250000
training loss: 2.5555769150017 on iter: 67200/250000
training loss: 2.7921351404762 on iter: 67300/250000
training loss: 2.9198624847713 on iter: 67400/250000
training loss: 2.8635953785652 on iter: 67500/250000
training loss: 2.5908949100521 on iter: 67600/250000
training loss: 2.6632553410762 on iter: 67700/250000
training loss: 3.2292279717964 on iter: 67800/250000
training loss: 3.4049451697408 on iter: 67900/250000
training loss: 3.4313322335933 on iter: 68000/250000
training loss: 2.837024118665 on iter: 68100/250000
training loss: 3.7276361556984 on iter: 68200/250000
training loss: 3.1352561948687 on iter: 68300/250000
training loss: 3.4846995113593 on iter: 68400/250000
training loss: 3.1400407939217 on iter: 68500/250000
training loss: 2.814141848301 on iter: 68600/250000
training loss: 3.3923237151852 on iter: 68700/250000
training loss: 2.5814441601296 on iter: 68800/250000
training loss: 2.6218387099068 on iter: 68900/250000
training loss: 2.4842013652746 on iter: 69000/250000
training loss: 3.0806022973119 on iter: 69100/250000
training loss: 3.0246612504796 on iter: 69200/250000
training loss: 3.2521774912744 on iter: 69300/250000
training loss: 3.0504192904689 on iter: 69400/250000
training loss: 3.1450432059011 on iter: 69500/250000
training loss: 3.4340892953938 on iter: 69600/250000
training loss: 3.1643869403745 on iter: 69700/250000
training loss: 2.9443093896721 on iter: 69800/250000
training loss: 3.1639822257506 on iter: 69900/250000
training loss: 3.5239903307213 on iter: 70000/250000
training loss: 3.1683805187573 on iter: 70100/250000
training loss: 3.2738489613018 on iter: 70200/250000
training loss: 3.3061647799341 on iter: 70300/250000
training loss: 2.3301140115698 on iter: 70400/250000
training loss: 2.7353810700005 on iter: 70500/250000
training loss: 2.9502778672276 on iter: 70600/250000
training loss: 2.9377032426871 on iter: 70700/250000
training loss: 2.9268029428606 on iter: 70800/250000
training loss: 3.4432488077333 on iter: 70900/250000
training loss: 2.9698552179863 on iter: 71000/250000
training loss: 2.8019616871898 on iter: 71100/250000
training loss: 3.0236223960961 on iter: 71200/250000
training loss: 2.7803275072981 on iter: 71300/250000
training loss: 3.148320343629 on iter: 71400/250000
training loss: 3.1084766539006 on iter: 71500/250000
training loss: 3.15534455513 on iter: 71600/250000
training loss: 2.5522471391442 on iter: 71700/250000
training loss: 2.8775859991267 on iter: 71800/250000
training loss: 2.8590626964966 on iter: 71900/250000
training loss: 2.3074517110586 on iter: 72000/250000
training loss: 3.0126312298921 on iter: 72100/250000
training loss: 2.7601602534217 on iter: 72200/250000
training loss: 2.7035118035199 on iter: 72300/250000
training loss: 2.7017113795799 on iter: 72400/250000
training loss: 3.1857272839073 on iter: 72500/250000
training loss: 3.8198328028016 on iter: 72600/250000
training loss: 2.6088185608655 on iter: 72700/250000
training loss: 3.3609472960673 on iter: 72800/250000
training loss: 2.9430997383024 on iter: 72900/250000
training loss: 3.5883307679762 on iter: 73000/250000
training loss: 2.5461844835487 on iter: 73100/250000
training loss: 2.8102957137863 on iter: 73200/250000
training loss: 3.151489498988 on iter: 73300/250000
training loss: 2.9181542741557 on iter: 73400/250000
training loss: 2.7003927641265 on iter: 73500/250000
training loss: 3.0682985007124 on iter: 73600/250000
training loss: 3.8631146414241 on iter: 73700/250000
training loss: 3.0694221511307 on iter: 73800/250000
training loss: 3.7196090830349 on iter: 73900/250000
training loss: 3.5014825697785 on iter: 74000/250000
training loss: 3.586143173477 on iter: 74100/250000
training loss: 3.3340739721714 on iter: 74200/250000
training loss: 2.4551394150771 on iter: 74300/250000
training loss: 3.3608672482843 on iter: 74400/250000
training loss: 3.0450554453591 on iter: 74500/250000
training loss: 2.8018784310776 on iter: 74600/250000
training loss: 2.6398486372899 on iter: 74700/250000
training loss: 3.1180912152079 on iter: 74800/250000
training loss: 2.7976385471867 on iter: 74900/250000
training loss: 2.7986637907159 on iter: 75000/250000
training loss: 3.0670404750322 on iter: 75100/250000
training loss: 2.5600503187674 on iter: 75200/250000
training loss: 2.7130951573335 on iter: 75300/250000
training loss: 2.6623422271355 on iter: 75400/250000
training loss: 2.8298775052059 on iter: 75500/250000
training loss: 2.3036252644213 on iter: 75600/250000
training loss: 2.9201704123843 on iter: 75700/250000
training loss: 3.2203663624175 on iter: 75800/250000
training loss: 3.7863838160382 on iter: 75900/250000
training loss: 2.9113050473782 on iter: 76000/250000
training loss: 2.9761322221183 on iter: 76100/250000
training loss: 3.8269345676029 on iter: 76200/250000
training loss: 3.4922256275946 on iter: 76300/250000
training loss: 2.7214598481621 on iter: 76400/250000
training loss: 3.4959436263141 on iter: 76500/250000
training loss: 2.8614734132119 on iter: 76600/250000
training loss: 2.9568758802305 on iter: 76700/250000
training loss: 3.105182718802 on iter: 76800/250000
training loss: 2.7582200695045 on iter: 76900/250000
training loss: 2.9178767757736 on iter: 77000/250000
training loss: 3.1624818800075 on iter: 77100/250000
training loss: 3.3224185661786 on iter: 77200/250000
training loss: 3.0615292418194 on iter: 77300/250000
training loss: 2.9605002521179 on iter: 77400/250000
training loss: 3.300348118564 on iter: 77500/250000
training loss: 3.4672443304345 on iter: 77600/250000
training loss: 3.1136209081618 on iter: 77700/250000
training loss: 3.4035072276341 on iter: 77800/250000
training loss: 2.4240017831044 on iter: 77900/250000
training loss: 2.4582549347919 on iter: 78000/250000
training loss: 3.4908742560878 on iter: 78100/250000
training loss: 2.683143147409 on iter: 78200/250000
training loss: 2.6585905098064 on iter: 78300/250000
training loss: 2.6069290228319 on iter: 78400/250000
training loss: 4.0959419035624 on iter: 78500/250000
training loss: 3.5338250957031 on iter: 78600/250000
training loss: 3.3091526291158 on iter: 78700/250000
training loss: 2.9204612692259 on iter: 78800/250000
training loss: 3.1874143213861 on iter: 78900/250000
training loss: 3.2462258256699 on iter: 79000/250000
training loss: 3.3591181843077 on iter: 79100/250000
training loss: 2.6262738998995 on iter: 79200/250000
training loss: 2.2606989460131 on iter: 79300/250000
training loss: 3.2887546648892 on iter: 79400/250000
training loss: 3.1768169501087 on iter: 79500/250000
training loss: 3.198205647506 on iter: 79600/250000
training loss: 2.2238094488775 on iter: 79700/250000
training loss: 3.2334920867553 on iter: 79800/250000
training loss: 3.2942414178256 on iter: 79900/250000
training loss: 3.6058109288632 on iter: 80000/250000
training loss: 3.2230784054313 on iter: 80100/250000
training loss: 3.2027173630003 on iter: 80200/250000
training loss: 3.2214875897907 on iter: 80300/250000
training loss: 4.2101090875936 on iter: 80400/250000
training loss: 3.3548407887723 on iter: 80500/250000
training loss: 2.4418385572029 on iter: 80600/250000
training loss: 2.8578159210275 on iter: 80700/250000
training loss: 2.7935679163327 on iter: 80800/250000
training loss: 2.8220729725677 on iter: 80900/250000
training loss: 3.1391517692944 on iter: 81000/250000
training loss: 1.9498078732668 on iter: 81100/250000
training loss: 3.654250005333 on iter: 81200/250000
training loss: 2.9501578020433 on iter: 81300/250000
training loss: 2.7929608677126 on iter: 81400/250000
training loss: 3.2939290028892 on iter: 81500/250000
training loss: 3.4680070501235 on iter: 81600/250000
training loss: 3.3280540667712 on iter: 81700/250000
training loss: 3.0938602474298 on iter: 81800/250000
training loss: 3.5073672352356 on iter: 81900/250000
training loss: 2.7900153641872 on iter: 82000/250000
training loss: 3.4405861323533 on iter: 82100/250000
training loss: 3.1700688951535 on iter: 82200/250000
training loss: 3.4075003273354 on iter: 82300/250000
training loss: 2.5209620019539 on iter: 82400/250000
training loss: 3.4567301732889 on iter: 82500/250000
training loss: 2.4024020675504 on iter: 82600/250000
training loss: 3.3135311361647 on iter: 82700/250000
training loss: 2.5500747732108 on iter: 82800/250000
training loss: 2.6894272315724 on iter: 82900/250000
training loss: 3.3764500482183 on iter: 83000/250000
training loss: 2.8372154205917 on iter: 83100/250000
training loss: 2.3417127032018 on iter: 83200/250000
training loss: 2.8799256959584 on iter: 83300/250000
training loss: 2.9360904677274 on iter: 83400/250000
training loss: 3.4810211370366 on iter: 83500/250000
training loss: 2.8113289115831 on iter: 83600/250000
training loss: 3.4172887636923 on iter: 83700/250000
training loss: 3.2389820104909 on iter: 83800/250000
training loss: 3.1292901858822 on iter: 83900/250000
training loss: 3.9974648098504 on iter: 84000/250000
training loss: 2.6772020756195 on iter: 84100/250000
training loss: 3.129729830858 on iter: 84200/250000
training loss: 2.9587852440105 on iter: 84300/250000
training loss: 3.4128124443371 on iter: 84400/250000
training loss: 2.395707618862 on iter: 84500/250000
training loss: 3.1472676669072 on iter: 84600/250000
training loss: 3.1195219232048 on iter: 84700/250000
training loss: 2.511016136624 on iter: 84800/250000
training loss: 3.3961748996283 on iter: 84900/250000
training loss: 3.1927222401296 on iter: 85000/250000
training loss: 3.4691962321855 on iter: 85100/250000
training loss: 4.3291287482367 on iter: 85200/250000
training loss: 3.0518685633107 on iter: 85300/250000
training loss: 3.1125989369108 on iter: 85400/250000
training loss: 2.7204273772526 on iter: 85500/250000
training loss: 4.1785111407049 on iter: 85600/250000
training loss: 3.0994636893797 on iter: 85700/250000
training loss: 3.0482969639376 on iter: 85800/250000
training loss: 3.1148111293486 on iter: 85900/250000
training loss: 3.0078082013296 on iter: 86000/250000
training loss: 3.3719060015752 on iter: 86100/250000
training loss: 3.2692618160372 on iter: 86200/250000
training loss: 2.2311862267969 on iter: 86300/250000
training loss: 3.1589019288189 on iter: 86400/250000
training loss: 3.4220873491725 on iter: 86500/250000
training loss: 2.4769188498658 on iter: 86600/250000
training loss: 3.251647972133 on iter: 86700/250000
training loss: 2.8501730099248 on iter: 86800/250000
training loss: 2.9203825061127 on iter: 86900/250000
training loss: 2.7600926773579 on iter: 87000/250000
training loss: 3.0718034821993 on iter: 87100/250000
training loss: 3.7511493316866 on iter: 87200/250000
training loss: 2.9744798411961 on iter: 87300/250000
training loss: 3.7476226584925 on iter: 87400/250000
training loss: 2.425198630754 on iter: 87500/250000
training loss: 2.9974591421726 on iter: 87600/250000
training loss: 2.8944863264225 on iter: 87700/250000
training loss: 2.2629894952281 on iter: 87800/250000
training loss: 3.6786404998716 on iter: 87900/250000
training loss: 3.5475862504115 on iter: 88000/250000
training loss: 3.4926289871406 on iter: 88100/250000
training loss: 2.7232116427345 on iter: 88200/250000
training loss: 3.8909667089994 on iter: 88300/250000
training loss: 2.30766226088 on iter: 88400/250000
training loss: 3.0133614748243 on iter: 88500/250000
training loss: 3.2156360421549 on iter: 88600/250000
training loss: 3.5697917927919 on iter: 88700/250000
training loss: 2.3174387334497 on iter: 88800/250000
training loss: 3.2643335358689 on iter: 88900/250000
training loss: 2.9133758762182 on iter: 89000/250000
training loss: 2.8184668001381 on iter: 89100/250000
training loss: 3.1268242135431 on iter: 89200/250000
training loss: 3.5708802871002 on iter: 89300/250000
training loss: 3.5051706604978 on iter: 89400/250000
training loss: 3.2994268299171 on iter: 89500/250000
training loss: 3.1269773547202 on iter: 89600/250000
training loss: 3.2506228038989 on iter: 89700/250000
training loss: 3.8122969970805 on iter: 89800/250000
training loss: 2.9197335341476 on iter: 89900/250000
training loss: 3.5898873465625 on iter: 90000/250000
training loss: 2.9288423034565 on iter: 90100/250000
training loss: 3.1318015021499 on iter: 90200/250000
training loss: 3.3519926644496 on iter: 90300/250000
training loss: 3.0221232849678 on iter: 90400/250000
training loss: 2.9306712604931 on iter: 90500/250000
training loss: 3.5486007142779 on iter: 90600/250000
training loss: 3.2471867135975 on iter: 90700/250000
training loss: 3.1211265111915 on iter: 90800/250000
training loss: 3.0473616105829 on iter: 90900/250000
training loss: 2.5899780232682 on iter: 91000/250000
training loss: 3.1634816056443 on iter: 91100/250000
training loss: 3.5856914015124 on iter: 91200/250000
training loss: 3.2563798281326 on iter: 91300/250000
training loss: 2.7048318722128 on iter: 91400/250000
training loss: 3.1053316427279 on iter: 91500/250000
training loss: 3.0435794736599 on iter: 91600/250000
training loss: 3.1275282116733 on iter: 91700/250000
training loss: 3.1013320098796 on iter: 91800/250000
training loss: 3.650391512821 on iter: 91900/250000
training loss: 3.063035893383 on iter: 92000/250000
training loss: 2.3581747701905 on iter: 92100/250000
training loss: 3.7426702592304 on iter: 92200/250000
training loss: 3.6779278999341 on iter: 92300/250000
training loss: 3.4739542149877 on iter: 92400/250000
training loss: 3.0219578748349 on iter: 92500/250000
training loss: 4.0953094589712 on iter: 92600/250000
training loss: 3.6325675677517 on iter: 92700/250000
training loss: 2.9792486169724 on iter: 92800/250000
training loss: 3.4663189364315 on iter: 92900/250000
training loss: 3.1925643293477 on iter: 93000/250000
training loss: 2.8440017367545 on iter: 93100/250000
training loss: 3.5428949394661 on iter: 93200/250000
training loss: 2.2840789227632 on iter: 93300/250000
training loss: 3.2028367629199 on iter: 93400/250000
training loss: 2.8658498947016 on iter: 93500/250000
training loss: 3.2839366072242 on iter: 93600/250000
training loss: 3.2930808529797 on iter: 93700/250000
training loss: 2.9412860908112 on iter: 93800/250000
training loss: 3.7553009812439 on iter: 93900/250000
training loss: 3.4040483160218 on iter: 94000/250000
training loss: 3.2638199318515 on iter: 94100/250000
training loss: 2.9790121323035 on iter: 94200/250000
training loss: 3.2701793708365 on iter: 94300/250000
training loss: 3.1168219086549 on iter: 94400/250000
training loss: 3.1477984545009 on iter: 94500/250000
training loss: 2.6505683483284 on iter: 94600/250000
training loss: 2.8730924269438 on iter: 94700/250000
training loss: 3.7728040242791 on iter: 94800/250000
training loss: 3.4050009655241 on iter: 94900/250000
training loss: 3.3073012549963 on iter: 95000/250000
training loss: 2.8045395519037 on iter: 95100/250000
training loss: 3.5551399201337 on iter: 95200/250000
training loss: 3.356406596234 on iter: 95300/250000
training loss: 3.3134647733408 on iter: 95400/250000
training loss: 2.8284596124776 on iter: 95500/250000
training loss: 2.0983338922516 on iter: 95600/250000
training loss: 3.7266426529208 on iter: 95700/250000
training loss: 3.0584853765345 on iter: 95800/250000
training loss: 2.6709728068981 on iter: 95900/250000
training loss: 2.7576421956019 on iter: 96000/250000
training loss: 2.6872263011959 on iter: 96100/250000
training loss: 2.1441335647624 on iter: 96200/250000
training loss: 2.9772689995358 on iter: 96300/250000
training loss: 3.1263591854684 on iter: 96400/250000
training loss: 3.2558577746374 on iter: 96500/250000
training loss: 4.2056705094636 on iter: 96600/250000
training loss: 3.6340333024921 on iter: 96700/250000
training loss: 3.1833426867569 on iter: 96800/250000
training loss: 3.6951074246976 on iter: 96900/250000
training loss: 2.6953963010955 on iter: 97000/250000
training loss: 2.7063525933895 on iter: 97100/250000
training loss: 3.388374714191 on iter: 97200/250000
training loss: 2.7806212621649 on iter: 97300/250000
training loss: 3.0068240848142 on iter: 97400/250000
training loss: 3.1704782633865 on iter: 97500/250000
training loss: 3.1095438764923 on iter: 97600/250000
training loss: 3.0327824408005 on iter: 97700/250000
training loss: 2.6600096105185 on iter: 97800/250000
training loss: 2.8305482464535 on iter: 97900/250000
training loss: 3.501474315035 on iter: 98000/250000
training loss: 2.5059130930923 on iter: 98100/250000
training loss: 3.2790858865465 on iter: 98200/250000
training loss: 2.6877396411224 on iter: 98300/250000
training loss: 3.3024526478581 on iter: 98400/250000
training loss: 2.6391447694271 on iter: 98500/250000
training loss: 3.3747765738374 on iter: 98600/250000
training loss: 3.2434537123341 on iter: 98700/250000
training loss: 3.2348953412876 on iter: 98800/250000
training loss: 3.0172559775923 on iter: 98900/250000
training loss: 3.046895016979 on iter: 99000/250000
training loss: 4.1429749341762 on iter: 99100/250000
training loss: 2.9592374395662 on iter: 99200/250000
training loss: 3.053165618422 on iter: 99300/250000
training loss: 3.1380669949742 on iter: 99400/250000
training loss: 3.8284847755198 on iter: 99500/250000
training loss: 3.256313051897 on iter: 99600/250000
training loss: 3.0802980008282 on iter: 99700/250000
training loss: 2.7020287307575 on iter: 99800/250000
training loss: 2.5346986663035 on iter: 99900/250000
training loss: 3.5166636628935 on iter: 100000/250000
learning rate: 0.00010800255934116
training loss: 3.0327017420295 on iter: 100100/250000
training loss: 3.0043601261885 on iter: 100200/250000
training loss: 2.3483058381292 on iter: 100300/250000
training loss: 3.1369173237477 on iter: 100400/250000
training loss: 3.2270886778905 on iter: 100500/250000
training loss: 2.7546400152372 on iter: 100600/250000
training loss: 3.1854861316964 on iter: 100700/250000
training loss: 2.947400158307 on iter: 100800/250000
training loss: 2.4029443024843 on iter: 100900/250000
training loss: 2.8576742350921 on iter: 101000/250000
training loss: 2.76252542774 on iter: 101100/250000
training loss: 3.3523808885525 on iter: 101200/250000
training loss: 3.067249273005 on iter: 101300/250000
training loss: 3.0163809004738 on iter: 101400/250000
training loss: 3.155239220898 on iter: 101500/250000
training loss: 2.6385179335559 on iter: 101600/250000
training loss: 3.3343382314907 on iter: 101700/250000
training loss: 3.1990294595618 on iter: 101800/250000
training loss: 2.7049420329027 on iter: 101900/250000
training loss: 3.2461680463507 on iter: 102000/250000
training loss: 2.1099336758842 on iter: 102100/250000
training loss: 3.1788909131081 on iter: 102200/250000
training loss: 2.8964299482969 on iter: 102300/250000
training loss: 3.1377651536462 on iter: 102400/250000
training loss: 2.6943582893916 on iter: 102500/250000
training loss: 3.300301879228 on iter: 102600/250000
training loss: 3.4158104287315 on iter: 102700/250000
training loss: 3.104468719876 on iter: 102800/250000
training loss: 3.1090663011995 on iter: 102900/250000
training loss: 3.0552151874723 on iter: 103000/250000
training loss: 3.1430850909705 on iter: 103100/250000
training loss: 3.107005321394 on iter: 103200/250000
training loss: 2.4408520316678 on iter: 103300/250000
training loss: 2.8610589294825 on iter: 103400/250000
training loss: 2.9809905670188 on iter: 103500/250000
training loss: 2.6914294339634 on iter: 103600/250000
training loss: 3.3739967365406 on iter: 103700/250000
training loss: 2.6266585611307 on iter: 103800/250000
training loss: 3.2831438093502 on iter: 103900/250000
training loss: 2.8985822828502 on iter: 104000/250000
training loss: 3.0407285237174 on iter: 104100/250000
training loss: 2.7260768055986 on iter: 104200/250000
training loss: 3.5789948914445 on iter: 104300/250000
training loss: 3.9153331602912 on iter: 104400/250000
training loss: 2.630021062834 on iter: 104500/250000
training loss: 3.2687744056259 on iter: 104600/250000
training loss: 3.2658529999358 on iter: 104700/250000
training loss: 3.8057415731281 on iter: 104800/250000
training loss: 3.4557269457401 on iter: 104900/250000
training loss: 3.8667059853996 on iter: 105000/250000
training loss: 3.5945842063405 on iter: 105100/250000
training loss: 3.0547672694087 on iter: 105200/250000
training loss: 3.9481517606064 on iter: 105300/250000
training loss: 2.5511658920515 on iter: 105400/250000
training loss: 2.8418605460209 on iter: 105500/250000
training loss: 2.9229798378816 on iter: 105600/250000
training loss: 2.974029656887 on iter: 105700/250000
training loss: 2.98535423367 on iter: 105800/250000
training loss: 2.479477220839 on iter: 105900/250000
training loss: 3.4901579850471 on iter: 106000/250000
training loss: 2.579121955898 on iter: 106100/250000
training loss: 3.3435759833006 on iter: 106200/250000
training loss: 3.5600289379082 on iter: 106300/250000
training loss: 3.8907775081097 on iter: 106400/250000
training loss: 3.6427952302829 on iter: 106500/250000
training loss: 2.5540324270303 on iter: 106600/250000
training loss: 3.2173950364349 on iter: 106700/250000
training loss: 2.9645441634375 on iter: 106800/250000
training loss: 2.8741509793922 on iter: 106900/250000
training loss: 2.968563292478 on iter: 107000/250000
training loss: 3.5502837444237 on iter: 107100/250000
training loss: 3.6995200669572 on iter: 107200/250000
training loss: 2.6925393016283 on iter: 107300/250000
training loss: 2.8479379785926 on iter: 107400/250000
training loss: 3.9507601994338 on iter: 107500/250000
training loss: 2.8907833741665 on iter: 107600/250000
training loss: 3.1137478169957 on iter: 107700/250000
training loss: 3.3648006089113 on iter: 107800/250000
training loss: 3.3243716764326 on iter: 107900/250000
training loss: 3.3900200463524 on iter: 108000/250000
training loss: 2.6656432812315 on iter: 108100/250000
training loss: 2.8468445800312 on iter: 108200/250000
training loss: 2.234889786045 on iter: 108300/250000
training loss: 3.5025901886395 on iter: 108400/250000
training loss: 2.9095283700532 on iter: 108500/250000
training loss: 3.3467881966215 on iter: 108600/250000
training loss: 3.7923696172378 on iter: 108700/250000
training loss: 2.9379007433452 on iter: 108800/250000
training loss: 2.6508456317231 on iter: 108900/250000
training loss: 3.3571601232453 on iter: 109000/250000
training loss: 3.1177949598552 on iter: 109100/250000
training loss: 2.9646612498049 on iter: 109200/250000
training loss: 3.1464471644156 on iter: 109300/250000
training loss: 3.727860828138 on iter: 109400/250000
training loss: 2.7205805393364 on iter: 109500/250000
training loss: 4.0124013875446 on iter: 109600/250000
training loss: 2.7342922441869 on iter: 109700/250000
training loss: 2.9271065676206 on iter: 109800/250000
training loss: 3.3331696738018 on iter: 109900/250000
training loss: 3.2603957803673 on iter: 110000/250000
training loss: 3.2160846988682 on iter: 110100/250000
training loss: 3.9017083877573 on iter: 110200/250000
training loss: 3.0941670413231 on iter: 110300/250000
training loss: 3.6408876193108 on iter: 110400/250000
training loss: 2.6621053790959 on iter: 110500/250000
training loss: 3.7872072734986 on iter: 110600/250000
training loss: 2.5068279633685 on iter: 110700/250000
training loss: 3.4644566300707 on iter: 110800/250000
training loss: 2.8488684658305 on iter: 110900/250000
training loss: 3.0461562885322 on iter: 111000/250000
training loss: 3.0728875486956 on iter: 111100/250000
training loss: 3.3074915656514 on iter: 111200/250000
training loss: 3.3149377763177 on iter: 111300/250000
training loss: 2.3290811353618 on iter: 111400/250000
training loss: 2.6550002238273 on iter: 111500/250000
training loss: 2.5894566624914 on iter: 111600/250000
training loss: 2.7848947419213 on iter: 111700/250000
training loss: 2.710791002836 on iter: 111800/250000
training loss: 4.3098872676428 on iter: 111900/250000
training loss: 2.539215419408 on iter: 112000/250000
training loss: 3.3759674520408 on iter: 112100/250000
training loss: 3.552321277245 on iter: 112200/250000
training loss: 3.9893470558421 on iter: 112300/250000
training loss: 3.725465338495 on iter: 112400/250000
training loss: 2.5017534661636 on iter: 112500/250000
training loss: 2.831314149377 on iter: 112600/250000
training loss: 3.4464632786046 on iter: 112700/250000
training loss: 3.8647629961771 on iter: 112800/250000
training loss: 3.396715273239 on iter: 112900/250000
training loss: 2.7266483627859 on iter: 113000/250000
training loss: 3.7101811117513 on iter: 113100/250000
training loss: 2.7636866595254 on iter: 113200/250000
training loss: 2.7900540937878 on iter: 113300/250000
training loss: 3.4053493105334 on iter: 113400/250000
training loss: 2.8959985977739 on iter: 113500/250000
training loss: 3.7400952471001 on iter: 113600/250000
training loss: 3.1881377281127 on iter: 113700/250000
training loss: 2.6330891487682 on iter: 113800/250000
training loss: 3.3171659779071 on iter: 113900/250000
training loss: 2.7993992687263 on iter: 114000/250000
training loss: 3.5661972859175 on iter: 114100/250000
training loss: 2.9114864932412 on iter: 114200/250000
training loss: 3.2800801564738 on iter: 114300/250000
training loss: 3.5601675371408 on iter: 114400/250000
training loss: 4.2804892521342 on iter: 114500/250000
training loss: 3.2977155047946 on iter: 114600/250000
training loss: 3.4957948516788 on iter: 114700/250000
training loss: 2.5563933136423 on iter: 114800/250000
training loss: 2.9728177094041 on iter: 114900/250000
training loss: 3.1738382023377 on iter: 115000/250000
training loss: 3.175994371068 on iter: 115100/250000
training loss: 2.8979501048974 on iter: 115200/250000
training loss: 3.1710610605401 on iter: 115300/250000
training loss: 3.3544019353752 on iter: 115400/250000
training loss: 2.7309667200307 on iter: 115500/250000
training loss: 3.6437426537497 on iter: 115600/250000
training loss: 2.671868165741 on iter: 115700/250000
training loss: 3.6805832552689 on iter: 115800/250000
training loss: 3.0334910349853 on iter: 115900/250000
training loss: 2.9602760197548 on iter: 116000/250000
training loss: 2.8171066877077 on iter: 116100/250000
training loss: 3.4313771606891 on iter: 116200/250000
training loss: 3.0808029882413 on iter: 116300/250000
training loss: 4.1728280545765 on iter: 116400/250000
training loss: 2.7881171982372 on iter: 116500/250000
training loss: 4.3308932273722 on iter: 116600/250000
training loss: 3.1602761559935 on iter: 116700/250000
training loss: 2.7502681287373 on iter: 116800/250000
training loss: 3.4654715435911 on iter: 116900/250000
training loss: 3.209088821843 on iter: 117000/250000
training loss: 3.4075702876735 on iter: 117100/250000
training loss: 3.1278112512196 on iter: 117200/250000
training loss: 3.1547824844886 on iter: 117300/250000
training loss: 3.139485246576 on iter: 117400/250000
training loss: 2.6224858415888 on iter: 117500/250000
training loss: 2.2517658332248 on iter: 117600/250000
training loss: 3.2536973531695 on iter: 117700/250000
training loss: 3.2436183920314 on iter: 117800/250000
training loss: 3.471084366721 on iter: 117900/250000
training loss: 2.8443562802975 on iter: 118000/250000
training loss: 3.0285704528599 on iter: 118100/250000
training loss: 2.974299099301 on iter: 118200/250000
training loss: 3.4206898539348 on iter: 118300/250000
training loss: 3.4367176987295 on iter: 118400/250000
training loss: 2.4287144435125 on iter: 118500/250000
training loss: 3.5051093348717 on iter: 118600/250000
training loss: 3.0936920240242 on iter: 118700/250000
training loss: 3.6334337856062 on iter: 118800/250000
training loss: 3.0211692506861 on iter: 118900/250000
training loss: 3.7424445906015 on iter: 119000/250000
training loss: 2.7867023310022 on iter: 119100/250000
training loss: 3.5849450493849 on iter: 119200/250000
training loss: 3.5192056421308 on iter: 119300/250000
training loss: 3.6312604778398 on iter: 119400/250000
training loss: 3.5396104616116 on iter: 119500/250000
training loss: 3.3336648110385 on iter: 119600/250000
training loss: 3.8616491588769 on iter: 119700/250000
training loss: 3.0951531523498 on iter: 119800/250000
training loss: 3.3768835142452 on iter: 119900/250000
training loss: 3.3174370313643 on iter: 120000/250000
training loss: 3.2293691798106 on iter: 120100/250000
training loss: 3.0411614023024 on iter: 120200/250000
training loss: 3.189015453773 on iter: 120300/250000
training loss: 2.5612232206446 on iter: 120400/250000
training loss: 2.6045983757342 on iter: 120500/250000
training loss: 3.8470014734743 on iter: 120600/250000
training loss: 3.2870030646995 on iter: 120700/250000
training loss: 3.8434835293914 on iter: 120800/250000
training loss: 3.1579680845632 on iter: 120900/250000
training loss: 3.1842635354748 on iter: 121000/250000
training loss: 3.5211043013247 on iter: 121100/250000
training loss: 3.7980213227086 on iter: 121200/250000
training loss: 3.3544774642649 on iter: 121300/250000
training loss: 2.3993067332773 on iter: 121400/250000
training loss: 3.0105034575557 on iter: 121500/250000
training loss: 2.412372100314 on iter: 121600/250000
training loss: 3.245057538935 on iter: 121700/250000
training loss: 3.071393729731 on iter: 121800/250000
training loss: 3.1755162557313 on iter: 121900/250000
training loss: 3.0874250719724 on iter: 122000/250000
training loss: 3.0643962406869 on iter: 122100/250000
training loss: 3.7085187792358 on iter: 122200/250000
training loss: 3.6646783257726 on iter: 122300/250000
training loss: 3.4542047718727 on iter: 122400/250000
training loss: 3.2804198223328 on iter: 122500/250000
training loss: 3.8656796903498 on iter: 122600/250000
training loss: 2.7001151515891 on iter: 122700/250000
training loss: 2.8394461835667 on iter: 122800/250000
training loss: 2.0427604644739 on iter: 122900/250000
training loss: 3.3410999471955 on iter: 123000/250000
training loss: 3.7929102307236 on iter: 123100/250000
training loss: 4.3448799497018 on iter: 123200/250000
training loss: 3.8625554735273 on iter: 123300/250000
training loss: 2.9390969452974 on iter: 123400/250000
training loss: 3.0657122509523 on iter: 123500/250000
training loss: 3.0676532421766 on iter: 123600/250000
training loss: 3.3349648658899 on iter: 123700/250000
training loss: 3.2312185321344 on iter: 123800/250000
training loss: 3.4640070599703 on iter: 123900/250000
training loss: 2.8967203470916 on iter: 124000/250000
training loss: 2.8835648978897 on iter: 124100/250000
training loss: 4.0374745514961 on iter: 124200/250000
training loss: 3.0657412780713 on iter: 124300/250000
training loss: 3.3077707283838 on iter: 124400/250000
training loss: 2.9806886392577 on iter: 124500/250000
training loss: 3.5263765798188 on iter: 124600/250000
training loss: 2.854285103727 on iter: 124700/250000
training loss: 3.3138572839175 on iter: 124800/250000
training loss: 2.8026803573502 on iter: 124900/250000
training loss: 2.6206723035756 on iter: 125000/250000
training loss: 3.2435108520089 on iter: 125100/250000
training loss: 3.1093118093974 on iter: 125200/250000
training loss: 2.9005788224828 on iter: 125300/250000
training loss: 3.656490588891 on iter: 125400/250000
training loss: 3.7016895276885 on iter: 125500/250000
training loss: 3.9865265586336 on iter: 125600/250000
training loss: 3.2651060585369 on iter: 125700/250000
training loss: 3.0308002324835 on iter: 125800/250000
training loss: 3.5403102052032 on iter: 125900/250000
training loss: 2.7821402850176 on iter: 126000/250000
training loss: 3.4626473096686 on iter: 126100/250000
training loss: 4.0625707351992 on iter: 126200/250000
training loss: 3.7021121813473 on iter: 126300/250000
training loss: 3.5333637125547 on iter: 126400/250000
training loss: 3.7265484082772 on iter: 126500/250000
training loss: 3.0229574140024 on iter: 126600/250000
training loss: 3.0197777952738 on iter: 126700/250000
training loss: 3.1367627609358 on iter: 126800/250000
training loss: 3.0096065505111 on iter: 126900/250000
training loss: 3.3737551371075 on iter: 127000/250000
training loss: 2.9019613417664 on iter: 127100/250000
training loss: 3.6494029929292 on iter: 127200/250000
training loss: 2.7846891664753 on iter: 127300/250000
training loss: 2.7026986416239 on iter: 127400/250000
training loss: 2.2921263862411 on iter: 127500/250000
training loss: 4.5822965019221 on iter: 127600/250000
training loss: 3.3914127375068 on iter: 127700/250000
training loss: 4.3274187931269 on iter: 127800/250000
training loss: 2.7970406287871 on iter: 127900/250000
training loss: 3.3927563432748 on iter: 128000/250000
training loss: 2.9279718493193 on iter: 128100/250000
training loss: 2.8524965411727 on iter: 128200/250000
training loss: 2.9311737382056 on iter: 128300/250000
training loss: 2.7557931046529 on iter: 128400/250000
training loss: 2.6373191133639 on iter: 128500/250000
training loss: 3.7355548711346 on iter: 128600/250000
training loss: 3.1868085651649 on iter: 128700/250000
training loss: 3.7110740265008 on iter: 128800/250000
training loss: 3.0261776113164 on iter: 128900/250000
training loss: 3.0346132039891 on iter: 129000/250000
training loss: 3.7432630073981 on iter: 129100/250000
training loss: 3.5060368308344 on iter: 129200/250000
training loss: 3.2192135990116 on iter: 129300/250000
training loss: 3.7075491474819 on iter: 129400/250000
training loss: 3.572578373616 on iter: 129500/250000
training loss: 2.5093222556805 on iter: 129600/250000
training loss: 3.6901205130724 on iter: 129700/250000
training loss: 3.4463394520436 on iter: 129800/250000
training loss: 3.3619943797611 on iter: 129900/250000
training loss: 3.251638675758 on iter: 130000/250000
training loss: 3.3855235267567 on iter: 130100/250000
training loss: 3.1340644647847 on iter: 130200/250000
training loss: 2.984781031019 on iter: 130300/250000
training loss: 3.1363475816719 on iter: 130400/250000
training loss: 3.2370922427589 on iter: 130500/250000
training loss: 3.0089355262775 on iter: 130600/250000
training loss: 2.76332219534 on iter: 130700/250000
training loss: 3.5962168305545 on iter: 130800/250000
training loss: 3.1655568831983 on iter: 130900/250000
training loss: 3.2595483582315 on iter: 131000/250000
training loss: 3.0061282343243 on iter: 131100/250000
training loss: 3.4982263922065 on iter: 131200/250000
training loss: 2.938896876162 on iter: 131300/250000
training loss: 2.7406563713496 on iter: 131400/250000
training loss: 3.0690331538141 on iter: 131500/250000
training loss: 2.9989883301511 on iter: 131600/250000
training loss: 3.5227650981593 on iter: 131700/250000
training loss: 3.3092560558557 on iter: 131800/250000
training loss: 2.7812300760032 on iter: 131900/250000
training loss: 3.382802555913 on iter: 132000/250000
training loss: 2.4863394125521 on iter: 132100/250000
training loss: 3.3825332669598 on iter: 132200/250000
training loss: 3.3384780585014 on iter: 132300/250000
training loss: 3.225951805511 on iter: 132400/250000
training loss: 3.8456689669259 on iter: 132500/250000
training loss: 2.7006159980568 on iter: 132600/250000
training loss: 2.9030357203861 on iter: 132700/250000
training loss: 2.8348710446756 on iter: 132800/250000
training loss: 3.7420715627668 on iter: 132900/250000
training loss: 2.6818271641324 on iter: 133000/250000
training loss: 2.7335176612945 on iter: 133100/250000
training loss: 3.4760274661864 on iter: 133200/250000
training loss: 4.297030414639 on iter: 133300/250000
training loss: 3.9040222039156 on iter: 133400/250000
training loss: 3.457167754976 on iter: 133500/250000
training loss: 2.273808055569 on iter: 133600/250000
training loss: 2.929923878328 on iter: 133700/250000
training loss: 3.2843511994527 on iter: 133800/250000
training loss: 3.0787945221585 on iter: 133900/250000
training loss: 2.3951106293523 on iter: 134000/250000
training loss: 3.1332972037319 on iter: 134100/250000
training loss: 3.7740280135339 on iter: 134200/250000
training loss: 2.8767530583582 on iter: 134300/250000
training loss: 2.8986970117176 on iter: 134400/250000
training loss: 3.7603853900843 on iter: 134500/250000
training loss: 3.0503293855577 on iter: 134600/250000
training loss: 4.0747587621205 on iter: 134700/250000
training loss: 2.1033811302324 on iter: 134800/250000
training loss: 3.9670318102385 on iter: 134900/250000
training loss: 2.898377254116 on iter: 135000/250000
training loss: 4.0834643885448 on iter: 135100/250000
training loss: 2.6499546521104 on iter: 135200/250000
training loss: 3.4699485890192 on iter: 135300/250000
training loss: 3.7685536807408 on iter: 135400/250000
training loss: 3.7817317951462 on iter: 135500/250000
training loss: 2.7216370499902 on iter: 135600/250000
training loss: 3.7877884003728 on iter: 135700/250000
training loss: 2.644875574328 on iter: 135800/250000
training loss: 2.9473181592156 on iter: 135900/250000
training loss: 3.1952640872438 on iter: 136000/250000
training loss: 3.0441817140406 on iter: 136100/250000
training loss: 2.8990257963323 on iter: 136200/250000
training loss: 3.2789071562923 on iter: 136300/250000
training loss: 3.2170621701291 on iter: 136400/250000
training loss: 3.2093931097987 on iter: 136500/250000
training loss: 3.4642936145097 on iter: 136600/250000
training loss: 3.8289550616868 on iter: 136700/250000
training loss: 3.1675846216891 on iter: 136800/250000
training loss: 2.6621239981858 on iter: 136900/250000
training loss: 2.9253457830618 on iter: 137000/250000
training loss: 2.7749435582343 on iter: 137100/250000
training loss: 3.1133955179094 on iter: 137200/250000
training loss: 2.701843110242 on iter: 137300/250000
training loss: 3.18229135304 on iter: 137400/250000
training loss: 3.192168657988 on iter: 137500/250000
training loss: 3.4169345237924 on iter: 137600/250000
training loss: 3.1308363997876 on iter: 137700/250000
training loss: 4.3349108975562 on iter: 137800/250000
training loss: 3.2250164964621 on iter: 137900/250000
training loss: 3.0405955423015 on iter: 138000/250000
training loss: 3.4677698572577 on iter: 138100/250000
training loss: 4.0276753233242 on iter: 138200/250000
training loss: 2.8954248101193 on iter: 138300/250000
training loss: 3.2068763445184 on iter: 138400/250000
training loss: 2.6180740622832 on iter: 138500/250000
training loss: 4.0280012324957 on iter: 138600/250000
training loss: 3.2864066604615 on iter: 138700/250000
training loss: 2.7350475138255 on iter: 138800/250000
training loss: 3.7413219220919 on iter: 138900/250000
training loss: 3.3954306605652 on iter: 139000/250000
training loss: 4.3871186006096 on iter: 139100/250000
training loss: 2.8035299312335 on iter: 139200/250000
training loss: 3.4227835012821 on iter: 139300/250000
training loss: 3.4392435118843 on iter: 139400/250000
training loss: 3.1000948792405 on iter: 139500/250000
training loss: 2.4153497145153 on iter: 139600/250000
training loss: 3.30224922308 on iter: 139700/250000
training loss: 3.6278577711431 on iter: 139800/250000
training loss: 2.8267309169494 on iter: 139900/250000
training loss: 3.4863776650989 on iter: 140000/250000
training loss: 3.38037487229 on iter: 140100/250000
training loss: 3.2481930674116 on iter: 140200/250000
training loss: 3.2655751046564 on iter: 140300/250000
training loss: 2.758453999924 on iter: 140400/250000
training loss: 2.9488923947822 on iter: 140500/250000
training loss: 2.9926604973965 on iter: 140600/250000
training loss: 3.1427456631559 on iter: 140700/250000
training loss: 3.3804450225822 on iter: 140800/250000
training loss: 3.4185912309376 on iter: 140900/250000
training loss: 3.2000879983553 on iter: 141000/250000
training loss: 3.2863742893095 on iter: 141100/250000
training loss: 2.9837140325776 on iter: 141200/250000
training loss: 2.8296475621114 on iter: 141300/250000
training loss: 3.1688849808197 on iter: 141400/250000
training loss: 3.05112531855 on iter: 141500/250000
training loss: 2.3742748877413 on iter: 141600/250000
training loss: 3.85954299821 on iter: 141700/250000
training loss: 4.0238458793738 on iter: 141800/250000
training loss: 3.9343234528301 on iter: 141900/250000
training loss: 3.3632356430867 on iter: 142000/250000
training loss: 3.2940474546221 on iter: 142100/250000
training loss: 3.6632600346174 on iter: 142200/250000
training loss: 3.9853771233897 on iter: 142300/250000
training loss: 2.7545035466454 on iter: 142400/250000
training loss: 3.3179722584262 on iter: 142500/250000
training loss: 3.7943516924842 on iter: 142600/250000
training loss: 3.1179983973593 on iter: 142700/250000
training loss: 2.4076765076489 on iter: 142800/250000
training loss: 3.3274338017246 on iter: 142900/250000
training loss: 3.7232831610194 on iter: 143000/250000
training loss: 4.5197103583005 on iter: 143100/250000
training loss: 2.8408868125225 on iter: 143200/250000
training loss: 3.185109842104 on iter: 143300/250000
training loss: 2.887862338748 on iter: 143400/250000
training loss: 3.5252361691948 on iter: 143500/250000
training loss: 3.0373416908365 on iter: 143600/250000
training loss: 3.2159435761685 on iter: 143700/250000
training loss: 3.1737491643951 on iter: 143800/250000
training loss: 3.5612016443001 on iter: 143900/250000
training loss: 3.5853935102736 on iter: 144000/250000
training loss: 3.8828512139394 on iter: 144100/250000
training loss: 2.8973377244791 on iter: 144200/250000
training loss: 3.2840904231728 on iter: 144300/250000
training loss: 3.0248808256738 on iter: 144400/250000
training loss: 3.1876154626248 on iter: 144500/250000
training loss: 3.6200120597968 on iter: 144600/250000
training loss: 2.7942218257205 on iter: 144700/250000
training loss: 2.5064167097631 on iter: 144800/250000
training loss: 4.3462181661319 on iter: 144900/250000
training loss: 3.2641078910647 on iter: 145000/250000
training loss: 3.6386033968739 on iter: 145100/250000
training loss: 4.1853365435125 on iter: 145200/250000
training loss: 3.639925872898 on iter: 145300/250000
training loss: 3.2540036523072 on iter: 145400/250000
training loss: 3.9543448940209 on iter: 145500/250000
training loss: 2.9510704586321 on iter: 145600/250000
training loss: 3.1482658748846 on iter: 145700/250000
training loss: 3.1477700058503 on iter: 145800/250000
training loss: 3.9217535966786 on iter: 145900/250000
training loss: 2.6267538402145 on iter: 146000/250000
training loss: 3.8646165105989 on iter: 146100/250000
training loss: 3.6350959210137 on iter: 146200/250000
training loss: 3.364238211062 on iter: 146300/250000
training loss: 3.040697528937 on iter: 146400/250000
training loss: 3.3298473690851 on iter: 146500/250000
training loss: 2.8269002760308 on iter: 146600/250000
training loss: 3.7106552496781 on iter: 146700/250000
training loss: 3.1129169994885 on iter: 146800/250000
training loss: 3.2478350334756 on iter: 146900/250000
training loss: 2.2820623665378 on iter: 147000/250000
training loss: 3.7007701020221 on iter: 147100/250000
training loss: 3.4116295911355 on iter: 147200/250000
training loss: 2.893104343732 on iter: 147300/250000
training loss: 2.3701725670259 on iter: 147400/250000
training loss: 3.7256456226062 on iter: 147500/250000
training loss: 3.6196005172158 on iter: 147600/250000
training loss: 3.5455702707073 on iter: 147700/250000
training loss: 3.3196209329242 on iter: 147800/250000
training loss: 2.8261463214824 on iter: 147900/250000
training loss: 3.9695398131325 on iter: 148000/250000
training loss: 2.998646989314 on iter: 148100/250000
training loss: 3.6450998444261 on iter: 148200/250000
training loss: 3.9049331628748 on iter: 148300/250000
training loss: 3.2677483419368 on iter: 148400/250000
training loss: 3.7488585607414 on iter: 148500/250000
training loss: 2.9829777390947 on iter: 148600/250000
training loss: 3.1770194663953 on iter: 148700/250000
training loss: 3.7670830653738 on iter: 148800/250000
training loss: 2.8787791407409 on iter: 148900/250000
training loss: 2.627138036501 on iter: 149000/250000
training loss: 3.104836554779 on iter: 149100/250000
training loss: 3.0977577440407 on iter: 149200/250000
training loss: 3.3098799940108 on iter: 149300/250000
training loss: 3.2018525480899 on iter: 149400/250000
training loss: 3.116799331606 on iter: 149500/250000
training loss: 3.8153986554027 on iter: 149600/250000
training loss: 2.3688675078778 on iter: 149700/250000
training loss: 2.4481639711781 on iter: 149800/250000
training loss: 3.3289611649502 on iter: 149900/250000
training loss: 2.6820942528295 on iter: 150000/250000
training loss: 3.0607831468952 on iter: 150100/250000
training loss: 3.4245134213491 on iter: 150200/250000
training loss: 3.070733898796 on iter: 150300/250000
training loss: 3.0742896637608 on iter: 150400/250000
training loss: 4.348213477912 on iter: 150500/250000
training loss: 3.3022368311199 on iter: 150600/250000
training loss: 3.1886762518295 on iter: 150700/250000
training loss: 2.4466074601156 on iter: 150800/250000
training loss: 3.3768505314082 on iter: 150900/250000
training loss: 2.9133181946238 on iter: 151000/250000
training loss: 3.7082773489831 on iter: 151100/250000
training loss: 3.5814487056636 on iter: 151200/250000
training loss: 3.9250163931697 on iter: 151300/250000
training loss: 3.3814166842502 on iter: 151400/250000
training loss: 2.4153511387573 on iter: 151500/250000
training loss: 3.5124228944053 on iter: 151600/250000
training loss: 3.5509894414346 on iter: 151700/250000
training loss: 3.4243654932208 on iter: 151800/250000
training loss: 3.7824901449955 on iter: 151900/250000
training loss: 2.70460061095 on iter: 152000/250000
training loss: 3.7908020300417 on iter: 152100/250000
training loss: 2.545228071424 on iter: 152200/250000
training loss: 3.3593872247947 on iter: 152300/250000
training loss: 3.8141430770299 on iter: 152400/250000
training loss: 3.8705829449174 on iter: 152500/250000
training loss: 3.3059693379346 on iter: 152600/250000
training loss: 4.0987307360371 on iter: 152700/250000
training loss: 2.7219546917264 on iter: 152800/250000
training loss: 4.0061671290968 on iter: 152900/250000
training loss: 3.6345666990862 on iter: 153000/250000
training loss: 2.8139562268209 on iter: 153100/250000
training loss: 2.9054604251365 on iter: 153200/250000
training loss: 3.620965411465 on iter: 153300/250000
training loss: 3.6024873657349 on iter: 153400/250000
training loss: 2.4954608864773 on iter: 153500/250000
training loss: 3.0707411933274 on iter: 153600/250000
training loss: 2.4842435453268 on iter: 153700/250000
training loss: 2.7877206537208 on iter: 153800/250000
training loss: 4.1190213688942 on iter: 153900/250000
training loss: 3.2347349583776 on iter: 154000/250000
training loss: 3.8436815221285 on iter: 154100/250000
training loss: 2.9875341582531 on iter: 154200/250000
training loss: 3.3442520441665 on iter: 154300/250000
training loss: 3.9186023322323 on iter: 154400/250000
training loss: 3.0082523587586 on iter: 154500/250000
training loss: 3.2199461629209 on iter: 154600/250000
training loss: 3.2310908078338 on iter: 154700/250000
training loss: 3.4747050016139 on iter: 154800/250000
training loss: 2.8526512927841 on iter: 154900/250000
training loss: 2.8032243269509 on iter: 155000/250000
training loss: 3.0119246179228 on iter: 155100/250000
training loss: 3.7752784206578 on iter: 155200/250000
training loss: 3.4192405194535 on iter: 155300/250000
training loss: 3.4904720629103 on iter: 155400/250000
training loss: 3.0949976228346 on iter: 155500/250000
training loss: 2.8044013361141 on iter: 155600/250000
training loss: 3.0586006634914 on iter: 155700/250000
training loss: 3.9997278538708 on iter: 155800/250000
training loss: 3.6276117177468 on iter: 155900/250000
training loss: 3.2426290639576 on iter: 156000/250000
training loss: 3.0068004835744 on iter: 156100/250000
training loss: 4.163821400467 on iter: 156200/250000
training loss: 3.0775560833438 on iter: 156300/250000
training loss: 2.6607663557621 on iter: 156400/250000
training loss: 3.2381422520835 on iter: 156500/250000
training loss: 2.816412820393 on iter: 156600/250000
training loss: 2.8189127209158 on iter: 156700/250000
training loss: 2.985648725152 on iter: 156800/250000
training loss: 3.4849730574128 on iter: 156900/250000
training loss: 3.9749392862194 on iter: 157000/250000
training loss: 3.2401466853137 on iter: 157100/250000
training loss: 3.1725075481082 on iter: 157200/250000
training loss: 3.9853007932539 on iter: 157300/250000
training loss: 3.517997745513 on iter: 157400/250000
training loss: 4.0305093833841 on iter: 157500/250000
training loss: 2.7911196392414 on iter: 157600/250000
training loss: 2.9834341193546 on iter: 157700/250000
training loss: 2.5888619570617 on iter: 157800/250000
training loss: 3.6533723786989 on iter: 157900/250000
training loss: 3.1743237581732 on iter: 158000/250000
training loss: 3.6049578552151 on iter: 158100/250000
training loss: 3.250541007491 on iter: 158200/250000
training loss: 3.333781408373 on iter: 158300/250000
training loss: 3.3622503416261 on iter: 158400/250000
training loss: 3.4893663443477 on iter: 158500/250000
training loss: 3.5309156857349 on iter: 158600/250000
training loss: 2.2732091134497 on iter: 158700/250000
training loss: 2.7563147808289 on iter: 158800/250000
training loss: 3.490922276109 on iter: 158900/250000
training loss: 4.3368294970252 on iter: 159000/250000
training loss: 2.8346021675447 on iter: 159100/250000
training loss: 3.3447018562428 on iter: 159200/250000
training loss: 2.6847476924183 on iter: 159300/250000
training loss: 3.3012538687305 on iter: 159400/250000
training loss: 3.3309668321693 on iter: 159500/250000
training loss: 2.9772996591127 on iter: 159600/250000
training loss: 3.2962380582373 on iter: 159700/250000
training loss: 3.1194597618529 on iter: 159800/250000
training loss: 3.6618129802579 on iter: 159900/250000
training loss: 3.8225467138743 on iter: 160000/250000
training loss: 4.0985198057219 on iter: 160100/250000
training loss: 2.5350789183284 on iter: 160200/250000
training loss: 3.0533641969889 on iter: 160300/250000
training loss: 3.3009849188772 on iter: 160400/250000
training loss: 3.2646745612391 on iter: 160500/250000
training loss: 3.3856301700798 on iter: 160600/250000
training loss: 2.998268409486 on iter: 160700/250000
training loss: 4.2683353310906 on iter: 160800/250000
training loss: 2.9729091651401 on iter: 160900/250000
training loss: 3.5172062394297 on iter: 161000/250000
training loss: 2.6384827150942 on iter: 161100/250000
training loss: 3.3035328488424 on iter: 161200/250000
training loss: 3.7958870083835 on iter: 161300/250000
training loss: 3.191240660161 on iter: 161400/250000
training loss: 2.4005274480983 on iter: 161500/250000
training loss: 4.1488975312781 on iter: 161600/250000
training loss: 3.7946943817939 on iter: 161700/250000
training loss: 3.1830626854993 on iter: 161800/250000
training loss: 3.8508233094907 on iter: 161900/250000
training loss: 3.6254783239133 on iter: 162000/250000
training loss: 3.1602406190305 on iter: 162100/250000
training loss: 3.0620143808168 on iter: 162200/250000
training loss: 2.5085226316309 on iter: 162300/250000
training loss: 3.0063563873249 on iter: 162400/250000
training loss: 3.5135029658332 on iter: 162500/250000
training loss: 2.5423774991698 on iter: 162600/250000
training loss: 3.079401041362 on iter: 162700/250000
training loss: 3.4978784504222 on iter: 162800/250000
training loss: 2.6464151076187 on iter: 162900/250000
training loss: 3.1943084930769 on iter: 163000/250000
training loss: 2.492030760821 on iter: 163100/250000
training loss: 4.0659781837459 on iter: 163200/250000
training loss: 3.0007462549705 on iter: 163300/250000
training loss: 3.2024904233061 on iter: 163400/250000
training loss: 3.3256251274194 on iter: 163500/250000
training loss: 3.7234361036211 on iter: 163600/250000
training loss: 3.0087493730408 on iter: 163700/250000
training loss: 3.2873184054482 on iter: 163800/250000
training loss: 3.6798941581929 on iter: 163900/250000
training loss: 3.2799748850845 on iter: 164000/250000
training loss: 3.0901756290036 on iter: 164100/250000
training loss: 2.7977905779821 on iter: 164200/250000
training loss: 3.8320074443035 on iter: 164300/250000
training loss: 2.877100461522 on iter: 164400/250000
training loss: 2.8595031117602 on iter: 164500/250000
training loss: 3.5470190719298 on iter: 164600/250000
training loss: 3.1801239244905 on iter: 164700/250000
training loss: 3.0396888129822 on iter: 164800/250000
training loss: 3.6258320784952 on iter: 164900/250000
training loss: 3.2296592563881 on iter: 165000/250000
training loss: 3.9266565813146 on iter: 165100/250000
training loss: 4.5428740948527 on iter: 165200/250000
training loss: 3.8384737636263 on iter: 165300/250000
training loss: 3.9163358978196 on iter: 165400/250000
training loss: 3.7356552323153 on iter: 165500/250000
training loss: 3.3738775416571 on iter: 165600/250000
training loss: 3.7084871944261 on iter: 165700/250000
training loss: 2.326175564369 on iter: 165800/250000
training loss: 3.3668682677683 on iter: 165900/250000
training loss: 2.9829908073439 on iter: 166000/250000
training loss: 3.6697885373777 on iter: 166100/250000
training loss: 3.2274401419533 on iter: 166200/250000
training loss: 3.2893317918039 on iter: 166300/250000
training loss: 3.9494951471668 on iter: 166400/250000
training loss: 2.7244473452484 on iter: 166500/250000
training loss: 3.9406529573806 on iter: 166600/250000
training loss: 3.3278591584593 on iter: 166700/250000
training loss: 3.4810155079081 on iter: 166800/250000
training loss: 3.7670762219768 on iter: 166900/250000
training loss: 3.2529088007462 on iter: 167000/250000
training loss: 3.1554972788164 on iter: 167100/250000
training loss: 2.8530141493845 on iter: 167200/250000
training loss: 3.1980364841139 on iter: 167300/250000
training loss: 3.3957053744122 on iter: 167400/250000
training loss: 3.7906923267036 on iter: 167500/250000
training loss: 2.4758770581382 on iter: 167600/250000
training loss: 3.3095080966692 on iter: 167700/250000
training loss: 3.9847085145167 on iter: 167800/250000
training loss: 3.791669966005 on iter: 167900/250000
training loss: 3.631826249623 on iter: 168000/250000
training loss: 3.6477641487967 on iter: 168100/250000
training loss: 3.1967194430575 on iter: 168200/250000
training loss: 3.1208002331017 on iter: 168300/250000
training loss: 3.0556193963626 on iter: 168400/250000
training loss: 3.2603748227777 on iter: 168500/250000
training loss: 2.9447456448956 on iter: 168600/250000
training loss: 3.9116115261958 on iter: 168700/250000
training loss: 4.4160711240783 on iter: 168800/250000
training loss: 3.0820257902178 on iter: 168900/250000
training loss: 4.1800076906947 on iter: 169000/250000
training loss: 3.3766902196332 on iter: 169100/250000
training loss: 2.9622824085782 on iter: 169200/250000
training loss: 4.1416192118429 on iter: 169300/250000
training loss: 3.2855414204307 on iter: 169400/250000
training loss: 3.19794955935 on iter: 169500/250000
training loss: 3.0875521283135 on iter: 169600/250000
training loss: 3.5581603833111 on iter: 169700/250000
training loss: 3.356832237715 on iter: 169800/250000
training loss: 2.2607397837473 on iter: 169900/250000
training loss: 3.012138670622 on iter: 170000/250000
training loss: 3.0052869533487 on iter: 170100/250000
training loss: 2.9972818601628 on iter: 170200/250000
training loss: 3.30963587878 on iter: 170300/250000
training loss: 3.1256595452715 on iter: 170400/250000
training loss: 3.2675873882784 on iter: 170500/250000
training loss: 3.1640237275227 on iter: 170600/250000
training loss: 2.9378628659752 on iter: 170700/250000
training loss: 4.0803093690841 on iter: 170800/250000
training loss: 3.0350517954134 on iter: 170900/250000
training loss: 3.2087548741922 on iter: 171000/250000
training loss: 3.0197716760461 on iter: 171100/250000
training loss: 3.5763205492883 on iter: 171200/250000
training loss: 3.1744145119505 on iter: 171300/250000
training loss: 3.6512739709688 on iter: 171400/250000
training loss: 3.3487663372561 on iter: 171500/250000
training loss: 2.8485024193511 on iter: 171600/250000
training loss: 3.2916129883438 on iter: 171700/250000
training loss: 3.7544242177886 on iter: 171800/250000
training loss: 4.1423728694747 on iter: 171900/250000
training loss: 3.6597996174045 on iter: 172000/250000
training loss: 3.8330171511182 on iter: 172100/250000
training loss: 2.5409590120281 on iter: 172200/250000
training loss: 3.2394564642306 on iter: 172300/250000
training loss: 3.0073436112224 on iter: 172400/250000
training loss: 2.854917877588 on iter: 172500/250000
training loss: 2.9770712489505 on iter: 172600/250000
training loss: 3.7945737151333 on iter: 172700/250000
training loss: 3.4142991048277 on iter: 172800/250000
training loss: 3.6414206283548 on iter: 172900/250000
training loss: 2.7816283463621 on iter: 173000/250000
training loss: 3.2802634054239 on iter: 173100/250000
training loss: 2.6636813494736 on iter: 173200/250000
training loss: 2.9052731049064 on iter: 173300/250000
training loss: 2.6779268278943 on iter: 173400/250000
training loss: 2.7009283392038 on iter: 173500/250000
training loss: 3.7502681539577 on iter: 173600/250000
training loss: 3.6068506376066 on iter: 173700/250000
training loss: 3.0503216448276 on iter: 173800/250000
training loss: 3.5082201373714 on iter: 173900/250000
training loss: 2.7168241854334 on iter: 174000/250000
training loss: 3.9206609523658 on iter: 174100/250000
training loss: 3.2335726234015 on iter: 174200/250000
training loss: 3.4875214064293 on iter: 174300/250000
training loss: 3.7899906856703 on iter: 174400/250000
training loss: 3.4675463259705 on iter: 174500/250000
training loss: 3.2900486288846 on iter: 174600/250000
training loss: 3.7326905466079 on iter: 174700/250000
training loss: 4.0806913904512 on iter: 174800/250000
training loss: 2.8029150934777 on iter: 174900/250000
training loss: 4.1766157929898 on iter: 175000/250000
training loss: 3.5279769160077 on iter: 175100/250000
training loss: 2.8458204941031 on iter: 175200/250000
training loss: 4.657495226402 on iter: 175300/250000
training loss: 3.9607424387489 on iter: 175400/250000
training loss: 2.639569396188 on iter: 175500/250000
training loss: 2.725496540991 on iter: 175600/250000
training loss: 2.8874309130692 on iter: 175700/250000
training loss: 2.2406833067123 on iter: 175800/250000
training loss: 3.1331204676939 on iter: 175900/250000
training loss: 3.5064818352308 on iter: 176000/250000
training loss: 3.2323635663145 on iter: 176100/250000
training loss: 3.529775020824 on iter: 176200/250000
training loss: 3.1816688832826 on iter: 176300/250000
training loss: 4.4604013776001 on iter: 176400/250000
training loss: 3.4622607391277 on iter: 176500/250000
training loss: 2.6517923252925 on iter: 176600/250000
training loss: 2.9450801028843 on iter: 176700/250000
training loss: 3.3150310129003 on iter: 176800/250000
training loss: 2.8698416827085 on iter: 176900/250000
training loss: 2.8527610608405 on iter: 177000/250000
training loss: 2.9467016061158 on iter: 177100/250000
training loss: 3.9926597720799 on iter: 177200/250000
training loss: 3.5236885389842 on iter: 177300/250000
training loss: 2.8328734750151 on iter: 177400/250000
training loss: 2.8339074168963 on iter: 177500/250000
training loss: 2.7964308640468 on iter: 177600/250000
training loss: 3.0842754680359 on iter: 177700/250000
training loss: 3.0858119929374 on iter: 177800/250000
training loss: 3.2440029579086 on iter: 177900/250000
training loss: 3.3668127869295 on iter: 178000/250000
training loss: 4.5369000710414 on iter: 178100/250000
training loss: 3.2098186230172 on iter: 178200/250000
training loss: 3.0155959987727 on iter: 178300/250000
training loss: 3.9480312096388 on iter: 178400/250000
training loss: 2.6478225857652 on iter: 178500/250000
training loss: 2.6870524524874 on iter: 178600/250000
training loss: 3.2446098667354 on iter: 178700/250000
training loss: 3.0942768252838 on iter: 178800/250000
training loss: 3.5349708294526 on iter: 178900/250000
training loss: 4.1046860529614 on iter: 179000/250000
training loss: 3.1975469362817 on iter: 179100/250000
training loss: 3.9773017448813 on iter: 179200/250000
training loss: 2.8578554575507 on iter: 179300/250000
training loss: 3.0631132937809 on iter: 179400/250000
training loss: 4.4499042711499 on iter: 179500/250000
training loss: 2.7196180541186 on iter: 179600/250000
training loss: 4.1486635303348 on iter: 179700/250000
training loss: 3.4523280993576 on iter: 179800/250000
training loss: 3.8501329371178 on iter: 179900/250000
training loss: 3.6094786023012 on iter: 180000/250000
training loss: 2.8132121628816 on iter: 180100/250000
training loss: 3.3951695238425 on iter: 180200/250000
training loss: 3.9786175152053 on iter: 180300/250000
training loss: 3.2832153599932 on iter: 180400/250000
training loss: 2.7363903040423 on iter: 180500/250000
training loss: 2.9722434718491 on iter: 180600/250000
training loss: 3.97066869845 on iter: 180700/250000
training loss: 3.1680701677896 on iter: 180800/250000
training loss: 3.3530159216419 on iter: 180900/250000
training loss: 2.742875514431 on iter: 181000/250000
training loss: 3.6049398416167 on iter: 181100/250000
training loss: 2.4815385842645 on iter: 181200/250000
training loss: 4.1660899621742 on iter: 181300/250000
training loss: 3.3710964750502 on iter: 181400/250000
training loss: 3.1074729863665 on iter: 181500/250000
training loss: 3.5542525072855 on iter: 181600/250000
training loss: 2.5227938642628 on iter: 181700/250000
training loss: 3.3142566638187 on iter: 181800/250000
training loss: 3.7891994373824 on iter: 181900/250000
training loss: 3.3776476313392 on iter: 182000/250000
training loss: 3.2372757722908 on iter: 182100/250000
training loss: 3.4306129653224 on iter: 182200/250000
training loss: 3.0434690584287 on iter: 182300/250000
training loss: 3.4935588564698 on iter: 182400/250000
training loss: 3.325774801044 on iter: 182500/250000
training loss: 3.4627885077786 on iter: 182600/250000
training loss: 3.1993755427557 on iter: 182700/250000
training loss: 3.5917627106992 on iter: 182800/250000
training loss: 3.8020505152604 on iter: 182900/250000
training loss: 3.249858175914 on iter: 183000/250000
training loss: 3.9869915329735 on iter: 183100/250000
training loss: 3.3847562258955 on iter: 183200/250000
training loss: 3.5059250671899 on iter: 183300/250000
training loss: 3.5921702067591 on iter: 183400/250000
training loss: 3.831356668824 on iter: 183500/250000
training loss: 4.270268980386 on iter: 183600/250000
training loss: 3.5694432468422 on iter: 183700/250000
training loss: 3.535141970246 on iter: 183800/250000
training loss: 3.5894423783631 on iter: 183900/250000
training loss: 3.4285501089487 on iter: 184000/250000
training loss: 2.6027282928234 on iter: 184100/250000
training loss: 3.0035990358577 on iter: 184200/250000
training loss: 3.5970740144532 on iter: 184300/250000
training loss: 3.1070394937466 on iter: 184400/250000
training loss: 3.8862870442421 on iter: 184500/250000
training loss: 3.9598951981425 on iter: 184600/250000
training loss: 4.3554684450926 on iter: 184700/250000
training loss: 2.8432400574828 on iter: 184800/250000
training loss: 3.090110544674 on iter: 184900/250000
training loss: 2.1767477429804 on iter: 185000/250000
training loss: 3.2489897950407 on iter: 185100/250000
training loss: 4.0155852393084 on iter: 185200/250000
training loss: 3.7186724243371 on iter: 185300/250000
training loss: 3.2018411314786 on iter: 185400/250000
training loss: 3.7734615501648 on iter: 185500/250000
training loss: 3.5515765813853 on iter: 185600/250000
training loss: 2.8843863950744 on iter: 185700/250000
training loss: 4.4683158216292 on iter: 185800/250000
training loss: 3.5147084822821 on iter: 185900/250000
training loss: 3.2077882199152 on iter: 186000/250000
training loss: 3.1507280260406 on iter: 186100/250000
training loss: 3.2913279292982 on iter: 186200/250000
training loss: 2.774737798163 on iter: 186300/250000
training loss: 3.3798584811003 on iter: 186400/250000
training loss: 4.5493890190981 on iter: 186500/250000
training loss: 3.1446530607928 on iter: 186600/250000
training loss: 3.7042757095056 on iter: 186700/250000
training loss: 3.6091796806887 on iter: 186800/250000
training loss: 3.2117202664722 on iter: 186900/250000
training loss: 3.3629551182687 on iter: 187000/250000
training loss: 3.890511569745 on iter: 187100/250000
training loss: 3.5534612320308 on iter: 187200/250000
training loss: 3.7414096726285 on iter: 187300/250000
training loss: 3.1913204439832 on iter: 187400/250000
training loss: 3.577087688938 on iter: 187500/250000
training loss: 3.764693343594 on iter: 187600/250000
training loss: 3.027174171654 on iter: 187700/250000
training loss: 3.4209680641656 on iter: 187800/250000
training loss: 4.1318910750848 on iter: 187900/250000
training loss: 2.6760001996168 on iter: 188000/250000
training loss: 4.0416199530484 on iter: 188100/250000
training loss: 3.7187834112101 on iter: 188200/250000
training loss: 2.9420892931181 on iter: 188300/250000
training loss: 3.8980671724927 on iter: 188400/250000
training loss: 2.9705569007458 on iter: 188500/250000
training loss: 3.3465413557369 on iter: 188600/250000
training loss: 3.5059940748237 on iter: 188700/250000
training loss: 3.0960621544739 on iter: 188800/250000
training loss: 3.8515676299955 on iter: 188900/250000
training loss: 3.9243032422231 on iter: 189000/250000
training loss: 2.9268862489172 on iter: 189100/250000
training loss: 3.6832015292702 on iter: 189200/250000
training loss: 3.6100792085948 on iter: 189300/250000
training loss: 3.3750878825255 on iter: 189400/250000
training loss: 2.984078175535 on iter: 189500/250000
training loss: 2.5185274118705 on iter: 189600/250000
training loss: 3.2501511997224 on iter: 189700/250000
training loss: 3.1595284744119 on iter: 189800/250000
training loss: 3.3899814154784 on iter: 189900/250000
training loss: 3.2801970242773 on iter: 190000/250000
training loss: 3.2721200978757 on iter: 190100/250000
training loss: 3.634613987388 on iter: 190200/250000
training loss: 2.2128118609204 on iter: 190300/250000
training loss: 2.9541574453495 on iter: 190400/250000
training loss: 3.2195240000326 on iter: 190500/250000
training loss: 3.0495938270636 on iter: 190600/250000
training loss: 2.8472899624661 on iter: 190700/250000
training loss: 2.7708336129242 on iter: 190800/250000
training loss: 3.5072151456306 on iter: 190900/250000
training loss: 3.1158744870388 on iter: 191000/250000
training loss: 3.0393850600012 on iter: 191100/250000
training loss: 3.6690639231667 on iter: 191200/250000
training loss: 3.4168339047725 on iter: 191300/250000
training loss: 3.7733985597606 on iter: 191400/250000
training loss: 3.583808390611 on iter: 191500/250000
training loss: 3.2150354173204 on iter: 191600/250000
training loss: 3.7095507556186 on iter: 191700/250000
training loss: 3.7423333222687 on iter: 191800/250000
training loss: 4.3524495373718 on iter: 191900/250000
training loss: 2.9298455535417 on iter: 192000/250000
training loss: 2.6727009261834 on iter: 192100/250000
training loss: 3.4970012032011 on iter: 192200/250000
training loss: 3.0835584415709 on iter: 192300/250000
training loss: 3.8224519146004 on iter: 192400/250000
training loss: 3.5229836726403 on iter: 192500/250000
training loss: 3.6529231183439 on iter: 192600/250000
training loss: 2.0804514463896 on iter: 192700/250000
training loss: 3.2388531369815 on iter: 192800/250000
training loss: 3.2457003573215 on iter: 192900/250000
training loss: 3.6300271943397 on iter: 193000/250000
training loss: 2.8738532315635 on iter: 193100/250000
training loss: 3.6250352151961 on iter: 193200/250000
training loss: 2.5943285112184 on iter: 193300/250000
training loss: 3.1983324442881 on iter: 193400/250000
training loss: 3.4394147747909 on iter: 193500/250000
training loss: 3.8370366845108 on iter: 193600/250000
training loss: 3.3140125642031 on iter: 193700/250000
training loss: 3.5292269223728 on iter: 193800/250000
training loss: 2.8979827759652 on iter: 193900/250000
training loss: 4.3946051535823 on iter: 194000/250000
training loss: 3.1423815378344 on iter: 194100/250000
training loss: 2.9360367303536 on iter: 194200/250000
training loss: 3.6264965200168 on iter: 194300/250000
training loss: 2.6906737282023 on iter: 194400/250000
training loss: 3.0727200472449 on iter: 194500/250000
training loss: 5.0882565890174 on iter: 194600/250000
training loss: 3.0859958731539 on iter: 194700/250000
training loss: 2.8725108041677 on iter: 194800/250000
training loss: 3.568971643687 on iter: 194900/250000
training loss: 2.6593899357719 on iter: 195000/250000
training loss: 3.0492586741029 on iter: 195100/250000
training loss: 2.9650012282307 on iter: 195200/250000
training loss: 2.9788860462511 on iter: 195300/250000
training loss: 3.1361982522137 on iter: 195400/250000
training loss: 3.2364943347137 on iter: 195500/250000
training loss: 2.8235650220695 on iter: 195600/250000
training loss: 3.0616675571789 on iter: 195700/250000
training loss: 3.5136865286039 on iter: 195800/250000
training loss: 2.8860581063603 on iter: 195900/250000
training loss: 2.8440187504655 on iter: 196000/250000
training loss: 4.1033366118675 on iter: 196100/250000
training loss: 2.5589651902623 on iter: 196200/250000
training loss: 4.0509887807897 on iter: 196300/250000
training loss: 2.6688309068091 on iter: 196400/250000
training loss: 3.4156312374844 on iter: 196500/250000
training loss: 3.2277542245668 on iter: 196600/250000
training loss: 3.5142702775067 on iter: 196700/250000
training loss: 3.0632344132347 on iter: 196800/250000
training loss: 3.205910187204 on iter: 196900/250000
training loss: 3.9556479448592 on iter: 197000/250000
training loss: 2.7308215369008 on iter: 197100/250000
training loss: 4.2521735586197 on iter: 197200/250000
training loss: 2.146871686188 on iter: 197300/250000
training loss: 3.0431597164748 on iter: 197400/250000
training loss: 2.5083544666906 on iter: 197500/250000
training loss: 3.0375575812679 on iter: 197600/250000
training loss: 4.1960659814092 on iter: 197700/250000
training loss: 3.3974248656745 on iter: 197800/250000
training loss: 2.6915639229319 on iter: 197900/250000
training loss: 3.7355879648571 on iter: 198000/250000
training loss: 2.6893548496009 on iter: 198100/250000
training loss: 3.7951106546295 on iter: 198200/250000
training loss: 3.7886224547856 on iter: 198300/250000
training loss: 2.5845880819178 on iter: 198400/250000
training loss: 3.6002165937323 on iter: 198500/250000
training loss: 4.0323907510152 on iter: 198600/250000
training loss: 2.7429553439837 on iter: 198700/250000
training loss: 3.0240744950675 on iter: 198800/250000
training loss: 3.0878260071197 on iter: 198900/250000
training loss: 3.4168752605147 on iter: 199000/250000
training loss: 3.6896652133397 on iter: 199100/250000
training loss: 3.7967471595426 on iter: 199200/250000
training loss: 2.7729516573546 on iter: 199300/250000
training loss: 4.2997581535026 on iter: 199400/250000
training loss: 4.6309462571105 on iter: 199500/250000
training loss: 3.5261020331198 on iter: 199600/250000
training loss: 4.0680027052183 on iter: 199700/250000
training loss: 3.0566835260618 on iter: 199800/250000
training loss: 4.4284821975396 on iter: 199900/250000
training loss: 2.9126309686831 on iter: 200000/250000
training loss: 3.351641394324 on iter: 200100/250000
training loss: 3.3054212086528 on iter: 200200/250000
training loss: 2.3929050444014 on iter: 200300/250000
training loss: 3.4648891579699 on iter: 200400/250000
training loss: 3.3569449122955 on iter: 200500/250000
training loss: 4.1744726714015 on iter: 200600/250000
training loss: 3.0970493306862 on iter: 200700/250000
training loss: 3.000000337409 on iter: 200800/250000
training loss: 2.6247448805613 on iter: 200900/250000
training loss: 2.8764553193177 on iter: 201000/250000
training loss: 3.0649521065505 on iter: 201100/250000
training loss: 4.0312409536202 on iter: 201200/250000
training loss: 2.9623110423785 on iter: 201300/250000
training loss: 3.4866195226666 on iter: 201400/250000
training loss: 3.7082204493991 on iter: 201500/250000
training loss: 3.0733613578741 on iter: 201600/250000
training loss: 3.8991338211026 on iter: 201700/250000
training loss: 4.2002073066306 on iter: 201800/250000
training loss: 3.2801772392552 on iter: 201900/250000
training loss: 3.9594825350505 on iter: 202000/250000
training loss: 3.2485890381056 on iter: 202100/250000
training loss: 4.2556018498545 on iter: 202200/250000
training loss: 3.5243606675121 on iter: 202300/250000
training loss: 3.2222084632249 on iter: 202400/250000
training loss: 2.9771933139995 on iter: 202500/250000
training loss: 3.5370324173047 on iter: 202600/250000
training loss: 2.75729111514 on iter: 202700/250000
training loss: 2.5616705042401 on iter: 202800/250000
training loss: 2.7533123180122 on iter: 202900/250000
training loss: 2.9146491275574 on iter: 203000/250000
training loss: 3.5738552339389 on iter: 203100/250000
training loss: 3.0633965005068 on iter: 203200/250000
training loss: 3.8956071874906 on iter: 203300/250000
training loss: 3.0987018115638 on iter: 203400/250000
training loss: 2.879664350378 on iter: 203500/250000
training loss: 2.516196224322 on iter: 203600/250000
training loss: 3.1965457333379 on iter: 203700/250000
training loss: 4.2719721252 on iter: 203800/250000
training loss: 3.092979355228 on iter: 203900/250000
training loss: 3.3474333530443 on iter: 204000/250000
training loss: 2.8867592294165 on iter: 204100/250000
training loss: 2.7508175702944 on iter: 204200/250000
training loss: 2.6684566612715 on iter: 204300/250000
training loss: 3.5202730445125 on iter: 204400/250000
training loss: 3.6204192770763 on iter: 204500/250000
training loss: 3.256013060085 on iter: 204600/250000
training loss: 3.120967985452 on iter: 204700/250000
training loss: 3.1899345853165 on iter: 204800/250000
training loss: 2.697922520606 on iter: 204900/250000
training loss: 3.3943842113098 on iter: 205000/250000
training loss: 3.1227835583482 on iter: 205100/250000
training loss: 3.156265533015 on iter: 205200/250000
training loss: 3.6096825861942 on iter: 205300/250000
training loss: 3.708707083639 on iter: 205400/250000
training loss: 3.0356206271694 on iter: 205500/250000
training loss: 3.4962143783591 on iter: 205600/250000
training loss: 2.7437945225649 on iter: 205700/250000
training loss: 3.2797737710193 on iter: 205800/250000
training loss: 3.3470197263849 on iter: 205900/250000
training loss: 3.3013066397889 on iter: 206000/250000
training loss: 2.6434702474693 on iter: 206100/250000
training loss: 3.2301745125776 on iter: 206200/250000
training loss: 3.1322480510125 on iter: 206300/250000
training loss: 3.1441778707504 on iter: 206400/250000
training loss: 3.1030859863157 on iter: 206500/250000
training loss: 3.3041997565538 on iter: 206600/250000
training loss: 3.8846121672078 on iter: 206700/250000
training loss: 2.6529793066419 on iter: 206800/250000
training loss: 2.7940156554096 on iter: 206900/250000
training loss: 3.883012762981 on iter: 207000/250000
training loss: 3.8133797191914 on iter: 207100/250000
training loss: 3.2134104536808 on iter: 207200/250000
training loss: 2.6035322454719 on iter: 207300/250000
training loss: 4.1970773150279 on iter: 207400/250000
training loss: 3.5925860597322 on iter: 207500/250000
training loss: 2.8784071293897 on iter: 207600/250000
training loss: 2.9748073342708 on iter: 207700/250000
training loss: 3.0926378106156 on iter: 207800/250000
training loss: 3.2794433888497 on iter: 207900/250000
training loss: 3.796577161224 on iter: 208000/250000
training loss: 3.7102427563432 on iter: 208100/250000
training loss: 3.4687503541161 on iter: 208200/250000
training loss: 2.8137530450802 on iter: 208300/250000
training loss: 5.0068906004882 on iter: 208400/250000
training loss: 4.3736321585625 on iter: 208500/250000
training loss: 2.7158244032623 on iter: 208600/250000
training loss: 2.814417637705 on iter: 208700/250000
training loss: 2.779987679188 on iter: 208800/250000
training loss: 3.4255362236851 on iter: 208900/250000
training loss: 3.5369139974966 on iter: 209000/250000
training loss: 3.4414253994293 on iter: 209100/250000
training loss: 2.641281492927 on iter: 209200/250000
training loss: 3.3758986444434 on iter: 209300/250000
training loss: 3.0226607286229 on iter: 209400/250000
training loss: 4.593721587884 on iter: 209500/250000
training loss: 3.0680501137223 on iter: 209600/250000
training loss: 2.9885168952455 on iter: 209700/250000
training loss: 4.2841778529475 on iter: 209800/250000
training loss: 4.3540262040401 on iter: 209900/250000
training loss: 3.6254452785746 on iter: 210000/250000
training loss: 2.6809128896008 on iter: 210100/250000
training loss: 3.6041844968411 on iter: 210200/250000
training loss: 2.8189605841977 on iter: 210300/250000
training loss: 2.948852268713 on iter: 210400/250000
training loss: 4.4853512016123 on iter: 210500/250000
training loss: 3.2458737635985 on iter: 210600/250000
training loss: 2.7435595322724 on iter: 210700/250000
training loss: 4.6657076798456 on iter: 210800/250000
training loss: 3.3064939360191 on iter: 210900/250000
training loss: 3.1474221241033 on iter: 211000/250000
training loss: 3.7298125449449 on iter: 211100/250000
training loss: 3.5112845629393 on iter: 211200/250000
training loss: 3.4565726379227 on iter: 211300/250000
training loss: 4.3954558092315 on iter: 211400/250000
training loss: 3.0779085979703 on iter: 211500/250000
training loss: 3.0605861504257 on iter: 211600/250000
training loss: 2.5332512256714 on iter: 211700/250000
training loss: 3.52518711163 on iter: 211800/250000
training loss: 3.635743266739 on iter: 211900/250000
training loss: 3.1572965575785 on iter: 212000/250000
training loss: 3.2037947877165 on iter: 212100/250000
training loss: 3.3827679976458 on iter: 212200/250000
training loss: 3.6383198953587 on iter: 212300/250000
training loss: 3.0592376927578 on iter: 212400/250000
training loss: 3.4922691442832 on iter: 212500/250000
training loss: 2.5299495525318 on iter: 212600/250000
training loss: 2.7872933544963 on iter: 212700/250000
training loss: 3.1332149311342 on iter: 212800/250000
training loss: 2.4013076351832 on iter: 212900/250000
training loss: 3.6926803023591 on iter: 213000/250000
training loss: 3.241899661021 on iter: 213100/250000
training loss: 2.9590252015405 on iter: 213200/250000
training loss: 3.7422842685932 on iter: 213300/250000
training loss: 3.6614987553137 on iter: 213400/250000
training loss: 3.1842950733473 on iter: 213500/250000
training loss: 3.9943871748408 on iter: 213600/250000
training loss: 2.8259809487528 on iter: 213700/250000
training loss: 3.5620814390849 on iter: 213800/250000
training loss: 3.074873192661 on iter: 213900/250000
training loss: 3.7754133851657 on iter: 214000/250000
training loss: 3.5226575734516 on iter: 214100/250000
training loss: 3.512267209835 on iter: 214200/250000
training loss: 2.9905809555751 on iter: 214300/250000
training loss: 2.7491155172616 on iter: 214400/250000
training loss: 3.1796475847584 on iter: 214500/250000
training loss: 3.6126656716861 on iter: 214600/250000
training loss: 3.2373759507066 on iter: 214700/250000
training loss: 2.814663950393 on iter: 214800/250000
training loss: 2.5253820276964 on iter: 214900/250000
training loss: 3.9956130076467 on iter: 215000/250000
training loss: 3.3744751360677 on iter: 215100/250000
training loss: 3.3037867589759 on iter: 215200/250000
training loss: 3.4443564046137 on iter: 215300/250000
training loss: 3.2638110997102 on iter: 215400/250000
training loss: 2.807528032694 on iter: 215500/250000
training loss: 3.0949010124679 on iter: 215600/250000
training loss: 3.842152613962 on iter: 215700/250000
training loss: 3.0678621132929 on iter: 215800/250000
training loss: 4.1266916872136 on iter: 215900/250000
training loss: 2.879065661241 on iter: 216000/250000
training loss: 3.5817074173526 on iter: 216100/250000
training loss: 3.5515093432905 on iter: 216200/250000
training loss: 4.8637714155316 on iter: 216300/250000
training loss: 2.8314491077641 on iter: 216400/250000
training loss: 2.9582269859409 on iter: 216500/250000
training loss: 2.9111766295417 on iter: 216600/250000
training loss: 2.9880033001092 on iter: 216700/250000
training loss: 2.597067957825 on iter: 216800/250000
training loss: 3.4660566767691 on iter: 216900/250000
training loss: 3.2157060090685 on iter: 217000/250000
training loss: 3.5371842208427 on iter: 217100/250000
training loss: 3.3636287382076 on iter: 217200/250000
training loss: 3.1207191423389 on iter: 217300/250000
training loss: 3.294382756039 on iter: 217400/250000
training loss: 3.0790578834551 on iter: 217500/250000
training loss: 3.0425369379921 on iter: 217600/250000
training loss: 3.5591209035656 on iter: 217700/250000
training loss: 2.9041111553089 on iter: 217800/250000
training loss: 2.7738214426491 on iter: 217900/250000
training loss: 3.3482408962377 on iter: 218000/250000
training loss: 2.9569447984296 on iter: 218100/250000
training loss: 3.8216991850676 on iter: 218200/250000
training loss: 2.6235074558824 on iter: 218300/250000
training loss: 3.57485228128 on iter: 218400/250000
training loss: 3.6831684965608 on iter: 218500/250000
training loss: 4.1405506017543 on iter: 218600/250000
training loss: 2.6017984112244 on iter: 218700/250000
training loss: 3.5454014143638 on iter: 218800/250000
training loss: 2.931231927469 on iter: 218900/250000
training loss: 2.6875929379737 on iter: 219000/250000
training loss: 2.5844932372896 on iter: 219100/250000
training loss: 2.955589432662 on iter: 219200/250000
training loss: 4.3630518483686 on iter: 219300/250000
training loss: 4.289869096856 on iter: 219400/250000
training loss: 3.7261543459759 on iter: 219500/250000
training loss: 2.5109145161321 on iter: 219600/250000
training loss: 3.4663756662169 on iter: 219700/250000
training loss: 4.0655941473606 on iter: 219800/250000
training loss: 4.2542882980936 on iter: 219900/250000
training loss: 3.0070757935457 on iter: 220000/250000
training loss: 3.1886255738637 on iter: 220100/250000
training loss: 3.1511766644659 on iter: 220200/250000
training loss: 3.0523966257189 on iter: 220300/250000
training loss: 2.8613804084147 on iter: 220400/250000
training loss: 3.216568740588 on iter: 220500/250000
training loss: 3.9544976088451 on iter: 220600/250000
training loss: 3.0742079724179 on iter: 220700/250000
training loss: 2.8354512968181 on iter: 220800/250000
training loss: 3.9907652554911 on iter: 220900/250000
training loss: 3.8050289807468 on iter: 221000/250000
training loss: 3.6777420162529 on iter: 221100/250000
training loss: 2.8283736719971 on iter: 221200/250000
training loss: 3.4954236145294 on iter: 221300/250000
training loss: 3.581932319003 on iter: 221400/250000
training loss: 3.0489711462645 on iter: 221500/250000
training loss: 2.9265023072035 on iter: 221600/250000
training loss: 3.3436137484663 on iter: 221700/250000
training loss: 3.5895875830308 on iter: 221800/250000
training loss: 3.3839665123859 on iter: 221900/250000
training loss: 3.4691574605435 on iter: 222000/250000
training loss: 3.3174586117528 on iter: 222100/250000
training loss: 2.8246656383313 on iter: 222200/250000
training loss: 3.1925586941112 on iter: 222300/250000
training loss: 3.6582393763772 on iter: 222400/250000
training loss: 3.7972615623702 on iter: 222500/250000
training loss: 3.2381564975489 on iter: 222600/250000
training loss: 3.8989031738233 on iter: 222700/250000
training loss: 3.5191563989311 on iter: 222800/250000
training loss: 2.5815223399919 on iter: 222900/250000
training loss: 3.7172135003291 on iter: 223000/250000
training loss: 2.7821113596417 on iter: 223100/250000
training loss: 2.9979454704795 on iter: 223200/250000
training loss: 3.765663913583 on iter: 223300/250000
training loss: 2.5135384219555 on iter: 223400/250000
training loss: 2.8082504966493 on iter: 223500/250000
training loss: 3.4098591001589 on iter: 223600/250000
training loss: 3.2159528340958 on iter: 223700/250000
training loss: 3.4136044515675 on iter: 223800/250000
training loss: 2.9928283587134 on iter: 223900/250000
training loss: 3.6579947074854 on iter: 224000/250000
training loss: 2.6873840253849 on iter: 224100/250000
training loss: 2.880751792511 on iter: 224200/250000
training loss: 3.1706758928553 on iter: 224300/250000
training loss: 3.0682432823849 on iter: 224400/250000
training loss: 2.4947320641459 on iter: 224500/250000
training loss: 3.4208626747147 on iter: 224600/250000
training loss: 3.1247349091988 on iter: 224700/250000
training loss: 3.6815280757223 on iter: 224800/250000
training loss: 3.3348127158578 on iter: 224900/250000
training loss: 3.1505004567567 on iter: 225000/250000
training loss: 2.9225184516139 on iter: 225100/250000
training loss: 3.2068526098038 on iter: 225200/250000
training loss: 2.9418847600635 on iter: 225300/250000
training loss: 3.3607732053323 on iter: 225400/250000
training loss: 3.372569974706 on iter: 225500/250000
training loss: 3.6629385493336 on iter: 225600/250000
training loss: 3.3420996130371 on iter: 225700/250000
training loss: 4.2094248151562 on iter: 225800/250000
training loss: 3.2119100965526 on iter: 225900/250000
training loss: 4.1322824917841 on iter: 226000/250000
training loss: 3.5741321489846 on iter: 226100/250000
training loss: 3.2189066597623 on iter: 226200/250000
training loss: 3.4872401137293 on iter: 226300/250000
training loss: 4.012950868165 on iter: 226400/250000
training loss: 3.5865417636435 on iter: 226500/250000
training loss: 3.8532376557863 on iter: 226600/250000
training loss: 3.4314967772185 on iter: 226700/250000
training loss: 3.5951985000759 on iter: 226800/250000
training loss: 3.2831555659313 on iter: 226900/250000
training loss: 3.6745685092604 on iter: 227000/250000
training loss: 2.7358940222734 on iter: 227100/250000
training loss: 2.4664015872146 on iter: 227200/250000
training loss: 3.8491360099566 on iter: 227300/250000
training loss: 3.2391225559859 on iter: 227400/250000
training loss: 3.3945432521114 on iter: 227500/250000
training loss: 3.7368247678035 on iter: 227600/250000
training loss: 3.3697466497696 on iter: 227700/250000
training loss: 3.0387944546217 on iter: 227800/250000
training loss: 3.3810351326111 on iter: 227900/250000
training loss: 2.8376828012934 on iter: 228000/250000
training loss: 3.6390891052394 on iter: 228100/250000
training loss: 2.9516130710053 on iter: 228200/250000
training loss: 3.6456545871958 on iter: 228300/250000
training loss: 3.9499589654162 on iter: 228400/250000
training loss: 3.5189815808959 on iter: 228500/250000
training loss: 4.4339432034949 on iter: 228600/250000
training loss: 2.297319741252 on iter: 228700/250000
training loss: 2.7826563128914 on iter: 228800/250000
training loss: 3.4437348084735 on iter: 228900/250000
training loss: 3.6128828556407 on iter: 229000/250000
training loss: 2.9108622581562 on iter: 229100/250000
training loss: 2.8785969192383 on iter: 229200/250000
training loss: 3.7133206820574 on iter: 229300/250000
training loss: 3.4580622546339 on iter: 229400/250000
training loss: 3.9072923281895 on iter: 229500/250000
training loss: 3.5659227195912 on iter: 229600/250000
training loss: 4.0292480142218 on iter: 229700/250000
training loss: 4.1581309199301 on iter: 229800/250000
training loss: 3.3826501675195 on iter: 229900/250000
training loss: 3.0806833538143 on iter: 230000/250000
training loss: 2.8901011168001 on iter: 230100/250000
training loss: 3.3667216195878 on iter: 230200/250000
training loss: 3.1666434136145 on iter: 230300/250000
training loss: 3.978565824571 on iter: 230400/250000
training loss: 3.1341873542199 on iter: 230500/250000
training loss: 3.176825291996 on iter: 230600/250000
training loss: 2.9680040427514 on iter: 230700/250000
training loss: 2.6503080650749 on iter: 230800/250000
training loss: 4.3726788810639 on iter: 230900/250000
training loss: 3.4915072827225 on iter: 231000/250000
training loss: 3.2226580752709 on iter: 231100/250000
training loss: 2.6147277752367 on iter: 231200/250000
training loss: 2.9207842726833 on iter: 231300/250000
training loss: 3.2695706764066 on iter: 231400/250000
training loss: 3.3827275637753 on iter: 231500/250000
training loss: 3.5233924240502 on iter: 231600/250000
training loss: 2.8501660024453 on iter: 231700/250000
training loss: 3.4673975070218 on iter: 231800/250000
training loss: 3.3414797968176 on iter: 231900/250000
training loss: 2.8149688008611 on iter: 232000/250000
training loss: 3.002707112906 on iter: 232100/250000
training loss: 2.5289504120017 on iter: 232200/250000
training loss: 3.619335642313 on iter: 232300/250000
training loss: 2.8671842770037 on iter: 232400/250000
training loss: 3.4106930815215 on iter: 232500/250000
training loss: 2.7828853545754 on iter: 232600/250000
training loss: 3.3791878899697 on iter: 232700/250000
training loss: 3.4135990554901 on iter: 232800/250000
training loss: 3.7615660899152 on iter: 232900/250000
training loss: 3.5780240326261 on iter: 233000/250000
training loss: 3.4550931951008 on iter: 233100/250000
training loss: 2.9831703189695 on iter: 233200/250000
training loss: 2.79262090438 on iter: 233300/250000
training loss: 3.1644804539251 on iter: 233400/250000
training loss: 2.9000526017791 on iter: 233500/250000
training loss: 4.7649635484205 on iter: 233600/250000
training loss: 2.8428451949131 on iter: 233700/250000
training loss: 3.0148724566879 on iter: 233800/250000
training loss: 2.8972015395283 on iter: 233900/250000
training loss: 2.928774301016 on iter: 234000/250000
training loss: 3.0296405682198 on iter: 234100/250000
training loss: 3.6426623827765 on iter: 234200/250000
training loss: 2.8711173515687 on iter: 234300/250000
training loss: 2.6064220378639 on iter: 234400/250000
training loss: 4.7338543316311 on iter: 234500/250000
training loss: 3.1066039066028 on iter: 234600/250000
training loss: 3.92324208577 on iter: 234700/250000
training loss: 2.6346543970682 on iter: 234800/250000
training loss: 2.7444490404421 on iter: 234900/250000
training loss: 3.3968233480404 on iter: 235000/250000
training loss: 3.8141486141069 on iter: 235100/250000
training loss: 3.2119260878367 on iter: 235200/250000
training loss: 2.9417580083167 on iter: 235300/250000
training loss: 3.0133136230627 on iter: 235400/250000
training loss: 2.8252717735656 on iter: 235500/250000
training loss: 3.5679855400966 on iter: 235600/250000
training loss: 3.3930188022491 on iter: 235700/250000
training loss: 3.4367335640756 on iter: 235800/250000
training loss: 2.6398262744272 on iter: 235900/250000
training loss: 4.0151133505136 on iter: 236000/250000
training loss: 3.7133061356954 on iter: 236100/250000
training loss: 3.005443831697 on iter: 236200/250000
training loss: 4.0458972046587 on iter: 236300/250000
training loss: 3.4784401605404 on iter: 236400/250000
training loss: 3.1938674183633 on iter: 236500/250000
training loss: 3.5501222511153 on iter: 236600/250000
training loss: 3.2035639186824 on iter: 236700/250000
training loss: 3.3158741225031 on iter: 236800/250000
training loss: 3.3470069989547 on iter: 236900/250000
training loss: 3.4281109723304 on iter: 237000/250000
training loss: 3.0539004436204 on iter: 237100/250000
training loss: 3.3954331016312 on iter: 237200/250000
training loss: 2.5774881487465 on iter: 237300/250000
training loss: 4.2606322342452 on iter: 237400/250000
training loss: 2.5696639974217 on iter: 237500/250000
training loss: 3.3480157371124 on iter: 237600/250000
training loss: 3.1549735440941 on iter: 237700/250000
training loss: 2.847922325064 on iter: 237800/250000
training loss: 3.9185762101662 on iter: 237900/250000
training loss: 3.6677903842627 on iter: 238000/250000
training loss: 3.3569961030461 on iter: 238100/250000
training loss: 3.504187315386 on iter: 238200/250000
training loss: 4.1127445799724 on iter: 238300/250000
training loss: 4.1683341346539 on iter: 238400/250000
training loss: 3.4977589346153 on iter: 238500/250000
training loss: 2.066816525935 on iter: 238600/250000
training loss: 3.0200398406999 on iter: 238700/250000
training loss: 3.31946004064 on iter: 238800/250000
training loss: 2.7271496894511 on iter: 238900/250000
training loss: 3.644592804465 on iter: 239000/250000
training loss: 3.7864456073965 on iter: 239100/250000
training loss: 3.2694869096411 on iter: 239200/250000
training loss: 3.1547023358138 on iter: 239300/250000
training loss: 3.4092435239767 on iter: 239400/250000
training loss: 3.5643791225801 on iter: 239500/250000
training loss: 3.2816797286429 on iter: 239600/250000
training loss: 3.6524804594816 on iter: 239700/250000
training loss: 3.6359692395707 on iter: 239800/250000
training loss: 4.3029315089855 on iter: 239900/250000
training loss: 3.8505905040258 on iter: 240000/250000
training loss: 3.913101922253 on iter: 240100/250000
training loss: 3.4441911540155 on iter: 240200/250000
training loss: 3.9639558071553 on iter: 240300/250000
training loss: 3.5427701351568 on iter: 240400/250000
training loss: 3.2397479088726 on iter: 240500/250000
training loss: 3.0127816062856 on iter: 240600/250000
training loss: 2.5289957929476 on iter: 240700/250000
training loss: 3.4292539206763 on iter: 240800/250000
training loss: 4.4374044042335 on iter: 240900/250000
training loss: 2.7807989508533 on iter: 241000/250000
training loss: 3.4889871151552 on iter: 241100/250000
training loss: 3.2264789508683 on iter: 241200/250000
training loss: 3.0630989170483 on iter: 241300/250000
training loss: 3.6901301482085 on iter: 241400/250000
training loss: 3.0741457568179 on iter: 241500/250000
training loss: 3.0200766861603 on iter: 241600/250000
training loss: 3.8183715785058 on iter: 241700/250000
training loss: 3.3579664285078 on iter: 241800/250000
training loss: 3.75875088453 on iter: 241900/250000
training loss: 4.2320832592263 on iter: 242000/250000
training loss: 3.5946558698624 on iter: 242100/250000
training loss: 2.1520302476826 on iter: 242200/250000
training loss: 3.2340493319224 on iter: 242300/250000
training loss: 3.2682406584414 on iter: 242400/250000
training loss: 2.9202101669918 on iter: 242500/250000
training loss: 2.9014664838839 on iter: 242600/250000
training loss: 2.7860538567122 on iter: 242700/250000
training loss: 2.9239221764488 on iter: 242800/250000
training loss: 2.9841851012133 on iter: 242900/250000
training loss: 3.1794594715864 on iter: 243000/250000
training loss: 3.3090687841943 on iter: 243100/250000
training loss: 3.4707521715026 on iter: 243200/250000
training loss: 3.4965538866112 on iter: 243300/250000
training loss: 3.4880536256653 on iter: 243400/250000
training loss: 2.7626875595416 on iter: 243500/250000
training loss: 3.2016050781809 on iter: 243600/250000
training loss: 3.847247433851 on iter: 243700/250000
training loss: 4.8453397560799 on iter: 243800/250000
training loss: 2.9864613019735 on iter: 243900/250000
training loss: 3.4826318064936 on iter: 244000/250000
training loss: 2.8077353152455 on iter: 244100/250000
training loss: 2.8236055844252 on iter: 244200/250000
training loss: 3.410147984499 on iter: 244300/250000
training loss: 2.3915119336661 on iter: 244400/250000
training loss: 3.4814422508165 on iter: 244500/250000
training loss: 3.0197128440989 on iter: 244600/250000
training loss: 3.5821954886704 on iter: 244700/250000
training loss: 3.1203239137506 on iter: 244800/250000
training loss: 3.3705882787881 on iter: 244900/250000
training loss: 3.0012074157539 on iter: 245000/250000
training loss: 3.1514461517094 on iter: 245100/250000
training loss: 3.3106291655752 on iter: 245200/250000
training loss: 2.6659367467982 on iter: 245300/250000
training loss: 3.1771432166825 on iter: 245400/250000
training loss: 2.6471931753356 on iter: 245500/250000
training loss: 3.6896210392306 on iter: 245600/250000
training loss: 3.5159365275502 on iter: 245700/250000
training loss: 3.3200454935297 on iter: 245800/250000
training loss: 3.3206008027831 on iter: 245900/250000
training loss: 2.9130790677 on iter: 246000/250000
training loss: 4.5603665940302 on iter: 246100/250000
training loss: 3.4700720774599 on iter: 246200/250000
training loss: 4.8854830732629 on iter: 246300/250000
training loss: 3.2980180678669 on iter: 246400/250000
training loss: 3.5913865760301 on iter: 246500/250000
training loss: 3.0790624585774 on iter: 246600/250000
training loss: 3.476289795025 on iter: 246700/250000
training loss: 2.7150108514144 on iter: 246800/250000
training loss: 3.9087440249076 on iter: 246900/250000
training loss: 3.1690978684945 on iter: 247000/250000
training loss: 2.8988086403879 on iter: 247100/250000
training loss: 3.2467076672374 on iter: 247200/250000
training loss: 4.2971069881409 on iter: 247300/250000
training loss: 3.4615159558087 on iter: 247400/250000
training loss: 3.4777209936989 on iter: 247500/250000
training loss: 2.9076608074442 on iter: 247600/250000
training loss: 3.0046127153319 on iter: 247700/250000
training loss: 3.6478911857521 on iter: 247800/250000
training loss: 3.7600461822598 on iter: 247900/250000
training loss: 3.8081851454663 on iter: 248000/250000
training loss: 3.0288262209269 on iter: 248100/250000
training loss: 3.8045254712336 on iter: 248200/250000
training loss: 3.0014096101222 on iter: 248300/250000
training loss: 2.6648702391464 on iter: 248400/250000
training loss: 3.0973710238868 on iter: 248500/250000
training loss: 3.21178833147 on iter: 248600/250000
training loss: 3.3652397308744 on iter: 248700/250000
training loss: 3.6136089057546 on iter: 248800/250000
training loss: 3.0909798858796 on iter: 248900/250000
training loss: 2.5578616976569 on iter: 249000/250000
training loss: 3.5662768480493 on iter: 249100/250000
training loss: 3.3371438164301 on iter: 249200/250000
training loss: 3.0698375165063 on iter: 249300/250000
training loss: 3.2172591927209 on iter: 249400/250000
training loss: 3.5046515325146 on iter: 249500/250000
training loss: 3.8954496279085 on iter: 249600/250000
training loss: 3.1951476890746 on iter: 249700/250000
training loss: 3.1147774654283 on iter: 249800/250000
training loss: 3.7507720261221 on iter: 249900/250000
training loss: 3.1113258678018 on iter: 250000/250000
I believe I was not using the rnn package you mentioned, because `luarocks install rnn` probably installs a different package.
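To check which implementation `require 'rnn'` actually resolves to, a quick probe like the following should work (a minimal sketch; `nn.GRU` is one of the modules the Element-Research rnn provides, while other rnn rocks may not):

```lua
-- Minimal sketch: see which rnn rock `require 'rnn'` would load,
-- and whether it provides the Element-Research modules (e.g. nn.GRU).
require 'nn'
print(package.searchpath('rnn', package.path))  -- path of the rnn that gets loaded
require 'rnn'
assert(nn.GRU ~= nil, 'nn.GRU missing -- probably not the Element-Research rnn')
```

If the assert fails, removing the wrong rock (`luarocks remove rnn`) and installing from the Element-Research repository should fix it.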
The training loss should be below 1.0, around 0.8, at the end of training. If the problem persists, please reopen this issue.
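As a rough convergence check, one can average the tail of a saved training log and compare it against that ~0.8 target (a sketch assuming the lines above were redirected to a file, here hypothetically `train.log`):

```lua
-- Rough sketch: average the last 50 "training loss: ..." lines of a saved
-- log (hypothetical filename train.log) to check convergence.
local losses = {}
for line in io.lines('train.log') do
  local v = line:match('training loss: ([%d%.]+)')
  if v then losses[#losses + 1] = tonumber(v) end
end
assert(#losses > 0, 'no loss lines found in log')
local n, sum = math.min(50, #losses), 0
for i = #losses - n + 1, #losses do sum = sum + losses[i] end
print(string.format('mean of last %d losses: %.3f (expected around 0.8)', n, sum / n))
```

On the log above, this tail average stays above 3.0, which is consistent with the package issue discussed here rather than a nearly-converged run.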
Hi, I downloaded data_prepro.h5, data_prepro.json, and seconds.json from the Google Drive link you shared, and I generated data_res.h5 by running prepro_res.lua. However, after retraining the model with train.lua (with the default parameters) and submitting the resulting JSON file to the challenge server, I am getting an accuracy of only 50%. The JSON file generated from the pretrained model achieves the expected accuracy of 65%.
Did you train the model with a different set of hyperparameters, or am I making a mistake in training?
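One way to localize the gap is to compare your submission file against the one produced by the pretrained model and measure how often their answers agree (a sketch assuming lua-cjson is available and both files use the usual list-of-{question_id, answer} submission format; the filenames below are placeholders):

```lua
-- Diagnostic sketch: answer agreement between two VQA submission JSONs.
-- Assumes lua-cjson; filenames are placeholders.
local cjson = require 'cjson'

local function load_results(path)
  local f = assert(io.open(path, 'r'))
  local data = cjson.decode(f:read('*a'))
  f:close()
  return data
end

local mine   = load_results('my_results.json')          -- retrained model
local theirs = load_results('pretrained_results.json')  -- released model

local by_id = {}
for _, r in ipairs(theirs) do by_id[r.question_id] = r.answer end

local agree, total = 0, 0
for _, r in ipairs(mine) do
  local ref = by_id[r.question_id]
  if ref ~= nil then
    total = total + 1
    if ref == r.answer then agree = agree + 1 end
  end
end
print(string.format('agreement: %.1f%% (%d/%d)', 100 * agree / total, agree, total))
```

Very low agreement even on simple yes/no questions would point to a training problem (for example, the rnn package issue above) rather than an evaluation or file-format mistake.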