tensorflow / skflow

Simplified interface for TensorFlow (mimicking Scikit Learn) for Deep Learning
Apache License 2.0

skflow and tensorboard #172

Closed laouer closed 8 years ago

laouer commented 8 years ago

Hi, I'm having trouble getting the loss summary to show up in TensorBoard when using skflow.

This is my code:


classifier = skflow.TensorFlowEstimator(
    model_fn=conv_model,
    n_classes=2,
    batch_size=BATCH_SIZE,
    steps=100000,
    learning_rate=0.001,
    config=RunConfig(gpu_memory_fraction=0.9))

val_monitor = monitors.ValidationMonitor(X_val, y_val, n_classes=2, print_steps=100)
classifier.fit(X_train, y_train, val_monitor, logdir='my_model_1/')
classifier.save('my_model_1/')


Everything runs well:


I tensorflow/stream_executor/dso_loader.cc:105] successfully opened CUDA library libcublas.so locally
I tensorflow/stream_executor/dso_loader.cc:105] successfully opened CUDA library libcudnn.so locally
I tensorflow/stream_executor/dso_loader.cc:105] successfully opened CUDA library libcufft.so locally
I tensorflow/stream_executor/dso_loader.cc:105] successfully opened CUDA library libcuda.so.1 locally
I tensorflow/stream_executor/dso_loader.cc:105] successfully opened CUDA library libcurand.so locally
/usr/local/lib/python2.7/dist-packages/tensorflow/contrib/learn/python/learn/io/data_feeder.py:281: VisibleDeprecationWarning: converting an array with ndim > 0 to an index will result in an error in the future
  out.itemset((i, self.y[sample]), 1.0)
I tensorflow/core/common_runtime/gpu/gpu_init.cc:102] Found device 0 with properties:
name: GeForce GTX 980
major: 5 minor: 2 memoryClockRate (GHz) 1.253
pciBusID 0000:03:00.0
Total memory: 4.00GiB
Free memory: 3.91GiB
I tensorflow/core/common_runtime/gpu/gpu_init.cc:126] DMA: 0
I tensorflow/core/common_runtime/gpu/gpu_init.cc:136] 0: Y
I tensorflow/core/common_runtime/gpu/gpu_device.cc:755] Creating TensorFlow device (/gpu:0) -> (device: 0, name: GeForce GTX 980, pci bus id: 0000:03:00.0)
/usr/local/lib/python2.7/dist-packages/tensorflow/contrib/learn/python/learn/io/data_feeder.py:370: VisibleDeprecationWarning: using a non-integer number instead of an integer will result in an error in the future
  out.itemset((i, y), 1.0)
Step #99, avg. train loss: 2.22587, avg. val loss: 2.14521
Step #199, avg. train loss: 0.82641, avg. val loss: 0.89103
Step #299, avg. train loss: 0.78344, avg. val loss: 0.85636
Step #399, avg. train loss: 0.76420, avg. val loss: 0.85675
Step #499, avg. train loss: 0.75868, avg. val loss: 0.84104
Step #599, avg. train loss: 0.75467, avg. val loss: 0.84945
Step #699, avg. train loss: 0.73990, avg. val loss: 0.91238
Step #799, avg. train loss: 0.73400, avg. val loss: 0.92720
Step #899, avg. train loss: 0.72879, avg. val loss: 0.91054
Step #999, avg. train loss: 0.73448, avg. val loss: 0.89823
Step #1099, avg. train loss: 0.70125, avg. val loss: 0.91640
Step #1199, avg. train loss: 0.71879, avg. val loss: 0.90597
Step #1299, avg. train loss: 0.70713, avg. val loss: 0.90736
Step #1399, avg. train loss: 0.70023, avg. val loss: 0.91414
Step #1499, avg. train loss: 0.69566, avg. val loss: 0.91007
Step #1599, avg. train loss: 0.68030, avg. val loss: 0.92729
Step #1699, avg. train loss: 0.68919, avg. val loss: 0.91168
Step #1799, avg. train loss: 0.67088, avg. val loss: 0.91744
Step #1899, avg. train loss: 0.68732, avg. val loss: 0.88844
Step #1999, avg. train loss: 0.67585, avg. val loss: 0.88854


It generates a .tfevents file that is 4.8M in size (attached).

When I connect to the machine using Chrome as the browser, TensorBoard shows data under Graphs and Histograms but nothing under Events ("No scalar data was found"). Did I miss something needed to get the loss logged?

PS: I'm running on a GPU machine using the latest TensorFlow build. Attached tfevents: my_model_1.zip
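
As a stopgap while the scalar data is missing from TensorBoard, the loss values can be recovered from the console output itself. The following is a minimal sketch in plain Python, not a fix for the logging problem. It assumes the "Step #N, avg. train loss: X, avg. val loss: Y" line format shown in the log above; the names `STEP_RE`, `parse_losses`, and `sample_log` are hypothetical and are not part of skflow or TensorFlow.

```python
import re

# Matches lines such as:
#   Step #99, avg. train loss: 2.22587, avg. val loss: 2.14521
# \s+ is used between tokens so that entries wrapped across lines still match.
STEP_RE = re.compile(
    r"Step #(\d+),\s+avg\.\s+train loss:\s+(\d+(?:\.\d+)?),"
    r"\s+avg\.\s+val loss:\s+(\d+(?:\.\d+)?)"
)


def parse_losses(log_text):
    """Return a list of (step, train_loss, val_loss) tuples found in log_text."""
    return [
        (int(step), float(train), float(val))
        for step, train, val in STEP_RE.findall(log_text)
    ]


# A few lines copied from the training output above.
sample_log = """\
Step #99, avg. train loss: 2.22587, avg. val loss: 2.14521
Step #199, avg. train loss: 0.82641, avg. val loss: 0.89103
Step #299, avg. train loss: 0.78344, avg. val loss: 0.85636
"""

losses = parse_losses(sample_log)
for step, train_loss, val_loss in losses:
    print(step, train_loss, val_loss)
```

The resulting tuples can be passed to any plotting tool to view the train and validation curves until the scalars appear in TensorBoard.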

laouer commented 8 years ago

I opened an issue in the TensorFlow issue tracker: https://github.com/tensorflow/tensorflow/issues/2063