juglab / N2V_fiji

Cannot train, SCIFIO error when saving testinput.tif file after second epoch #20

Closed: lacan closed this issue 4 years ago

lacan commented 4 years ago

Running N2V Train+Predict on an up-to-date Fiji yields the following error after the second epoch. The error message also contains a typo: "Attemptint" instead of "Attempting".

This looks more like a SCIFIO issue that appeared after the big Fiji update, but since I ran into it while using N2V, I thought I would post it here.

[INFO] Load TensorFlow..
[INFO] Using native TensorFlow version: TF 1.15.0 GPU (CUDA 10.0, CuDNN >= 7.4.1)
Using 10% of training data for validation
[INFO] Tile training and validation data..
[INFO] Generated 134 tiles of shape [128, 128]
[INFO] Create session..
[INFO] Import graph..
[INFO] Normalizing..
[INFO] mean: 122.31433
[INFO] stdDev: 49.468414
[INFO] Augment tiles..
[INFO] Prepare training batches...
65 blind-spots will be generated per training patch of size [64, 64].
[INFO] Prepare validation batches..
65 blind-spots will be generated per training patch of size [64, 64].
[INFO] Start training..
[INFO] Epoch 1/3 
1 / 10 [*---------] - loss: 0.761013 mse: 0.761013 abs: 0.542936 lr: 0.000400
2 / 10 [**--------] - loss: 0.573844 mse: 0.573844 abs: 0.438935 lr: 0.000400
3 / 10 [***-------] - loss: 0.380703 mse: 0.380703 abs: 0.380339 lr: 0.000400
4 / 10 [****------] - loss: 0.302244 mse: 0.302244 abs: 0.361511 lr: 0.000400
5 / 10 [*****-----] - loss: 0.299051 mse: 0.299051 abs: 0.357450 lr: 0.000400
6 / 10 [******----] - loss: 0.287354 mse: 0.287354 abs: 0.333070 lr: 0.000400
7 / 10 [*******---] - loss: 0.210027 mse: 0.210027 abs: 0.301312 lr: 0.000400
8 / 10 [********--] - loss: 0.202891 mse: 0.202891 abs: 0.291253 lr: 0.000400
9 / 10 [*********-] - loss: 0.226710 mse: 0.226710 abs: 0.284020 lr: 0.000400
10 / 10 [**********] - loss: 0.199167 mse: 0.199167 abs: 0.283714 lr: 0.000400
[INFO] 
Validation loss: 0.6927516 abs: 0.4159032 mse: 0.6927515
time of step: 00:00:20
[INFO] Epoch 2/3 remaining training time: 00:00:40
1 / 10 [*---------] - loss: 0.142750 mse: 0.142750 abs: 0.249074 lr: 0.000400
2 / 10 [**--------] - loss: 0.151557 mse: 0.151557 abs: 0.252350 lr: 0.000400
3 / 10 [***-------] - loss: 0.148761 mse: 0.148761 abs: 0.246103 lr: 0.000400
4 / 10 [****------] - loss: 0.173188 mse: 0.173188 abs: 0.241374 lr: 0.000400
[INFO] starting with index 0 of training batches
5 / 10 [*****-----] - loss: 0.170642 mse: 0.170642 abs: 0.245718 lr: 0.000400
6 / 10 [******----] - loss: 0.149576 mse: 0.149576 abs: 0.225690 lr: 0.000400
7 / 10 [*******---] - loss: 0.128991 mse: 0.128991 abs: 0.229864 lr: 0.000400
8 / 10 [********--] - loss: 0.112550 mse: 0.112550 abs: 0.220624 lr: 0.000400
9 / 10 [*********-] - loss: 0.096749 mse: 0.096749 abs: 0.211077 lr: 0.000400
10 / 10 [**********] - loss: 0.102934 mse: 0.102934 abs: 0.221298 lr: 0.000400
[INFO] 
Validation loss: 0.2860396 abs: 0.329513 mse: 0.2860396
java.util.concurrent.ExecutionException: io.scif.img.ImgIOException: io.scif.FormatException: SCIFIO exception when writing to file FileLocation:file:/C:/Users/oburri/AppData/Local/Temp/n2v-latest-6800095336051881157/testinput.tif:
Attemptint to write to output that already exists. Please rename the output, remove the existing conflict, or adjust configuration.
    at java.util.concurrent.FutureTask.report(FutureTask.java:122)
    at java.util.concurrent.FutureTask.get(FutureTask.java:192)
    at de.csbdresden.n2v.train.N2VTraining.train(N2VTraining.java:187)
    at de.csbdresden.n2v.command.N2VTrainPredictCommand.mainThread(N2VTrainPredictCommand.java:188)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)
Caused by: io.scif.img.ImgIOException: io.scif.FormatException: SCIFIO exception when writing to file FileLocation:file:/C:/Users/oburri/AppData/Local/Temp/n2v-latest-6800095336051881157/testinput.tif:
Attemptint to write to output that already exists. Please rename the output, remove the existing conflict, or adjust configuration.
    at io.scif.img.ImgSaver.writeImg(ImgSaver.java:575)
    at io.scif.img.ImgSaver.writeImg(ImgSaver.java:552)
    at io.scif.img.ImgSaver.writeImg(ImgSaver.java:528)
    at io.scif.img.ImgSaver.saveImg(ImgSaver.java:243)
    at io.scif.img.ImgSaver.saveImg(ImgSaver.java:226)
    at io.scif.img.ImgSaver.saveImg(ImgSaver.java:196)
    at io.scif.img.ImgSaver.saveImg(ImgSaver.java:131)
    at de.csbdresden.n2v.train.OutputHandler.saveCheckpoint(OutputHandler.java:189)
    at de.csbdresden.n2v.train.N2VTraining.mainThread(N2VTraining.java:332)
    ... 5 more
lacan commented 4 years ago

This seems to be linked to the following SCIFIO update: https://github.com/scifio/scifio/commit/a061f2f2311df98f729cfdff1643821569d32378

frauzufall commented 4 years ago

That's right, the SCIFIO behaviour was changed there: it now refuses to write to an output file that already exists. I'll make N2V delete the file before trying to save it again. Thanks for reporting!

frauzufall commented 4 years ago

This is fixed in commit 7e10e3ab0416d3c1abab7da1210fb511a14f7757, and the fix has also been uploaded to the update site.
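
For readers hitting the same error elsewhere, the delete-before-save idea discussed in this thread can be sketched roughly as follows. This is only an illustration, not the code from the commit above: the class `CheckpointFileWriter` and its methods are hypothetical, and plain `java.nio.file` calls stand in for the SCIFIO `ImgSaver` so that the example runs without Fiji. `StandardOpenOption.CREATE_NEW` is used to imitate a writer that refuses to overwrite an existing file.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical helper, not part of N2V or SCIFIO.
final class CheckpointFileWriter {

    // Stand-in for a writer that refuses to overwrite: with CREATE_NEW,
    // Files.write throws FileAlreadyExistsException if the target exists.
    static void writeStrict(Path target, byte[] data) throws IOException {
        Files.write(target, data,
                StandardOpenOption.CREATE_NEW, StandardOpenOption.WRITE);
    }

    // Remove whatever a previous epoch left behind, then write the new file.
    // deleteIfExists does nothing (and does not throw) when the file is absent.
    static void deleteThenWrite(Path target, byte[] data) throws IOException {
        Files.deleteIfExists(target);
        writeStrict(target, data);
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("n2v-sketch");
        Path target = dir.resolve("testinput.tif");

        // Two consecutive "checkpoints" written to the same path.
        deleteThenWrite(target, new byte[] {1});
        deleteThenWrite(target, new byte[] {2, 3});

        System.out.println("bytes on disk: " + Files.size(target));
    }
}
```

Calling `writeStrict` twice on the same path fails on the second call, which mirrors the failure reported here once the test input is saved again. Routing every save through `deleteThenWrite` avoids that, at the cost of a short window in which the file does not exist.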