After running for 9 hours without error, the job eventually hit a snag:
Traceback (most recent call last):
File "main.py", line 260, in <module>
result = future.result()
File "/home/b.weinstein/miniconda3/envs/crowns/lib/python3.7/site-packages/distributed/client.py", line 220, in result
raise exc.with_traceback(tb)
File "main.py", line 144, in run_rgb
shps = predict.predict_tiles(model, records, patch_size=400, rgb_paths=rgb_paths, save_dir=save_dir, batch_size=model.config["batch_size"],overwrite=overwrite)
File "/home/b.weinstein/NEON_crown_maps/predict.py", line 67, in predict_tiles
boxes = predict_tile(model=model, tfrecord=tfrecord, patch_size=patch_size, batch_size=batch_size, score_threshold=score_threshold, max_detections=max_detections, classes=classes)
File "/home/b.weinstein/NEON_crown_maps/predict.py", line 140, in predict_tile
box_array, score_array, label_array = model.prediction_model.predict_on_batch(iterator)
File "/apps/tensorflow/1.14.0.cuda10.gpu/lib/python3.7/site-packages/keras/engine/training.py", line 1580, in predict_on_batch
outputs = self.predict_function(ins)
File "/apps/tensorflow/1.14.0.cuda10.gpu/lib/python3.7/site-packages/tensorflow/python/keras/backend.py", line 3292, in __call__
run_metadata=self.run_metadata)
File "/apps/tensorflow/1.14.0.cuda10.gpu/lib/python3.7/site-packages/tensorflow/python/client/session.py", line 1458, in __call__
run_metadata_ptr)
tensorflow.python.framework.errors_impl.NotFoundError: 2 root error(s) found.
(0) Not found: /orange/idtrees-collab/crops/tfrecords/2018_SRER_2_523000_3517000_image_244.png; No such file or directory
[[{{node ReadFile}}]]
[[IteratorGetNext]]
[[filtered_detections/map/while/TensorArrayWrite_2/TensorArrayWriteV3/_1899]]
(1) Not found: /orange/idtrees-collab/crops/tfrecords/2018_SRER_2_523000_3517000_image_244.png; No such file or directory
[[{{node ReadFile}}]]
[[IteratorGetNext]]
0 successful operations.
0 derived errors ignored.
distributed.client - ERROR - Failed to reconnect to scheduler after 3.00 seconds, closing client
_GatheringFuture exception was never retrieved
future: <_GatheringFuture finished exception=CancelledError()>
concurrent.futures._base.CancelledError
Why this tile? The tfrecord references crop image_244.png, but that file is not on disk.