weecology / NEON_crown_maps

Generating tree crown maps for NEON sites
MIT License

tfrecord not found after many hours #28

Closed by bw4sz 4 years ago

bw4sz commented 4 years ago

After running for 9 hours without error, the job eventually failed with a NotFoundError on a single image crop:

Traceback (most recent call last):
  File "main.py", line 260, in <module>
    result = future.result()
  File "/home/b.weinstein/miniconda3/envs/crowns/lib/python3.7/site-packages/distributed/client.py", line 220, in result
    raise exc.with_traceback(tb)
  File "main.py", line 144, in run_rgb
    shps = predict.predict_tiles(model, records, patch_size=400, rgb_paths=rgb_paths, save_dir=save_dir, batch_size=model.config["batch_size"],overwrite=overwrite)
  File "/home/b.weinstein/NEON_crown_maps/predict.py", line 67, in predict_tiles
    boxes = predict_tile(model=model, tfrecord=tfrecord, patch_size=patch_size, batch_size=batch_size, score_threshold=score_threshold, max_detections=max_detections, classes=classes)
  File "/home/b.weinstein/NEON_crown_maps/predict.py", line 140, in predict_tile
    box_array, score_array, label_array = model.prediction_model.predict_on_batch(iterator)
  File "/apps/tensorflow/1.14.0.cuda10.gpu/lib/python3.7/site-packages/keras/engine/training.py", line 1580, in predict_on_batch
    outputs = self.predict_function(ins)
  File "/apps/tensorflow/1.14.0.cuda10.gpu/lib/python3.7/site-packages/tensorflow/python/keras/backend.py", line 3292, in __call__
    run_metadata=self.run_metadata)
  File "/apps/tensorflow/1.14.0.cuda10.gpu/lib/python3.7/site-packages/tensorflow/python/client/session.py", line 1458, in __call__
    run_metadata_ptr)
tensorflow.python.framework.errors_impl.NotFoundError: 2 root error(s) found.
  (0) Not found: /orange/idtrees-collab/crops/tfrecords/2018_SRER_2_523000_3517000_image_244.png; No such file or directory
     [[{{node ReadFile}}]]
     [[IteratorGetNext]]
     [[filtered_detections/map/while/TensorArrayWrite_2/TensorArrayWriteV3/_1899]]
  (1) Not found: /orange/idtrees-collab/crops/tfrecords/2018_SRER_2_523000_3517000_image_244.png; No such file or directory
     [[{{node ReadFile}}]]
     [[IteratorGetNext]]
0 successful operations.
0 derived errors ignored.
distributed.client - ERROR - Failed to reconnect to scheduler after 3.00 seconds, closing client
_GatheringFuture exception was never retrieved
future: <_GatheringFuture finished exception=CancelledError()>
concurrent.futures._base.CancelledError

Why this tile?
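
One way to narrow this down would be to check that every crop image exists on disk before starting a long prediction run, so a missing file shows up in seconds rather than after hours. The sketch below is a rough illustration only: `find_missing_files` is a hypothetical helper, not part of `predict.py`, and it assumes the list of image paths can be collected up front (how those paths are recovered from the tfrecords is not shown here).

```python
import os


def find_missing_files(paths):
    """Return the paths that do not exist as regular files, in input order.

    Hypothetical helper for illustration; not part of NEON_crown_maps.
    """
    return [p for p in paths if not os.path.isfile(p)]


if __name__ == "__main__":
    # The path below is the one reported in the traceback above.
    # In practice the list would hold every crop referenced by a tile.
    crop_paths = [
        "/orange/idtrees-collab/crops/tfrecords/2018_SRER_2_523000_3517000_image_244.png",
    ]
    missing = find_missing_files(crop_paths)
    if missing:
        print("Missing {} of {} crops:".format(len(missing), len(crop_paths)))
        for path in missing:
            print("  " + path)
    else:
        print("All crops present.")
```

If only this one crop is missing, that would point to a problem writing that particular file rather than to the tile as a whole, but that is a guess until the check is run.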