weecology / NEON_crown_maps

Generating tree crown maps for NEON sites
MIT License

Why does GPU memory not get garbage-collected? #20

Closed by bw4sz 4 years ago

bw4sz commented 4 years ago

GPU memory increases over time until tasks start failing with ResourceExhaustedError. Worker log:

distributed.worker - INFO - Start worker at: tcp://10.13.164.114:35754

distributed.worker - INFO - Listening to: tcp://10.13.164.114:35754

distributed.worker - INFO - dashboard at: 10.13.164.114:33900

distributed.worker - INFO - Waiting to connect to: tcp://10.13.164.113:36350

distributed.worker - INFO - -------------------------------------------------

distributed.worker - INFO - Threads: 1

distributed.worker - INFO - Memory: 15.00 GB

distributed.worker - INFO - Local Directory: /orange/ewhite/b.weinstein/NEON/logs/dask/worker-3lo8w12b

distributed.worker - INFO - -------------------------------------------------

distributed.worker - INFO - Registered to: tcp://10.13.164.113:36350

distributed.worker - INFO - -------------------------------------------------

distributed.worker - WARNING - gc.collect() took 1.721s. This is usually a sign that some tasks handle too many Python objects at the same time. Rechunking the work into smaller tasks might help.

distributed.worker - WARNING - Worker is at 90% memory usage. Pausing worker. Process memory: 13.51 GB -- Worker memory limit: 15.00 GB

distributed.worker - WARNING - Worker is at 87% memory usage. Resuming worker. Process memory: 13.11 GB -- Worker memory limit: 15.00 GB

distributed.worker - WARNING - Worker is at 90% memory usage. Pausing worker. Process memory: 13.53 GB -- Worker memory limit: 15.00 GB

distributed.worker - WARNING - Worker is at 88% memory usage. Resuming worker. Process memory: 13.35 GB -- Worker memory limit: 15.00 GB

distributed.worker - WARNING - Worker is at 90% memory usage. Pausing worker. Process memory: 13.62 GB -- Worker memory limit: 15.00 GB

distributed.worker - WARNING - Worker is at 86% memory usage. Resuming worker. Process memory: 13.04 GB -- Worker memory limit: 15.00 GB

distributed.worker - WARNING - Worker is at 90% memory usage. Pausing worker. Process memory: 13.61 GB -- Worker memory limit: 15.00 GB

distributed.worker - WARNING - Worker is at 88% memory usage. Resuming worker. Process memory: 13.25 GB -- Worker memory limit: 15.00 GB

distributed.worker - WARNING - Compute Failed Function: run_rgb args: ('/orange/ewhite/b.weinstein/NEON/crops/2018_TEAK_3_321000_4103000_image.tfrecord', '/orange/ewhite/NeonData/TEAK/DP3.30010.001/2018/FullSite/D17/2018_TEAK_3/L3/Camera/Mosaic') kwargs: {} Exception: ResourceExhaustedError()

distributed.worker - WARNING - Worker is at 90% memory usage. Pausing worker. Process memory: 13.53 GB -- Worker memory limit: 15.00 GB

distributed.worker - WARNING - Compute Failed Function: run_rgb args: ('/orange/ewhite/b.weinstein/NEON/crops/2018_TEAK_3_321000_4094000_image.tfrecord', '/orange/ewhite/NeonData/TEAK/DP3.30010.001/2018/FullSite/D17/2018_TEAK_3/L3/Camera/Mosaic') kwargs: {} Exception: ResourceExhaustedError()
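To see whether memory keeps climbing from task to task or just hovers around the pause threshold, the pause/resume warnings in a worker log like the one above can be pulled out and compared. Below is a minimal standard-library sketch. Its regex assumes the exact message wording shown in this log. The helper names (parse_memory_events, count_failures, MemoryEvent) are hypothetical and are not part of dask or this repository. Note that the "Process memory" figure dask reports is host RAM for the worker process, so this log alone does not show GPU usage. ResourceExhaustedError is TensorFlow's out-of-memory error.

```python
import re
from typing import NamedTuple

# Assumes the exact wording of the dask worker pause/resume warnings shown in the log.
MEMORY_LINE = re.compile(
    r"Worker is at (?P<pct>\d+)% memory usage\. (?P<action>Pausing|Resuming) worker\. "
    r"Process memory: (?P<used>[\d.]+) GB -- Worker memory limit: (?P<limit>[\d.]+) GB"
)


class MemoryEvent(NamedTuple):
    action: str      # "Pausing" or "Resuming"
    percent: int     # percentage of the limit reported by the worker
    used_gb: float   # process (host) memory when the warning was emitted
    limit_gb: float  # configured worker memory limit


def parse_memory_events(log_text: str) -> list[MemoryEvent]:
    """Extract every pause/resume memory warning from a dask worker log."""
    return [
        MemoryEvent(
            action=m["action"],
            percent=int(m["pct"]),
            used_gb=float(m["used"]),
            limit_gb=float(m["limit"]),
        )
        for m in MEMORY_LINE.finditer(log_text)
    ]


def count_failures(log_text: str, error: str = "ResourceExhaustedError") -> int:
    """Count 'Compute Failed' lines whose exception matches the given name."""
    return sum(
        1
        for line in log_text.splitlines()
        if "Compute Failed" in line and f"Exception: {error}" in line
    )


if __name__ == "__main__":
    log = (
        "distributed.worker - WARNING - Worker is at 90% memory usage. Pausing worker. "
        "Process memory: 13.51 GB -- Worker memory limit: 15.00 GB\n"
        "distributed.worker - WARNING - Worker is at 87% memory usage. Resuming worker. "
        "Process memory: 13.11 GB -- Worker memory limit: 15.00 GB\n"
    )
    events = parse_memory_events(log)
    print(max(e.used_gb for e in events))  # prints 13.51
```

Feeding it the full log above would show process memory oscillating between about 13.0 and 13.6 GB against the 15 GB limit, alongside the ResourceExhaustedError failures. Plotting used_gb across a longer log is a quick way to tell a steady leak from a worker that simply sits near its threshold.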