loosolab / UROPA

Universal RObust Peak Annotator
https://uropa-manual.readthedocs.io/
MIT License

Memory error handling #5

Closed ckuenne closed 3 years ago

ckuenne commented 4 years ago

i have run into a problem with uropa concerning memory. whenever uropa runs out of memory, it throws an error and continues, but then it runs into an endless loop: the same progress line (Progress: Annotated 145000 peaks (16 jobs running; 145 jobs finished)) repeats forever. this was not the last job, btw. the input bed file had around 500k records.

is there a way to deal with this gracefully? i.e. if out of memory, just pause/reduce the parallel jobs (=threads) and retry the failed packets later? or was that supposed to happen? is it possible to report the size of objects in memory in the output? for example: how large is the reference db (gtf) in memory? how large is each packet in memory?

output below:

2020-11-05 16:42:22 [INFO] - Progress: Annotated 92000 peaks (16 jobs running; 92 jobs finished)
2020-11-05 16:42:27 [INFO] - Progress: Annotated 93000 peaks (16 jobs running; 93 jobs finished)
Exception in thread Thread-1:
Traceback (most recent call last):
  File "/mnt/software/x86_64/packages/python/2.7.8-anaconda-2019.10-stretch/lib/python2.7/threading.py", line 801, in __bootstrap_inner
    self.run()
  File "/mnt/software/x86_64/packages/python/2.7.8-anaconda-2019.10-stretch/lib/python2.7/threading.py", line 754, in run
    self.__target(*self.__args, **self.__kwargs)
  File "/mnt/software/x86_64/packages/python/2.7.8-anaconda-2019.10-stretch/lib/python2.7/multiprocessing/pool.py", line 328, in _handle_workers
    pool._maintain_pool()
  File "/mnt/software/x86_64/packages/python/2.7.8-anaconda-2019.10-stretch/lib/python2.7/multiprocessing/pool.py", line 232, in _maintain_pool
    self._repopulate_pool()
  File "/mnt/software/x86_64/packages/python/2.7.8-anaconda-2019.10-stretch/lib/python2.7/multiprocessing/pool.py", line 225, in _repopulate_pool
    w.start()
  File "/mnt/software/x86_64/packages/python/2.7.8-anaconda-2019.10-stretch/lib/python2.7/multiprocessing/process.py", line 130, in start
    self._popen = Popen(self)
  File "/mnt/software/x86_64/packages/python/2.7.8-anaconda-2019.10-stretch/lib/python2.7/multiprocessing/forking.py", line 121, in __init__
    self.pid = os.fork()
OSError: [Errno 12] Cannot allocate memory

2020-11-05 16:42:32 [INFO] - Progress: Annotated 93000 peaks (16 jobs running; 93 jobs finished)
2020-11-05 16:42:37 [INFO] - Progress: Annotated 93000 peaks (16 jobs running; 93 jobs finished)
2020-11-05 16:42:43 [INFO] - Progress: Annotated 94000 peaks (16 jobs running; 94 jobs finished)
2020-11-05 16:42:48 [INFO] - Progress: Annotated 94000 peaks (16 jobs running; 94 jobs finished)
2020-11-05 16:42:53 [INFO] - Progress: Annotated 94000 peaks (16 jobs running; 94 jobs finished)
2020-11-05 16:42:58 [INFO] - Progress: Annotated 95000 peaks (16 jobs running; 95 jobs finished)
2020-11-05 16:43:03 [INFO] - Progress: Annotated 95000 peaks (16 jobs running; 95 jobs finished)
2020-11-05 16:43:08 [INFO] - Progress: Annotated 97000 peaks (16 jobs running; 97 jobs finished)
2020-11-05 16:43:13 [INFO] - Progress: Annotated 97000 peaks (16 jobs running; 97 jobs finished)
2020-11-05 16:43:18 [INFO] - Progress: Annotated 98000 peaks (16 jobs running; 98 jobs finished)
2020-11-05 16:43:23 [INFO] - Progress: Annotated 98000 peaks (16 jobs running; 98 jobs finished)
2020-11-05 16:43:28 [INFO] - Progress: Annotated 98000 peaks (16 jobs running; 98 jobs finished)
2020-11-05 16:43:33 [INFO] - Progress: Annotated 99000 peaks (16 jobs running; 99 jobs finished)
2020-11-05 16:43:38 [INFO] - Progress: Annotated 99000 peaks (16 jobs running; 99 jobs finished)
2020-11-05 16:43:43 [INFO] - Progress: Annotated 99000 peaks (16 jobs running; 99 jobs finished)
2020-11-05 16:43:48 [INFO] - Progress: Annotated 99000 peaks (16 jobs running; 99 jobs finished)
2020-11-05 16:43:53 [INFO] - Progress: Annotated 99000 peaks (16 jobs running; 99 jobs finished)
2020-11-05 16:43:58 [INFO] - Progress: Annotated 99000 peaks (16 jobs running; 99 jobs finished)
2020-11-05 16:44:03 [INFO] - Progress: Annotated 99000 peaks (16 jobs running; 99 jobs finished)
2020-11-05 16:44:08 [INFO] - Progress: Annotated 99000 peaks (16 jobs running; 99 jobs finished)
2020-11-05 16:44:13 [INFO] - Progress: Annotated 101000 peaks (16 jobs running; 101 jobs finished)
2020-11-05 16:44:18 [INFO] - Progress: Annotated 102000 peaks (16 jobs running; 102 jobs finished)
2020-11-05 16:44:23 [INFO] - Progress: Annotated 102000 peaks (16 jobs running; 102 jobs finished)
2020-11-05 16:44:28 [INFO] - Progress: Annotated 102000 peaks (16 jobs running; 102 jobs finished)
2020-11-05 16:44:33 [INFO] - Progress: Annotated 102000 peaks (16 jobs running; 102 jobs finished)
2020-11-05 16:44:38 [INFO] - Progress: Annotated 102000 peaks (16 jobs running; 102 jobs finished)
2020-11-05 16:44:43 [INFO] - Progress: Annotated 103000 peaks (16 jobs running; 103 jobs finished)
2020-11-05 16:44:48 [INFO] - Progress: Annotated 103000 peaks (16 jobs running; 103 jobs finished)
2020-11-05 16:44:53 [INFO] - Progress: Annotated 103000 peaks (16 jobs running; 103 jobs finished)
2020-11-05 16:44:58 [INFO] - Progress: Annotated 104000 peaks (16 jobs running; 104 jobs finished)
2020-11-05 16:45:05 [INFO] - Progress: Annotated 104000 peaks (16 jobs running; 104 jobs finished)
2020-11-05 16:45:10 [INFO] - Progress: Annotated 104000 peaks (16 jobs running; 104 jobs finished)
2020-11-05 16:45:15 [INFO] - Progress: Annotated 104000 peaks (16 jobs running; 104 jobs finished)
2020-11-05 16:45:20 [INFO] - Progress: Annotated 105000 peaks (16 jobs running; 105 jobs finished)
2020-11-05 16:45:25 [INFO] - Progress: Annotated 106000 peaks (16 jobs running; 106 jobs finished)
2020-11-05 16:45:30 [INFO] - Progress: Annotated 106000 peaks (16 jobs running; 106 jobs finished)
2020-11-05 16:45:35 [INFO] - Progress: Annotated 106000 peaks (16 jobs running; 106 jobs finished)
2020-11-05 16:45:41 [INFO] - Progress: Annotated 106000 peaks (16 jobs running; 106 jobs finished)
2020-11-05 16:45:47 [INFO] - Progress: Annotated 106000 peaks (16 jobs running; 106 jobs finished)
2020-11-05 16:45:52 [INFO] - Progress: Annotated 106000 peaks (16 jobs running; 106 jobs finished)
2020-11-05 16:45:57 [INFO] - Progress: Annotated 106000 peaks (16 jobs running; 106 jobs finished)
2020-11-05 16:46:02 [INFO] - Progress: Annotated 106000 peaks (16 jobs running; 106 jobs finished)
2020-11-05 16:46:07 [INFO] - Progress: Annotated 106000 peaks (16 jobs running; 106 jobs finished)
2020-11-05 16:46:12 [INFO] - Progress: Annotated 107000 peaks (16 jobs running; 107 jobs finished)
2020-11-05 16:46:17 [INFO] - Progress: Annotated 108000 peaks (16 jobs running; 108 jobs finished)
2020-11-05 16:46:22 [INFO] - Progress: Annotated 108000 peaks (16 jobs running; 108 jobs finished)
2020-11-05 16:46:27 [INFO] - Progress: Annotated 109000 peaks (16 jobs running; 109 jobs finished)
2020-11-05 16:46:32 [INFO] - Progress: Annotated 109000 peaks (16 jobs running; 109 jobs finished)
2020-11-05 16:46:37 [INFO] - Progress: Annotated 109000 peaks (16 jobs running; 109 jobs finished)
2020-11-05 16:46:42 [INFO] - Progress: Annotated 111000 peaks (16 jobs running; 111 jobs finished)
2020-11-05 16:46:47 [INFO] - Progress: Annotated 111000 peaks (16 jobs running; 111 jobs finished)
2020-11-05 16:46:52 [INFO] - Progress: Annotated 111000 peaks (16 jobs running; 111 jobs finished)
2020-11-05 16:46:57 [INFO] - Progress: Annotated 111000 peaks (16 jobs running; 111 jobs finished)
2020-11-05 16:47:04 [INFO] - Progress: Annotated 111000 peaks (16 jobs running; 111 jobs finished)
2020-11-05 16:47:08 [INFO] - Progress: Annotated 111000 peaks (16 jobs running; 111 jobs finished)
2020-11-05 16:47:13 [INFO] - Progress: Annotated 111000 peaks (16 jobs running; 111 jobs finished)
2020-11-05 16:47:18 [INFO] - Progress: Annotated 111000 peaks (16 jobs running; 111 jobs finished)
2020-11-05 16:47:23 [INFO] - Progress: Annotated 111000 peaks (16 jobs running; 111 jobs finished)
2020-11-05 16:47:28 [INFO] - Progress: Annotated 112000 peaks (16 jobs running; 112 jobs finished)
2020-11-05 16:47:33 [INFO] - Progress: Annotated 113000 peaks (16 jobs running; 113 jobs finished)
2020-11-05 16:47:38 [INFO] - Progress: Annotated 113000 peaks (16 jobs running; 113 jobs finished)
2020-11-05 16:47:43 [INFO] - Progress: Annotated 113000 peaks (16 jobs running; 113 jobs finished)
2020-11-05 16:47:48 [INFO] - Progress: Annotated 114000 peaks (16 jobs running; 114 jobs finished)
2020-11-05 16:47:53 [INFO] - Progress: Annotated 114000 peaks (16 jobs running; 114 jobs finished)
2020-11-05 16:47:58 [INFO] - Progress: Annotated 115000 peaks (16 jobs running; 115 jobs finished)
2020-11-05 16:48:03 [INFO] - Progress: Annotated 115000 peaks (16 jobs running; 115 jobs finished)
2020-11-05 16:48:08 [INFO] - Progress: Annotated 116000 peaks (16 jobs running; 116 jobs finished)
2020-11-05 16:48:13 [INFO] - Progress: Annotated 117000 peaks (16 jobs running; 117 jobs finished)
2020-11-05 16:48:18 [INFO] - Progress: Annotated 117000 peaks (16 jobs running; 117 jobs finished)
2020-11-05 16:48:23 [INFO] - Progress: Annotated 117000 peaks (16 jobs running; 117 jobs finished)
2020-11-05 16:48:28 [INFO] - Progress: Annotated 119000 peaks (16 jobs running; 119 jobs finished)
2020-11-05 16:48:33 [INFO] - Progress: Annotated 119000 peaks (16 jobs running; 119 jobs finished)
2020-11-05 16:48:38 [INFO] - Progress: Annotated 120000 peaks (16 jobs running; 120 jobs finished)
2020-11-05 16:48:44 [INFO] - Progress: Annotated 120000 peaks (16 jobs running; 120 jobs finished)
2020-11-05 16:48:49 [INFO] - Progress: Annotated 120000 peaks (16 jobs running; 120 jobs finished)
2020-11-05 16:48:54 [INFO] - Progress: Annotated 120000 peaks (16 jobs running; 120 jobs finished)
2020-11-05 16:48:59 [INFO] - Progress: Annotated 120000 peaks (16 jobs running; 120 jobs finished)
2020-11-05 16:49:04 [INFO] - Progress: Annotated 120000 peaks (16 jobs running; 120 jobs finished)
2020-11-05 16:49:09 [INFO] - Progress: Annotated 120000 peaks (16 jobs running; 120 jobs finished)
2020-11-05 16:49:14 [INFO] - Progress: Annotated 120000 peaks (16 jobs running; 120 jobs finished)
2020-11-05 16:49:19 [INFO] - Progress: Annotated 120000 peaks (16 jobs running; 120 jobs finished)
2020-11-05 16:49:24 [INFO] - Progress: Annotated 120000 peaks (16 jobs running; 120 jobs finished)
2020-11-05 16:49:29 [INFO] - Progress: Annotated 120000 peaks (16 jobs running; 120 jobs finished)
2020-11-05 16:49:34 [INFO] - Progress: Annotated 120000 peaks (16 jobs running; 120 jobs finished)
2020-11-05 16:49:39 [INFO] - Progress: Annotated 120000 peaks (16 jobs running; 120 jobs finished)
2020-11-05 16:49:44 [INFO] - Progress: Annotated 120000 peaks (16 jobs running; 120 jobs finished)
2020-11-05 16:49:49 [INFO] - Progress: Annotated 120000 peaks (16 jobs running; 120 jobs finished)
2020-11-05 16:49:54 [INFO] - Progress: Annotated 121000 peaks (16 jobs running; 121 jobs finished)
2020-11-05 16:49:59 [INFO] - Progress: Annotated 121000 peaks (16 jobs running; 121 jobs finished)
2020-11-05 16:50:04 [INFO] - Progress: Annotated 121000 peaks (16 jobs running; 121 jobs finished)
2020-11-05 16:50:09 [INFO] - Progress: Annotated 122000 peaks (16 jobs running; 122 jobs finished)
2020-11-05 16:50:14 [INFO] - Progress: Annotated 123000 peaks (16 jobs running; 123 jobs finished)
2020-11-05 16:50:19 [INFO] - Progress: Annotated 123000 peaks (16 jobs running; 123 jobs finished)
2020-11-05 16:50:24 [INFO] - Progress: Annotated 125000 peaks (16 jobs running; 125 jobs finished)
2020-11-05 16:50:29 [INFO] - Progress: Annotated 125000 peaks (16 jobs running; 125 jobs finished)
2020-11-05 16:50:34 [INFO] - Progress: Annotated 125000 peaks (16 jobs running; 125 jobs finished)
2020-11-05 16:50:39 [INFO] - Progress: Annotated 125000 peaks (16 jobs running; 125 jobs finished)
2020-11-05 16:50:44 [INFO] - Progress: Annotated 125000 peaks (16 jobs running; 125 jobs finished)
2020-11-05 16:50:49 [INFO] - Progress: Annotated 126000 peaks (16 jobs running; 126 jobs finished)
2020-11-05 16:50:54 [INFO] - Progress: Annotated 127000 peaks (16 jobs running; 127 jobs finished)
2020-11-05 16:50:59 [INFO] - Progress: Annotated 128000 peaks (16 jobs running; 128 jobs finished)
2020-11-05 16:51:04 [INFO] - Progress: Annotated 129000 peaks (16 jobs running; 129 jobs finished)
2020-11-05 16:51:09 [INFO] - Progress: Annotated 129000 peaks (16 jobs running; 129 jobs finished)
2020-11-05 16:51:14 [INFO] - Progress: Annotated 129000 peaks (16 jobs running; 129 jobs finished)
2020-11-05 16:51:19 [INFO] - Progress: Annotated 129000 peaks (16 jobs running; 129 jobs finished)
2020-11-05 16:51:24 [INFO] - Progress: Annotated 129000 peaks (16 jobs running; 129 jobs finished)
2020-11-05 16:51:29 [INFO] - Progress: Annotated 129000 peaks (16 jobs running; 129 jobs finished)
2020-11-05 16:51:34 [INFO] - Progress: Annotated 129000 peaks (16 jobs running; 129 jobs finished)
2020-11-05 16:51:39 [INFO] - Progress: Annotated 129000 peaks (16 jobs running; 129 jobs finished)
2020-11-05 16:51:44 [INFO] - Progress: Annotated 129000 peaks (16 jobs running; 129 jobs finished)
2020-11-05 16:51:49 [INFO] - Progress: Annotated 129000 peaks (16 jobs running; 129 jobs finished)
2020-11-05 16:51:55 [INFO] - Progress: Annotated 129000 peaks (16 jobs running; 129 jobs finished)
2020-11-05 16:52:00 [INFO] - Progress: Annotated 129000 peaks (16 jobs running; 129 jobs finished)
2020-11-05 16:52:05 [INFO] - Progress: Annotated 130000 peaks (16 jobs running; 130 jobs finished)
2020-11-05 16:52:10 [INFO] - Progress: Annotated 130000 peaks (16 jobs running; 130 jobs finished)
2020-11-05 16:52:15 [INFO] - Progress: Annotated 130000 peaks (16 jobs running; 130 jobs finished)
2020-11-05 16:52:20 [INFO] - Progress: Annotated 130000 peaks (16 jobs running; 130 jobs finished)
2020-11-05 16:52:25 [INFO] - Progress: Annotated 130000 peaks (16 jobs running; 130 jobs finished)
2020-11-05 16:52:30 [INFO] - Progress: Annotated 131000 peaks (16 jobs running; 131 jobs finished)
2020-11-05 16:52:35 [INFO] - Progress: Annotated 131000 peaks (16 jobs running; 131 jobs finished)
2020-11-05 16:52:40 [INFO] - Progress: Annotated 132000 peaks (16 jobs running; 132 jobs finished)
2020-11-05 16:52:45 [INFO] - Progress: Annotated 132000 peaks (16 jobs running; 132 jobs finished)
2020-11-05 16:52:50 [INFO] - Progress: Annotated 132000 peaks (16 jobs running; 132 jobs finished)
2020-11-05 16:52:55 [INFO] - Progress: Annotated 132000 peaks (16 jobs running; 132 jobs finished)
2020-11-05 16:53:00 [INFO] - Progress: Annotated 132000 peaks (16 jobs running; 132 jobs finished)
2020-11-05 16:53:05 [INFO] - Progress: Annotated 132000 peaks (16 jobs running; 132 jobs finished)
2020-11-05 16:53:10 [INFO] - Progress: Annotated 132000 peaks (16 jobs running; 132 jobs finished)
2020-11-05 16:53:15 [INFO] - Progress: Annotated 132000 peaks (16 jobs running; 132 jobs finished)
2020-11-05 16:53:20 [INFO] - Progress: Annotated 133000 peaks (16 jobs running; 133 jobs finished)
2020-11-05 16:53:25 [INFO] - Progress: Annotated 134000 peaks (16 jobs running; 134 jobs finished)
2020-11-05 16:53:30 [INFO] - Progress: Annotated 134000 peaks (16 jobs running; 134 jobs finished)
2020-11-05 16:53:35 [INFO] - Progress: Annotated 134000 peaks (16 jobs running; 134 jobs finished)
2020-11-05 16:53:40 [INFO] - Progress: Annotated 135000 peaks (16 jobs running; 135 jobs finished)
2020-11-05 16:53:45 [INFO] - Progress: Annotated 135000 peaks (16 jobs running; 135 jobs finished)
2020-11-05 16:53:50 [INFO] - Progress: Annotated 135000 peaks (16 jobs running; 135 jobs finished)
2020-11-05 16:53:55 [INFO] - Progress: Annotated 135000 peaks (16 jobs running; 135 jobs finished)
2020-11-05 16:54:01 [INFO] - Progress: Annotated 135000 peaks (16 jobs running; 135 jobs finished)
2020-11-05 16:54:08 [INFO] - Progress: Annotated 135000 peaks (16 jobs running; 135 jobs finished)
2020-11-05 16:54:13 [INFO] - Progress: Annotated 135000 peaks (16 jobs running; 135 jobs finished)
2020-11-05 16:54:18 [INFO] - Progress: Annotated 135000 peaks (16 jobs running; 135 jobs finished)
2020-11-05 16:54:23 [INFO] - Progress: Annotated 135000 peaks (16 jobs running; 135 jobs finished)
2020-11-05 16:54:28 [INFO] - Progress: Annotated 135000 peaks (16 jobs running; 135 jobs finished)
2020-11-05 16:54:33 [INFO] - Progress: Annotated 135000 peaks (16 jobs running; 135 jobs finished)
2020-11-05 16:54:38 [INFO] - Progress: Annotated 136000 peaks (16 jobs running; 136 jobs finished)
2020-11-05 16:54:43 [INFO] - Progress: Annotated 136000 peaks (16 jobs running; 136 jobs finished)
2020-11-05 16:54:48 [INFO] - Progress: Annotated 137000 peaks (16 jobs running; 137 jobs finished)
2020-11-05 16:54:53 [INFO] - Progress: Annotated 137000 peaks (16 jobs running; 137 jobs finished)
2020-11-05 16:54:58 [INFO] - Progress: Annotated 138000 peaks (16 jobs running; 138 jobs finished)
2020-11-05 16:55:03 [INFO] - Progress: Annotated 138000 peaks (16 jobs running; 138 jobs finished)
2020-11-05 16:55:08 [INFO] - Progress: Annotated 139000 peaks (16 jobs running; 139 jobs finished)
2020-11-05 16:55:14 [INFO] - Progress: Annotated 139000 peaks (16 jobs running; 139 jobs finished)
2020-11-05 16:55:19 [INFO] - Progress: Annotated 139000 peaks (16 jobs running; 139 jobs finished)
2020-11-05 16:55:24 [INFO] - Progress: Annotated 139000 peaks (16 jobs running; 139 jobs finished)
2020-11-05 16:55:29 [INFO] - Progress: Annotated 139000 peaks (16 jobs running; 139 jobs finished)
2020-11-05 16:55:34 [INFO] - Progress: Annotated 141000 peaks (16 jobs running; 141 jobs finished)
2020-11-05 16:55:39 [INFO] - Progress: Annotated 141000 peaks (16 jobs running; 141 jobs finished)
2020-11-05 16:55:44 [INFO] - Progress: Annotated 141000 peaks (16 jobs running; 141 jobs finished)
2020-11-05 16:55:49 [INFO] - Progress: Annotated 141000 peaks (16 jobs running; 141 jobs finished)
2020-11-05 16:55:54 [INFO] - Progress: Annotated 141000 peaks (16 jobs running; 141 jobs finished)
2020-11-05 16:55:59 [INFO] - Progress: Annotated 141000 peaks (16 jobs running; 141 jobs finished)
2020-11-05 16:56:04 [INFO] - Progress: Annotated 142000 peaks (16 jobs running; 142 jobs finished)
2020-11-05 16:56:09 [INFO] - Progress: Annotated 142000 peaks (16 jobs running; 142 jobs finished)
2020-11-05 16:56:14 [INFO] - Progress: Annotated 142000 peaks (16 jobs running; 142 jobs finished)
2020-11-05 16:56:19 [INFO] - Progress: Annotated 142000 peaks (16 jobs running; 142 jobs finished)
2020-11-05 16:56:24 [INFO] - Progress: Annotated 142000 peaks (16 jobs running; 142 jobs finished)
2020-11-05 16:56:29 [INFO] - Progress: Annotated 144000 peaks (16 jobs running; 144 jobs finished)
2020-11-05 16:56:34 [INFO] - Progress: Annotated 144000 peaks (16 jobs running; 144 jobs finished)
2020-11-05 16:56:39 [INFO] - Progress: Annotated 144000 peaks (16 jobs running; 144 jobs finished)
2020-11-05 16:56:44 [INFO] - Progress: Annotated 144000 peaks (16 jobs running; 144 jobs finished)
2020-11-05 16:56:49 [INFO] - Progress: Annotated 144000 peaks (16 jobs running; 144 jobs finished)
2020-11-05 16:56:54 [INFO] - Progress: Annotated 144000 peaks (16 jobs running; 144 jobs finished)
2020-11-05 16:56:59 [INFO] - Progress: Annotated 144000 peaks (16 jobs running; 144 jobs finished)
2020-11-05 16:57:04 [INFO] - Progress: Annotated 144000 peaks (16 jobs running; 144 jobs finished)
2020-11-05 16:57:09 [INFO] - Progress: Annotated 144000 peaks (16 jobs running; 144 jobs finished)
2020-11-05 16:57:14 [INFO] - Progress: Annotated 144000 peaks (16 jobs running; 144 jobs finished)
2020-11-05 16:57:19 [INFO] - Progress: Annotated 144000 peaks (16 jobs running; 144 jobs finished)
2020-11-05 16:57:24 [INFO] - Progress: Annotated 144000 peaks (16 jobs running; 144 jobs finished)
2020-11-05 16:57:29 [INFO] - Progress: Annotated 144000 peaks (16 jobs running; 144 jobs finished)
2020-11-05 16:57:34 [INFO] - Progress: Annotated 144000 peaks (16 jobs running; 144 jobs finished)
2020-11-05 16:57:39 [INFO] - Progress: Annotated 144000 peaks (16 jobs running; 144 jobs finished)
2020-11-05 16:57:44 [INFO] - Progress: Annotated 145000 peaks (16 jobs running; 145 jobs finished)
2020-11-05 16:57:49 [INFO] - Progress: Annotated 145000 peaks (16 jobs running; 145 jobs finished)
2020-11-05 16:57:54 [INFO] - Progress: Annotated 145000 peaks (16 jobs running; 145 jobs finished)
...

best, carsten

msbentsen commented 4 years ago

The jobs run individually, and there is sadly no way of knowing in advance how much memory a process will take up (because that depends on the entries fetched from the gtf). We could check the memory within each job, but the jobs would then need to communicate with each other to make sure that the total memory is not exceeded, and I think that is overkill for the application.
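To illustrate what checking memory within a single job could look like, here is a minimal sketch using Python's standard tracemalloc module. It is not UROPA code: run_and_measure and the toy job are hypothetical names, and tracemalloc only counts allocations made by the Python interpreter, so the number is a lower bound on what the process really uses.

```python
import tracemalloc


def run_and_measure(job, *args):
    """Run job(*args) and return (result, peak bytes allocated by Python)."""
    tracemalloc.start()
    try:
        result = job(*args)
        # get_traced_memory() returns (current, peak) in bytes
        _, peak = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    return result, peak


# Toy stand-in for "annotate one chunk": build a list of strings.
result, peak_bytes = run_and_measure(lambda n: [str(i) for i in range(n)], 10000)
print("peak memory of this job: %d bytes" % peak_bytes)
```

A per-job number like this could be reported back to the main process together with the result, which avoids any communication between the jobs themselves.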

As you say, a solution might be to catch which job failed, and then try to restart it later - I will have a look at that.

ckuenne commented 4 years ago

and maybe try to dynamically reduce the number of used threads if a job fails because of memory. if you start with 16 and a job fails with memory error, reduce it to 15. if another job fails afterwards reduce to 14, etc. jobs don't have to communicate with each other for that, just with the main thread.

and of course the endless loop of death should not happen :)

msbentsen commented 4 years ago

There is an option '--chunk' that you can set lower (default is 1000) to control the number of input .bed-sites per job - that might solve your immediate problem.

But yes, for solving the loop of death: any failed jobs should be rerun either in smaller chunks or with fewer cores, as you suggest. They should probably also be given a maximum number of retries, otherwise we just end up in the retry-loop of death.
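The retry idea discussed above could be sketched roughly as follows. This is an illustration under assumptions, not the UROPA implementation: run_with_backoff and its parameters are made-up names, a thread pool stands in for the process pool so the sketch runs anywhere, and the worker count is reduced by one per round that saw a MemoryError rather than per failed job.

```python
from concurrent.futures import ThreadPoolExecutor


def run_with_backoff(chunks, annotate, workers=16, max_retries=3):
    """Run annotate(chunk) for all chunks. Chunks that raise MemoryError are
    retried in a later round with one worker fewer, at most max_retries times."""
    pending = [(chunk, 0) for chunk in chunks]  # (chunk, failed attempts so far)
    results, failed = [], []
    while pending:
        with ThreadPoolExecutor(max_workers=workers) as pool:
            futures = [(pool.submit(annotate, chunk), chunk, tries)
                       for chunk, tries in pending]
        # Leaving the with-block waits until every submitted job has ended.
        pending = []
        memory_error_seen = False
        for future, chunk, tries in futures:
            try:
                results.append(future.result())
            except MemoryError:
                memory_error_seen = True
                if tries + 1 > max_retries:
                    failed.append(chunk)  # give up: no retry-loop of death
                else:
                    pending.append((chunk, tries + 1))
        if memory_error_seen:
            workers = max(1, workers - 1)
    return results, failed, workers
```

Only the main thread decides about retries and the worker count, so the jobs never have to talk to each other. Chunks that keep failing end up in the failed list instead of being retried forever.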

ckuenne commented 3 years ago

i now had another case of endless loop, but this time without being connected to any error.

2020-11-12 15:55:05 [INFO] - Started annotation
2020-11-12 15:55:15 [INFO] - Progress: Annotated 0 peaks (16 jobs running; 0 jobs finished)
2020-11-12 15:55:20 [INFO] - Progress: Annotated 0 peaks (16 jobs running; 0 jobs finished)
2020-11-12 15:55:25 [INFO] - Progress: Annotated 0 peaks (16 jobs running; 0 jobs finished)
...
2020-11-13 01:13:59 [INFO] - Progress: Annotated 3624000 peaks (16 jobs running; 3624 jobs finished)

at this point it just stalled. the whole thing had 3.6m peaks, so it was actually finished. but 18 uropa threads have now been running at 0% cpu/mem for two days, and no file has changed since then. so the loop is not limited to errors; it seems to be connected to long runtimes.

msbentsen commented 3 years ago

Hi Carsten,

I can reproduce the endless loop for your test run, but I still believe this has to do with memory, even if the tasks don't fail with that error. I found that when memory usage reaches 100%, the output of the jobs running at that time gets locked and sometimes does not recover, so there is no error (which makes it very hard to catch).
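One generic way to notice a job that never returns, without any error being raised, is to wait for its result with a timeout. The sketch below is only an illustration of that idea and not how UROPA handles it: collect and the job names are hypothetical, and a thread blocked on an Event stands in for a job whose output is stuck.

```python
import threading
from concurrent.futures import ThreadPoolExecutor, TimeoutError


def collect(futures, timeout):
    """Split named futures into finished results and names that timed out."""
    done, stalled = [], []
    for name, future in futures.items():
        try:
            done.append((name, future.result(timeout=timeout)))
        except TimeoutError:
            stalled.append(name)
    return done, stalled


release = threading.Event()  # never set while we wait: simulates a stuck job
with ThreadPoolExecutor(max_workers=2) as pool:
    futures = {"ok": pool.submit(lambda: 42),
               "stuck": pool.submit(release.wait)}
    done, stalled = collect(futures, timeout=0.2)
    release.set()  # let the simulated job end so the pool can shut down

print(done, stalled)
```

With real worker processes, a stalled job would additionally have to be terminated before its chunk is resubmitted.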

The solution is to reduce --chunk to something small like 20/50/100 for large queries like yours. That will control the amount of data collected by each job, thereby limiting memory consumption. There is a slight overhead because of the increase in number of jobs, but it should be negligible for large runs like yours.
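As a rough worked example of that trade-off, the sketch below splits an input into chunks and counts the resulting jobs. The helper names are hypothetical and the splitting is a plain list slice, not UROPA's own code; the peak count is taken from the final log line above (3624000 peaks in 3624 jobs at the default chunk size of 1000).

```python
def split_into_chunks(peaks, chunk):
    """Split a list of peaks into consecutive pieces of at most `chunk` items."""
    return [peaks[i:i + chunk] for i in range(0, len(peaks), chunk)]


def number_of_jobs(n_peaks, chunk):
    """Number of jobs needed for n_peaks (integer ceiling division)."""
    return -(-n_peaks // chunk)


n_peaks = 3624000
for chunk in (1000, 100, 50, 20):
    print("chunk=%d -> %d jobs" % (chunk, number_of_jobs(n_peaks, chunk)))
```

Going from a chunk size of 1000 to 50 multiplies the number of jobs by 20, while each job only has to hold the annotation for 50 input sites at a time.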

I will also be releasing a new UROPA version (4.0.0), which has dynamic control of the number of jobs run at one time, as well as better error handling. However, due to new additions of modules only available for python>=3.2 (ref: queuehandler), this version will no longer support python 2.x!
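For readers unfamiliar with the module mentioned here: logging.handlers.QueueHandler and QueueListener were added to the standard library in Python 3.2. The sketch below shows the general pattern of routing log records through a queue to a single listener. How UROPA actually uses it is not shown in this thread, so the logger name, the message and the ListHandler class are illustrative assumptions; across processes a multiprocessing.Queue would take the place of queue.Queue.

```python
import logging
import logging.handlers
import queue

log_queue = queue.Queue()
records = []


class ListHandler(logging.Handler):
    """Toy handler that stores messages in a list instead of printing them."""

    def emit(self, record):
        records.append(record.getMessage())


# One listener drains the queue and hands records to the real handler(s).
listener = logging.handlers.QueueListener(log_queue, ListHandler())
listener.start()

# Workers only ever put records on the queue, which is cheap and non-blocking.
logger = logging.getLogger("annotation_sketch")
logger.setLevel(logging.INFO)
logger.propagate = False
logger.addHandler(logging.handlers.QueueHandler(log_queue))

logger.info("chunk %d done", 7)
listener.stop()  # processes everything still queued, then stops the thread
print(records)
```

Because only the listener writes output, log lines from many workers cannot interleave or block each other on the output stream.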

Best,
Mette

ckuenne commented 3 years ago

ok, thanks.

msbentsen commented 3 years ago

I just released uropa 4.0.0 on pypi.

@jenzopr can you change the requirements for the conda package? The update of the existing recipe is going to fail because the minimum required python version was changed to 3.2, and "psutil" was added to install_requires.

jenzopr commented 3 years ago

Now available via bioconda as well https://github.com/bioconda/bioconda-recipes/pull/25398

msbentsen commented 3 years ago

Great, I will close this issue :-)