https://github.com/vatlab/SoS/blob/master/src/sos/step_executor.py#L1520
Concurrent execution is disabled for dynamic output because the concurrent executor currently does not return _output to the step, so there is no way for the step to assemble step_output from the substeps. There might be a deeper reason for this limitation, but right now I think this is something that can be fixed.
A clearer example:
[1: shared={'step1': 'step_output'}]
input: for_each={'i': range(10)}, concurrent=True
output: dynamic(f'rep_{i}/*.res')
import os, random
os.makedirs(f'rep_{i}', exist_ok=True)  # the replicate folder must exist before touch()
path(f'rep_{i}/{random.randint(0, 10000)}.res').touch()
[2]
print(step1)
dynamic output is replicate-dependent, so it has to be collected for each substep; step_output then needs to be aggregated from these dynamic outputs (and shared). Neither substep signatures nor concurrency works now. concurrent=True is ignored intentionally because sos currently does not return _output from the concurrent processes, so the output has to be obtained before the substeps are farmed out to the queue.
Not done yet; some tests still fail, and there are uncertainties in the handling of shared variables, signatures with dynamic output, parallelization, etc.
Supposedly fixed. Closing for now.
Not sure about this, but rerunning one of my old workflows I see something scary like this: it is an R analysis involving dynamic output, but I see sos all over the place and can hardly see any R instance showing up. It looks like most of the resources spent here are just overhead. I do not recall seeing anything like this before. These are local, non-task jobs.
With concurrent=True? -v4 should tell you the number of slots allocated.
This is with concurrent=True, but I used -j 24, so it is expected to use those slots. My point is that all my tasks should be heavy-duty R commands (heavy duty is relative ... maybe 1 min per R analysis), so I expect top to show mostly R processes instead of sos; seeing mostly sos indicates that even heavier overhead is going on.
If it is related to sqlite overhead I can wait for the zmq branch to work so we do not spend energy investigating this problem.
Okay, I've updated to the latest master and reran a job from yesterday. The workflow looks like this:
[SuSiE (SuSiE analysis)]
depends: R_library('susieR')
# SuSiE parameter: L
parameter: maxL = 5
# SuSiE parameter: prior variance; set to 0 to use per-dataset PVE estimates
parameter: prior_var = 0.0
input: glob.glob(f'{cwd}/{y_data:bnn}_{int(max_dist/1000)}Kb/chr*/*.rds'), group_by = 1, concurrent = True
output: dynamic(glob.glob(f'{cwd}/{y_data:bnn}_{int(max_dist/1000)}Kb/SuSiE_CS_*/*.rds'))
R: expand = "${ }"
The input is about 50K files; the output is the same number of files, but placed into different folders depending on the analysis results. What happened was: the sos process waited for at least 1.5 hrs (the last time I checked top), seemingly preparing for jobs. Since it is a renewal of a job from yesterday, I do see a number of "... is ignored due to saved signature" messages, but it only ignores 75 jobs. The actual output files I have number 16.8K (still far from 50K after 12 hrs of running yesterday, although according to the notes in my notebook it used to complete 50K jobs in 3 hrs).
I guess it is related to sqlite, but I am not sure how zmq can help with it -- maybe loading the database into memory would also help, though that is likely not scalable?
I think it is still the case that with dynamic output you get a larger and larger _output, which leads to larger and larger signatures, right?
Oh, because we discussed it in #1024 I thought checking the glob at each substep was already disallowed for dynamic output, and that the default was to mimic -s ignore behavior for substeps and then check and write signatures at the end? If not, I would not be surprised by the slowness. ...
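For a sense of scale (all numbers below are assumptions, not measurements), re-globbing the whole output directory once per substep grows multiplicatively with the number of substeps:

# rough cost model, not a measurement
n_substeps = 50_000      # one substep per input .rds file
n_output_files = 50_000  # files matched by the dynamic() pattern
per_stat = 1e-3          # assume ~1 ms per stat call on a network filesystem
hours = n_substeps * n_output_files * per_stat / 3600
print(f'{hours:.0f} hours spent on stat calls alone')  # ~694 hours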
Using something like
[SuSiE (SuSiE analysis)]
input: dynamic(f'input/*.rds'), group_by = 1, concurrent = True
output: dynamic(f'output/*.rds')
import random
output_file = f'output/{random.randint(0, 1000000)}.rds'
R: expand = True
    Sys.sleep(20)
    write({_index}, '{output_file}')
I see
1985961928 97544 6976 0 3:45PM ttys000 0:01.78 /Users/bpeng1/anaconda3/envs/sos/bin/python /Users/bpeng1/anaconda3/envs/sos/bin/sos run test -j8
1985961928 97547 97544 0 3:45PM ttys000 0:00.22 /Users/bpeng1/anaconda3/envs/sos/bin/python /Users/bpeng1/anaconda3/envs/sos/bin/sos run test -j8
1985961928 97548 97547 0 3:45PM ttys000 0:00.03 /Users/bpeng1/anaconda3/envs/sos/bin/python /Users/bpeng1/anaconda3/envs/sos/bin/sos run test -j8
1985961928 97549 97547 0 3:45PM ttys000 0:00.03 /Users/bpeng1/anaconda3/envs/sos/bin/python /Users/bpeng1/anaconda3/envs/sos/bin/sos run test -j8
1985961928 97550 97547 0 3:45PM ttys000 0:00.03 /Users/bpeng1/anaconda3/envs/sos/bin/python /Users/bpeng1/anaconda3/envs/sos/bin/sos run test -j8
1985961928 97551 97547 0 3:45PM ttys000 0:00.03 /Users/bpeng1/anaconda3/envs/sos/bin/python /Users/bpeng1/anaconda3/envs/sos/bin/sos run test -j8
1985961928 97552 97547 0 3:45PM ttys000 0:00.03 /Users/bpeng1/anaconda3/envs/sos/bin/python /Users/bpeng1/anaconda3/envs/sos/bin/sos run test -j8
1985961928 97553 97547 0 3:45PM ttys000 0:00.03 /Users/bpeng1/anaconda3/envs/sos/bin/python /Users/bpeng1/anaconda3/envs/sos/bin/sos run test -j8
1985961928 97554 97547 0 3:45PM ttys000 0:00.03 /Users/bpeng1/anaconda3/envs/sos/bin/python /Users/bpeng1/anaconda3/envs/sos/bin/sos run test -j8
1985961928 97555 97547 0 3:45PM ttys000 0:00.03 /Users/bpeng1/anaconda3/envs/sos/bin/python /Users/bpeng1/anaconda3/envs/sos/bin/sos run test -j8
1985961928 97556 97553 0 3:45PM ttys000 0:00.36 /Users/bpeng1/anaconda3/envs/sos/lib/R/bin/exec/R --slave --no-restore --file=/var/folders/ys/gnzk0qbx5wbdgm531v82xxljv5yqy8/T/tmpfov60v0f.R
1985961928 97557 97548 0 3:45PM ttys000 0:00.37 /Users/bpeng1/anaconda3/envs/sos/lib/R/bin/exec/R --slave --no-restore --file=/var/folders/ys/gnzk0qbx5wbdgm531v82xxljv5yqy8/T/tmp649w2a56.R
1985961928 97558 97555 0 3:45PM ttys000 0:00.36 /Users/bpeng1/anaconda3/envs/sos/lib/R/bin/exec/R --slave --no-restore --file=/var/folders/ys/gnzk0qbx5wbdgm531v82xxljv5yqy8/T/tmp59jhrkep.R
1985961928 97559 97550 0 3:45PM ttys000 0:00.36 /Users/bpeng1/anaconda3/envs/sos/lib/R/bin/exec/R --slave --no-restore --file=/var/folders/ys/gnzk0qbx5wbdgm531v82xxljv5yqy8/T/tmpmcpu0bxx.R
1985961928 97560 97549 0 3:45PM ttys000 0:00.36 /Users/bpeng1/anaconda3/envs/sos/lib/R/bin/exec/R --slave --no-restore --file=/var/folders/ys/gnzk0qbx5wbdgm531v82xxljv5yqy8/T/tmphibs2swo.R
1985961928 97561 97551 0 3:45PM ttys000 0:00.37 /Users/bpeng1/anaconda3/envs/sos/lib/R/bin/exec/R --slave --no-restore --file=/var/folders/ys/gnzk0qbx5wbdgm531v82xxljv5yqy8/T/tmp1h3j32qb.R
1985961928 97562 97554 0 3:45PM ttys000 0:00.36 /Users/bpeng1/anaconda3/envs/sos/lib/R/bin/exec/R --slave --no-restore --file=/var/folders/ys/gnzk0qbx5wbdgm531v82xxljv5yqy8/T/tmpv3kaq9ak.R
1985961928 97563 97552 0 3:45PM ttys000 0:00.36 /Users/bpeng1/anaconda3/envs/sos/lib/R/bin/exec/R --slave --no-restore --file=/var/folders/ys/gnzk0qbx5wbdgm531v82xxljv5yqy8/T/tmp0q2f59fn.R
which is more or less correct with -j8
sos still allows it and handles it more "correctly" now, because the glob can be substep-specific, as in the example here.
Because the new step_output is the collection of _output without duplicates, I can almost optimize this case by removing outputs that have been seen before. In the concurrent=True case each process gets non-consecutive substeps, but the majority of outputs would still be removed from individual _output if we do this for each worker.
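A sketch of that idea (my own illustration, not the SoS code): each worker remembers the dynamic outputs it has already reported and only forwards files it has not seen in its own sequence of substeps.

import glob

class SubstepWorker:
    # hypothetical worker-local bookkeeping for dynamic output
    def __init__(self, pattern):
        self.pattern = pattern
        self.seen = set()

    def collect_output(self):
        # dynamic output of the current substep, minus files already reported
        current = set(glob.glob(self.pattern))
        new_files = sorted(current - self.seen)
        self.seen |= current
        return new_files

worker = SubstepWorker('output/*.rds')
# called after each substep finishes: only newly appeared files
# contribute to this worker's _output
print(worker.collect_output())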
Well, I have 50K files to start from and I am on a network file system; I think figuring out * over 50K files took a while here. Not sure how many input files you have on your end, but if, as you said, each substep still checks all available dynamic output files, I am not surprised it is slow. I am now trying it again with -s ignore to see if that works better.
> I can almost optimize this case by removing outputs that have been seen before.
Well, it may help reduce the signature-checking burden, but not the glob.glob call that looks for fit files for every substep. I think that actually hurts. I wonder whether such checks make sense in the concurrent context, though.
I agree, running glob.glob over a directory of 50k files repeatedly will not be a pleasant experience. So the best way is still to derive the exact _output from _input, even if you can only use information from _input to limit the search.
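For example, a sketch of what deriving _output from _input could look like for this workflow (the folder name SuSiE_results and the R body are made up for illustration; the format spec {_input:bn} takes the basename of the input file without its extension):

[SuSiE (SuSiE analysis)]
input: glob.glob(f'{cwd}/{y_data:bnn}_{int(max_dist/1000)}Kb/chr*/*.rds'), group_by = 1, concurrent = True
# one fixed, derivable output file per input file
output: f'{cwd}/{y_data:bnn}_{int(max_dist/1000)}Kb/SuSiE_results/{_input:bn}.rds'
R: expand = "${ }"
    # ... run SuSiE on ${_input} and saveRDS() the fit to ${_output} ...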
Or you could ignore the output specification altogether? If the next step needs it, you can glob.glob there as input.
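And a sketch of this alternative (dropping the output specification and globbing in the next step; the step name collect and the comments are made up for illustration):

[SuSiE (SuSiE analysis)]
input: glob.glob(f'{cwd}/{y_data:bnn}_{int(max_dist/1000)}Kb/chr*/*.rds'), group_by = 1, concurrent = True
# no output statement: substeps still get signatures, just with an empty _output
R: expand = "${ }"
    # ... the analysis decides its own destination folder under SuSiE_CS_* ...

[collect]
# the next step picks up whatever the analysis produced
input: glob.glob(f'{cwd}/{y_data:bnn}_{int(max_dist/1000)}Kb/SuSiE_CS_*/*.rds')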
> So the best way is still to derive the exact _output from _input.
I agree; I'll think about my procedure. I did not derive it because I want to save output to different folders based on the strength of the association signal (only known after the analysis) -- that way I do not have to post-process the results. So it is basically an unrestricted check on up to 50K files, repeatedly.
I'm currently adding back -s ignore for this special case, where none of the other recommendations you suggest will work. It seems to have improved things:
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
22039 gaow 20 0 589148 219444 13872 R 47.2 0.3 0:21.54 sos
22035 gaow 20 0 585052 215508 13872 S 46.8 0.3 0:21.20 sos
22034 gaow 20 0 584796 215252 13872 S 46.5 0.3 0:21.14 sos
22028 gaow 20 0 583772 214228 13872 S 44.2 0.3 0:21.75 sos
22029 gaow 20 0 588892 219088 13872 R 43.2 0.3 0:20.62 sos
22036 gaow 20 0 583772 214228 13872 S 42.9 0.3 0:21.85 sos
22030 gaow 20 0 584540 214996 13872 S 41.5 0.3 0:21.07 sos
22038 gaow 20 0 587100 217248 13872 R 40.2 0.3 0:21.09 sos
22032 gaow 20 0 583772 214228 13872 S 39.5 0.3 0:20.06 sos
22033 gaow 20 0 584028 214484 13872 S 39.5 0.3 0:21.58 sos
22040 gaow 20 0 584284 214484 13872 R 39.5 0.3 0:21.18 sos
22037 gaow 20 0 583772 214228 13872 S 39.2 0.3 0:21.62 sos
22026 gaow 20 0 585308 215764 13872 R 38.5 0.3 0:21.07 sos
22027 gaow 20 0 585308 215764 13872 R 37.9 0.3 0:21.28 sos
22025 gaow 20 0 584028 214500 13884 R 35.5 0.3 0:20.88 sos
22031 gaow 20 0 584028 214484 13872 S 34.9 0.3 0:20.72 sos
5851 root 0 -20 18.630g 4.483g 201424 S 27.2 7.2 3185:21 mmfsd
29813 gaow 20 0 246232 98524 6120 D 11.0 0.2 0:00.33 R
29839 gaow 20 0 0 0 0 R 11.0 0.0 0:00.33 R
22024 gaow 20 0 1006920 491116 14004 S 10.6 0.8 0:40.94 sos
29860 gaow 20 0 236548 89056 6136 R 10.0 0.1 0:00.30 R
29866 gaow 20 0 172804 57676 5776 R 8.6 0.1 0:00.26 R
29874 gaow 20 0 160460 49636 5712 R 7.3 0.1 0:00.22 R
29886 gaow 20 0 137016 39172 5388 R 3.7 0.1 0:00.11 R
29899 gaow 20 0 109912 13944 5280 R 1.3 0.0 0:00.04 R
29900 gaow 20 0 108808 12364 4756 R 1.3 0.0 0:00.04 R
At least now you see some R processes going on, though still alongside lots of sos. But the sos processes use a lot more CPU, and R still uses very limited CPU compared to the overhead.
> Or you could ignore the output specification altogether? If the next step needs it, you can glob.glob there as input.
I think this is what -s ignore essentially does?
No. sos can still keep signatures and skip steps, just with an empty _output.
Okay, I see! Thanks for the clarification. Let me rerun my analysis that way and show the top output.
BTW, should we be worried that verifying the 50K input files in
input: f'{cwd}/{y_data:bnn}_{int(max_dist/1000)}Kb/chr*/*.rds', group_by = 1, concurrent = True
takes a very long time (maybe because of the network file system)?
Yeah, glob itself takes a bit of time, and sos also tries to read the content and hash it, which takes much longer, so -s ignore might still be the way to go.
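A quick standalone check (a diagnostic sketch, not part of sos; the pattern is a placeholder) to see whether the glob or the content hashing dominates on the network filesystem:

import glob, hashlib, time

pattern = 'chr*/*.rds'   # substitute the actual input pattern from the workflow

t0 = time.time()
files = glob.glob(pattern)
t1 = time.time()
sample = files[:100]     # hash only a sample and extrapolate
for f in sample:
    with open(f, 'rb') as fh:
        hashlib.md5(fh.read()).hexdigest()
t2 = time.time()

print(f'glob of {len(files)} files: {t1 - t0:.1f}s')
if sample:
    print(f'estimated md5 time for all files: {(t2 - t1) / len(sample) * len(files):.1f}s')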
> Yeah, glob itself takes a bit of time, and sos also tries to read the content and hash it, which takes much longer.
In a separate test, glob itself in this case takes half a second; I get:
>>> len(files)
44068
Then that one sos process now takes 15 min after I remove output. But it has very low CPU usage:
7488 gaow 20 0 406756 112132 15056 D 1.0 0.2 0:21.14 sos
Using only 1.3% CPU. It looks like a sqlite bottleneck.
Then after 15 min, I still get many R and sos processes going on at the same time, but R certainly dominates now:
36609 gaow 20 0 261248 113792 6132 D 18.9 0.2 0:00.57 R
36635 gaow 20 0 0 0 0 R 17.5 0.0 0:00.53 R
36663 gaow 20 0 0 0 0 R 15.2 0.0 0:00.46 R
36705 gaow 20 0 260648 113196 6132 D 14.2 0.2 0:00.43 R
7901 gaow 20 0 998888 485920 14004 S 12.3 0.7 0:48.67 sos
36724 gaow 20 0 231160 83704 6136 D 11.6 0.1 0:00.35 R
36736 gaow 20 0 237852 90000 6044 R 11.3 0.1 0:00.34 R
36738 gaow 20 0 219628 71812 6044 R 9.9 0.1 0:00.30 R
36756 gaow 20 0 160608 49772 5712 R 8.3 0.1 0:00.25 R
36764 gaow 20 0 157932 47148 5712 R 7.6 0.1 0:00.23 R
36769 gaow 20 0 152764 41884 5688 R 6.6 0.1 0:00.20 R
36781 gaow 20 0 137620 37820 5416 R 5.3 0.1 0:00.16 R
36794 gaow 20 0 124092 26276 5364 R 3.0 0.0 0:00.09 R
7904 gaow 20 0 562008 188392 13876 S 2.3 0.3 0:02.29 sos
7910 gaow 20 0 562008 188384 13864 S 2.3 0.3 0:02.26 sos
7912 gaow 20 0 562008 188384 13864 S 2.3 0.3 0:02.23 sos
7913 gaow 20 0 562008 188388 13864 S 2.3 0.3 0:02.33 sos
7916 gaow 20 0 562008 188396 13864 S 2.3 0.3 0:02.26 sos
7917 gaow 20 0 562008 188396 13864 S 2.3 0.3 0:02.29 sos
7918 gaow 20 0 562008 188400 13864 S 2.3 0.3 0:02.28 sos
7919 gaow 20 0 562008 188400 13864 S 2.3 0.3 0:02.29 sos
36799 gaow 20 0 122856 24800 5356 R 2.3 0.0 0:00.07 R
36804 gaow 20 0 120516 22424 5356 R 2.3 0.0 0:00.07 R
7911 gaow 20 0 562008 188384 13864 S 2.0 0.3 0:02.28 sos
7914 gaow 20 0 562008 188388 13864 S 2.0 0.3 0:02.30 sos
7921 gaow 20 0 562008 188404 13864 S 2.0 0.3 0:02.22 sos
7906 gaow 20 0 562008 188380 13864 S 1.7 0.3 0:02.24 sos
7907 gaow 20 0 562008 188380 13864 S 1.7 0.3 0:02.27 sos
7920 gaow 20 0 562008 188396 13864 S 1.7 0.3 0:02.30 sos
7905 gaow 20 0 562008 188376 13864 S 1.3 0.3 0:02.38 sos
7915 gaow 20 0 562008 188392 13864 S 1.3 0.3 0:02.36 sos
So, 15 min for 50K files on a slow disk system. ...
At least now sos is supposed to build and check signatures in parallel (check sos CPU usage to confirm), so it should be faster than before.
> At least now sos is supposed to build and check signatures in parallel (check sos CPU usage to confirm), so it should be faster than before.
Now meaning ...? Because I see the CPU usage is 1% for 15 minutes. Am I missing something?
Not sure exactly what sos was doing, but 1% is possible if I/O was slow and the sos processes were all waiting on I/O.
> but 1% is possible if I/O was slow and the sos processes were all waiting on I/O.
I/O meaning the I/O for hashing, not saving the signatures to sqlite? And either way, not something zmq can help with? Sorry, this is all I want to clarify.
If you are not sure, try to add something like
env.logger.error(f'calculating md5 for {filename} with process {os.getpid()}')
Sure I can try that.
Okay, the computation actually was very fast now that we do not keep track of the output. I get:
[MW] ls SuSiE_CS_* | wc -l
72678
But at the end of the job, I get an error message:
ERROR: [SuSiE]: AttributeError: 'NoneType' object has no attribute 'content'
I'll try to make an MWE if you are not able to troubleshoot it blindly; it should be easy to come up with one.
The problem is easy to identify...
Okay, let me close this ticket for now -- I guess the conclusion is wontfix for the behavior I expect (which is fair enough). We can use separate tickets to figure out what SoS is doing during job preparation (#1024).
The reason I was trying local tasks in #1024 is that the non-task case was broken (behavior different from before). Here is an MWE:
and if I run it, you'll see only one thread, as if concurrent was ignored. Of course, if my output is something like f'{_chrom}.feather' it runs with multiple processes. It also works without output. But this is a script from before, so there is a change of behavior here, which I am not sure is a good change, because my next step's input depends on this output (unless I use dynamic input).