vatlab / sos

SoS workflow system for daily data analysis
http://vatlab.github.io/sos-docs
BSD 3-Clause "New" or "Revised" License

Concurrent tasks with dynamic output #1028

Closed · gaow closed this issue 6 years ago

gaow commented 6 years ago

The reason I tried local tasks in #1024 is that the non-task case was broken (its behavior differs from before). Here is an MWE:

[1]
chroms = [f'chr{x+1}' for x in range(22)]
input: for_each = 'chroms', concurrent = True
output: dynamic('*.feather')
bash:
  stress --cpu 1
  touch test.feather

and if I run

sos run test.sos -j8

you'll see only one thread, as if concurrent was ignored. Of course, if my output is something like f'{_chrom}.feather' per substep, it runs with multiple processes (see the sketch below). It also works without an output statement. But this is a script from before, so there is a change in behavior here, which I am not sure is a good change, because my next step's input depends on this output (unless I use dynamic input).
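For reference, a minimal sketch (not from the original report) of that per-substep-output variant, using the dictionary form of for_each; since each substep has a concrete _output, no dynamic() is needed and concurrent=True is honored:

[1]
chroms = [f'chr{x+1}' for x in range(22)]
# the dictionary form of for_each defines a plain loop variable "chrom"
input: for_each = {'chrom': chroms}, concurrent = True
# one concrete output per substep instead of a dynamic glob
output: f'{chrom}.feather'
bash: expand = True
  stress --cpu 1 --timeout 10
  touch {_output}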

BoPeng commented 6 years ago

https://github.com/vatlab/SoS/blob/master/src/sos/step_executor.py#L1520

Concurrent execution is disabled for dynamic output because currently the concurrent executor does not return _output to the step, so there is no way for the step to create step_output from the substeps. There might be a deeper reason for this limitation, but right now I think this is something that can be fixed.

BoPeng commented 6 years ago

A clearer example:

[1: shared={'step1': 'step_output'}]
input: for_each={'i': range(10)}, concurrent=True
output: dynamic(f'rep_{i}/*.res')

import random
path(f'rep_{i}/{random.randint(0, 10000)}.res').touch()

[2]
print(step1)
For this to work:
  1. The dynamic output is replicate-dependent, so it has to be collected for each substep.
  2. step_output needs to be aggregated from these dynamic outputs (and shared).
  3. When rerun, the signature of each substep will need to check the existence of the results from last time, and bypass the substep if the output already exists.

Neither substep signatures nor concurrency work now. concurrent=True is ignored intentionally because sos currently does not return _output from the concurrent processes, so it has to be obtained before the substeps are farmed out to the queue.

BoPeng commented 6 years ago

Not done yet; some tests still fail, and there are uncertainties about the handling of shared variables, signatures with dynamic output, parallelization, etc.

BoPeng commented 6 years ago

Supposed to be fixed. Closing for now.

gaow commented 6 years ago

Not sure about this, but rerunning one of my old workflows I see something scary like this:

[screenshot from 2018-09-02: top output dominated by sos processes, with almost no R processes visible]

It is an R analysis involving dynamic output, but I see sos all over the place and can hardly see any R instance showing up. It looks like most of the resources spent here are just overhead. I do not recall seeing anything like this before. These are local non-task jobs.

BoPeng commented 6 years ago

With concurrent=True? -v4 should tell you the number of slots allocated.

gaow commented 6 years ago

This is with concurrent=True, but I used -j 24 so it is expected to use these slots. My point is that all my tasks should be heavy-duty R commands (heavy-duty is relative ... maybe 1 min per R analysis), so I was expecting top to show mostly R processes instead of sos; what I see instead suggests even heavier overhead going on.

gaow commented 6 years ago

If it is related to sqlite overhead, I can wait for the zmq branch to work so we do not spend energy investigating this problem.

gaow commented 6 years ago

Okay, I've updated to the latest master and reran a job from yesterday. The workflow looks like this:

[SuSiE (SuSiE analysis)]
depends: R_library('susieR')
# SuSiE parameter: L
parameter: maxL = 5
# SuSiE parameter: prior variance; set to 0 to use per-dataset PVE estimates
parameter: prior_var = 0.0
input: glob.glob(f'{cwd}/{y_data:bnn}_{int(max_dist/1000)}Kb/chr*/*.rds'), group_by = 1, concurrent = True
output: dynamic(glob.glob(f'{cwd}/{y_data:bnn}_{int(max_dist/1000)}Kb/SuSiE_CS_*/*.rds'))
R: expand = "${ }"

The input is about 50K files, and the output is the same number of files, placed into different folders depending on the analysis results. What happened was:

  1. A single sos process, for at least 1.5 hrs (the last time I checked top), apparently preparing the jobs.
  2. Then the same as the screenshot above: all SoS processes, no R command showing up. These SoS processes are not using a lot of CPU (3%).

Since this is a rerun of yesterday's job, I do see a number of ... is ignored due to saved signature messages, but it only ignores 75 jobs. The actual output files I have number 16.8K (still far from 50K after 12 hrs of running yesterday, although according to the notes in my notebook it used to complete 50K jobs in 3 hrs).

I guess it is related to sqlite, but I am not sure how zmq can help with it -- maybe loading the database into memory would also help, though that is likely not scalable?

BoPeng commented 6 years ago

I think it is still the case that with dynamic output you would have a larger and larger _output, which leads to larger and larger signatures, right?

gaow commented 6 years ago

Oh, because we discussed it in #1024 I thought checking the glob at each substep was already disallowed for dynamic output, and that the default is to mimic -s ignore behavior for the substeps and then check and write signatures at the end? If not, I'd not be surprised by the slowness. ...

BoPeng commented 6 years ago

Using something like

[SuSiE (SuSiE analysis)]
input: dynamic(f'input/*.rds'), group_by = 1, concurrent = True
output: dynamic(f'output/*.rds')

import random
output_file = f'output/{random.randint(0, 1000000)}.rds'

R: expand = True
   Sys.sleep(20)
   write({_index}, '{output_file}')

I see

1985961928 97544  6976   0  3:45PM ttys000    0:01.78 /Users/bpeng1/anaconda3/envs/sos/bin/python /Users/bpeng1/anaconda3/envs/sos/bin/sos run test -j8
1985961928 97547 97544   0  3:45PM ttys000    0:00.22 /Users/bpeng1/anaconda3/envs/sos/bin/python /Users/bpeng1/anaconda3/envs/sos/bin/sos run test -j8
1985961928 97548 97547   0  3:45PM ttys000    0:00.03 /Users/bpeng1/anaconda3/envs/sos/bin/python /Users/bpeng1/anaconda3/envs/sos/bin/sos run test -j8
1985961928 97549 97547   0  3:45PM ttys000    0:00.03 /Users/bpeng1/anaconda3/envs/sos/bin/python /Users/bpeng1/anaconda3/envs/sos/bin/sos run test -j8
1985961928 97550 97547   0  3:45PM ttys000    0:00.03 /Users/bpeng1/anaconda3/envs/sos/bin/python /Users/bpeng1/anaconda3/envs/sos/bin/sos run test -j8
1985961928 97551 97547   0  3:45PM ttys000    0:00.03 /Users/bpeng1/anaconda3/envs/sos/bin/python /Users/bpeng1/anaconda3/envs/sos/bin/sos run test -j8
1985961928 97552 97547   0  3:45PM ttys000    0:00.03 /Users/bpeng1/anaconda3/envs/sos/bin/python /Users/bpeng1/anaconda3/envs/sos/bin/sos run test -j8
1985961928 97553 97547   0  3:45PM ttys000    0:00.03 /Users/bpeng1/anaconda3/envs/sos/bin/python /Users/bpeng1/anaconda3/envs/sos/bin/sos run test -j8
1985961928 97554 97547   0  3:45PM ttys000    0:00.03 /Users/bpeng1/anaconda3/envs/sos/bin/python /Users/bpeng1/anaconda3/envs/sos/bin/sos run test -j8
1985961928 97555 97547   0  3:45PM ttys000    0:00.03 /Users/bpeng1/anaconda3/envs/sos/bin/python /Users/bpeng1/anaconda3/envs/sos/bin/sos run test -j8
1985961928 97556 97553   0  3:45PM ttys000    0:00.36 /Users/bpeng1/anaconda3/envs/sos/lib/R/bin/exec/R --slave --no-restore --file=/var/folders/ys/gnzk0qbx5wbdgm531v82xxljv5yqy8/T/tmpfov60v0f.R
1985961928 97557 97548   0  3:45PM ttys000    0:00.37 /Users/bpeng1/anaconda3/envs/sos/lib/R/bin/exec/R --slave --no-restore --file=/var/folders/ys/gnzk0qbx5wbdgm531v82xxljv5yqy8/T/tmp649w2a56.R
1985961928 97558 97555   0  3:45PM ttys000    0:00.36 /Users/bpeng1/anaconda3/envs/sos/lib/R/bin/exec/R --slave --no-restore --file=/var/folders/ys/gnzk0qbx5wbdgm531v82xxljv5yqy8/T/tmp59jhrkep.R
1985961928 97559 97550   0  3:45PM ttys000    0:00.36 /Users/bpeng1/anaconda3/envs/sos/lib/R/bin/exec/R --slave --no-restore --file=/var/folders/ys/gnzk0qbx5wbdgm531v82xxljv5yqy8/T/tmpmcpu0bxx.R
1985961928 97560 97549   0  3:45PM ttys000    0:00.36 /Users/bpeng1/anaconda3/envs/sos/lib/R/bin/exec/R --slave --no-restore --file=/var/folders/ys/gnzk0qbx5wbdgm531v82xxljv5yqy8/T/tmphibs2swo.R
1985961928 97561 97551   0  3:45PM ttys000    0:00.37 /Users/bpeng1/anaconda3/envs/sos/lib/R/bin/exec/R --slave --no-restore --file=/var/folders/ys/gnzk0qbx5wbdgm531v82xxljv5yqy8/T/tmp1h3j32qb.R
1985961928 97562 97554   0  3:45PM ttys000    0:00.36 /Users/bpeng1/anaconda3/envs/sos/lib/R/bin/exec/R --slave --no-restore --file=/var/folders/ys/gnzk0qbx5wbdgm531v82xxljv5yqy8/T/tmpv3kaq9ak.R
1985961928 97563 97552   0  3:45PM ttys000    0:00.36 /Users/bpeng1/anaconda3/envs/sos/lib/R/bin/exec/R --slave --no-restore --file=/var/folders/ys/gnzk0qbx5wbdgm531v82xxljv5yqy8/T/tmp0q2f59fn.R

which is more or less correct with -j8

BoPeng commented 6 years ago

sos still allows it and handles it more "correctly" now, because the glob can be substep-specific, as in the example here.

Because the new step_output is the collection of _output without duplicates, I can almost optimize this case by removing outputs that have been seen before. In the concurrent=True case each process will get non-consecutive substeps, but the majority of outputs would still be removed from the individual _output if we do this for each worker.
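A rough Python sketch of that aggregation (names like substep_outputs are hypothetical, not the actual sos internals):

# aggregate per-substep _output lists into step_output,
# dropping any file already reported by an earlier substep
seen = set()
step_output = []
for _output in substep_outputs:
    new_files = [f for f in _output if f not in seen]
    seen.update(new_files)
    step_output.extend(new_files)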

gaow commented 6 years ago

Well, I have 50K files to start from and I'm on a networked file system. I think expanding * over 50K files took a while here. I am not sure how many input files you have on your end, but if, as you said, each substep is still checking all available dynamic output files, I'm not surprised it is slow. Now trying it again with -s ignore to see if that works better.

gaow commented 6 years ago

I can almost optimize this case by removing outputs that have been seen before.

Well, it may help reduce the signature-checking burden, but not the glob.glob procedure that looks for fit files for every substep; I think that is what actually hurts. I wonder whether such checks make sense in the concurrent context, though.

BoPeng commented 6 years ago

I agree, running glob.glob repeatedly on a directory of 50k files will not be a pleasant experience.

BoPeng commented 6 years ago

So the best way is still to derive the exact _output from _input, even if you can only use information from _input to limit the search.
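For instance, a minimal sketch (paths and names are made up, not from the actual workflow) in which each substep's output is derived directly from _input:

[1]
# assumes 'import glob' in a [global] section, as in the workflow above
input: glob.glob('data/chr*/*.rds'), group_by = 1, concurrent = True
# the output name is computed from the input name, so no per-substep glob is needed
output: f'results/{_input:bn}.res.rds'
R: expand = "${ }"
  # ... fit the model and save the result to ${_output} ...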

BoPeng commented 6 years ago

Or you could ignore the output specification altogether? If the next step needs it, you can glob.glob there as input.
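A sketch of that alternative (again with made-up paths): step 1 declares no output at all, and step 2 collects whatever files were produced as dynamic input:

[1]
# assumes 'import glob' in a [global] section
input: glob.glob('data/chr*/*.rds'), group_by = 1, concurrent = True
# no output: statement, so nothing is globbed or tracked per substep
R: expand = "${ }"
  # ... each run writes its result into a folder chosen by the analysis ...

[2]
# collect the produced files once, when this step starts
input: dynamic('results/*.rds')
print(f'step 2 sees {len(_input)} result files')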

gaow commented 6 years ago

So the best way is still to derive exact _output from _input

I agree. I'll think about my procedure. I did not derive the output because I want to save outputs to different folders based on the strength of their association signal (known only after the analysis) -- that way I do not have to post-process the results. So it is basically an unrestricted check over up to 50K files, repeatedly.

I'm currently adding back -s ignore for this special case where none of the other recommendations you suggest will work. It seems to have improved things:

PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND                                                                 
22039 gaow      20   0  589148 219444  13872 R  47.2  0.3   0:21.54 sos                                                                     
22035 gaow      20   0  585052 215508  13872 S  46.8  0.3   0:21.20 sos                                                                     
22034 gaow      20   0  584796 215252  13872 S  46.5  0.3   0:21.14 sos                                                                     
22028 gaow      20   0  583772 214228  13872 S  44.2  0.3   0:21.75 sos                                                                     
22029 gaow      20   0  588892 219088  13872 R  43.2  0.3   0:20.62 sos                                                                     
22036 gaow      20   0  583772 214228  13872 S  42.9  0.3   0:21.85 sos                                                                     
22030 gaow      20   0  584540 214996  13872 S  41.5  0.3   0:21.07 sos                                                                     
22038 gaow      20   0  587100 217248  13872 R  40.2  0.3   0:21.09 sos                                                                     
22032 gaow      20   0  583772 214228  13872 S  39.5  0.3   0:20.06 sos                                                                     
22033 gaow      20   0  584028 214484  13872 S  39.5  0.3   0:21.58 sos                                                                     
22040 gaow      20   0  584284 214484  13872 R  39.5  0.3   0:21.18 sos                                                                     
22037 gaow      20   0  583772 214228  13872 S  39.2  0.3   0:21.62 sos                                                                     
22026 gaow      20   0  585308 215764  13872 R  38.5  0.3   0:21.07 sos                                                                     
22027 gaow      20   0  585308 215764  13872 R  37.9  0.3   0:21.28 sos                                                                     
22025 gaow      20   0  584028 214500  13884 R  35.5  0.3   0:20.88 sos                                                                     
22031 gaow      20   0  584028 214484  13872 S  34.9  0.3   0:20.72 sos                                                                     
 5851 root       0 -20 18.630g 4.483g 201424 S  27.2  7.2   3185:21 mmfsd                                                                   
29813 gaow      20   0  246232  98524   6120 D  11.0  0.2   0:00.33 R                                                                       
29839 gaow      20   0       0      0      0 R  11.0  0.0   0:00.33 R                                                                       
22024 gaow      20   0 1006920 491116  14004 S  10.6  0.8   0:40.94 sos                                                                     
29860 gaow      20   0  236548  89056   6136 R  10.0  0.1   0:00.30 R                                                                       
29866 gaow      20   0  172804  57676   5776 R   8.6  0.1   0:00.26 R                                                                       
29874 gaow      20   0  160460  49636   5712 R   7.3  0.1   0:00.22 R                                                                       
29886 gaow      20   0  137016  39172   5388 R   3.7  0.1   0:00.11 R                                                                       
29899 gaow      20   0  109912  13944   5280 R   1.3  0.0   0:00.04 R                                                                       
29900 gaow      20   0  108808  12364   4756 R   1.3  0.0   0:00.04 R  

At least now you see some R processes going on, though still alongside lots of sos processes. The sos processes now use a lot more CPU, and R still uses very limited CPU resources compared to the overhead.

Or you could ignore the output specification altogether? If the next step needs it, you can glob.glob there as input.

I think this is essentially what -s ignore does?

BoPeng commented 6 years ago

No. sos can still keep signatures and skip steps, just with an empty _output.

gaow commented 6 years ago

Okay I see! Thanks for the clarification. Let me rerun my analysis that way and show the top output.

BTW, should we be worried that verifying the 50K input files in

input: f'{cwd}/{y_data:bnn}_{int(max_dist/1000)}Kb/chr*/*.rds', group_by = 1, concurrent = True

takes a very long time (maybe because of the network file system)?

BoPeng commented 6 years ago

Yeah, glob itself takes a bit of time, and then sos tries to read the content and hash it, which takes much longer, so -s ignore might still be the way to go.

gaow commented 6 years ago

Yeah, glob itself takes a bit of time, and then sos tries to read the content and hash it, which takes much longer

In a separate test, glob itself in this case takes about half a second; I get:

>>> len(files)
44068

That one sos process now takes 15 min after I removed the output statement, but with very low CPU usage:

 7488 gaow      20   0  406756 112132  15056 D   1.0  0.2   0:21.14 sos                                                                     

Using only 1.3% CPU. It looks like a sqlite bottleneck.

Then, after 15 min, I still get many R and sos processes going at the same time, but R certainly dominates now:

36609 gaow      20   0  261248 113792   6132 D  18.9  0.2   0:00.57 R                                                                       
36635 gaow      20   0       0      0      0 R  17.5  0.0   0:00.53 R                                                                       
36663 gaow      20   0       0      0      0 R  15.2  0.0   0:00.46 R                                                                       
36705 gaow      20   0  260648 113196   6132 D  14.2  0.2   0:00.43 R                                                                       
 7901 gaow      20   0  998888 485920  14004 S  12.3  0.7   0:48.67 sos                                                                     
36724 gaow      20   0  231160  83704   6136 D  11.6  0.1   0:00.35 R                                                                       
36736 gaow      20   0  237852  90000   6044 R  11.3  0.1   0:00.34 R                                                                       
36738 gaow      20   0  219628  71812   6044 R   9.9  0.1   0:00.30 R                                                                       
36756 gaow      20   0  160608  49772   5712 R   8.3  0.1   0:00.25 R                                                                       
36764 gaow      20   0  157932  47148   5712 R   7.6  0.1   0:00.23 R                                                                       
36769 gaow      20   0  152764  41884   5688 R   6.6  0.1   0:00.20 R                                                                       
36781 gaow      20   0  137620  37820   5416 R   5.3  0.1   0:00.16 R                                                                       
36794 gaow      20   0  124092  26276   5364 R   3.0  0.0   0:00.09 R                                                                       
 7904 gaow      20   0  562008 188392  13876 S   2.3  0.3   0:02.29 sos                                                                     
 7910 gaow      20   0  562008 188384  13864 S   2.3  0.3   0:02.26 sos                                                                     
 7912 gaow      20   0  562008 188384  13864 S   2.3  0.3   0:02.23 sos                                                                     
 7913 gaow      20   0  562008 188388  13864 S   2.3  0.3   0:02.33 sos                                                                     
 7916 gaow      20   0  562008 188396  13864 S   2.3  0.3   0:02.26 sos                                                                     
 7917 gaow      20   0  562008 188396  13864 S   2.3  0.3   0:02.29 sos                                                                     
 7918 gaow      20   0  562008 188400  13864 S   2.3  0.3   0:02.28 sos                                                                     
 7919 gaow      20   0  562008 188400  13864 S   2.3  0.3   0:02.29 sos                                                                     
36799 gaow      20   0  122856  24800   5356 R   2.3  0.0   0:00.07 R                                                                       
36804 gaow      20   0  120516  22424   5356 R   2.3  0.0   0:00.07 R                                                                       
 7911 gaow      20   0  562008 188384  13864 S   2.0  0.3   0:02.28 sos                                                                     
 7914 gaow      20   0  562008 188388  13864 S   2.0  0.3   0:02.30 sos                                                                     
 7921 gaow      20   0  562008 188404  13864 S   2.0  0.3   0:02.22 sos                                                                     
 7906 gaow      20   0  562008 188380  13864 S   1.7  0.3   0:02.24 sos                                                                     
 7907 gaow      20   0  562008 188380  13864 S   1.7  0.3   0:02.27 sos                                                                     
 7920 gaow      20   0  562008 188396  13864 S   1.7  0.3   0:02.30 sos                                                                     
 7905 gaow      20   0  562008 188376  13864 S   1.3  0.3   0:02.38 sos                                                                     
 7915 gaow      20   0  562008 188392  13864 S   1.3  0.3   0:02.36 sos  

So, 15 min for 50K files on a slow disk system. ...

BoPeng commented 6 years ago

At least now sos is supposed to build and check signatures in parallel (check sos cpu usage to confirm) and should be faster than before.

gaow commented 6 years ago

At least now sos is supposed to build and check signatures in parallel (check sos cpu usage to confirm) and should be faster than before.

Now meaning ...? Because I see the CPU usage is 1% for 15 minutes. Am I missing something?

BoPeng commented 6 years ago

I am not sure exactly what sos was doing, but 1% is possible if I/O was slow and the sos processes were all waiting for I/O.

gaow commented 6 years ago

but 1% is possible if I/O was slow and the sos processes were all waiting for I/O.

I/O meaning the I/O for hashing, not saving the signatures to sqlite? And either way, not something zmq can help with? Sorry, this is all I want to clarify.

BoPeng commented 6 years ago

If you are not sure, try to add something like

env.logger.error(f'calculating md5 for {filename} with process {os.getpid()}')

here
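If it helps, here is a standalone sketch (independent of the sos code base, with illustrative paths) to check whether reading and hashing the files is the slow part on this file system:

# rough timing of md5 hashing over the input files
import glob, hashlib, time

files = glob.glob('data/chr*/*.rds')
start = time.time()
for i, filename in enumerate(files):
    with open(filename, 'rb') as f:
        hashlib.md5(f.read()).hexdigest()
    if i % 1000 == 0:
        print(f'{i} files hashed in {time.time() - start:.1f}s')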

gaow commented 6 years ago

Sure I can try that.

Okay, the computation actually is very fast now that we do not keep track of the output. I get:

[MW] ls SuSiE_CS_* | wc -l
72678

but at the end of the job, I get an error message:

ERROR: [SuSiE]: AttributeError: 'NoneType' object has no attribute 'content'

I'll try to make an MWE if you are not able to troubleshoot it blindly. It should be easy to come up with one.

BoPeng commented 6 years ago

The problem is easy to identify...

gaow commented 6 years ago

Okay, let me close this ticket for now -- I guess the conclusion is wontfix for the behavior I expected (which is fair enough). We can use separate tickets to figure out what SoS is doing during job preparation (#1024).