Open bsubei opened 9 years ago
See src/exm/stc/ic/opt/ICOptimizer.java for how the optimizer pipeline is constructed. In particular, you'll see that it adds an instance of PruneFunctions() in a couple of places. What happens is that it builds the pipeline with all the passes, then they are skipped based on config settings.
Then look at src/exm/stc/ic/opt/PruneFunctions.java - this is the one that's causing you problems. (Functions are also pruned in the FunctionInline, but that is definitely disabled at O0). The issue is that getConfigEnabledKey() is returning null - there is no config key that can turn off PruneFunctions.
You'll need to add a config key in Settings.java like OPT_PRUNE_FUNCTIONS, add a default value of "true", then add it to get_compiler_opt_name in bin/stc. You can look at a different optimiser pass like FunctionInline.java to see how the options are set up and named.
Once you've done that you should be able to give the -F prune-functions option to turn off function pruning.
Let me know if you hit any blockers here. We use a whole-program compilation model where it just slurps in all the include files and compiles everything at once so I don't think we're going to go to fully separate compilation and linking. But it seems like it should be possible to kludge something so that we can at least smush multiple output files together.
I think the proper solution to the pruning problem is to just have multiple root functions that are always retained. Currently it knows not to touch __entry (or whatever it is), shouldn't be too hard to prevent it from touching any functions in the main file under compilation.
Also, feel free to add in any new command-line options using the -f/-F flags in bin/stc and the config mechanism in Settings.java. This is good if you need to have weird special-case or experimental behaviour conditionally enabled. You could also add the -c flag by modifying bin/stc and using the config mechanism in Settings.java.
I managed to disable pruning pretty easily by doing what you said. I had to run -O0
though, and using the -c
flag alone did not disable pruning. Can you quickly double-check it, particularly this line here?
:D
EDIT:
I just realized, even when I try it with -F function-prune
, it still prunes functions (although the output .tic file comments says pruning is disabled). So it seems that the correct flag is set, but something in the optimizer pipeline is the issue.
Nice work.
Let's remove function-prune from O2_OPTS to O0_opts - we'll just retain the current behaviour of it being on for all optimisation levels.
The issue may be the ordering of flags - if you call disable_opt before set_opt_level() in the bash script then the settings from set_opt_level() will take precedence. You could defer the call to disable_opt until after the final set_opt_level() call.
On 15 July 2015 at 16:21, Basheer Subei notifications@github.com wrote:
I managed to disable pruning https://github.com/basheersubei/swift-t/commit/c29677372e10606165f44f3d3689c079383d1af6 pretty easily by doing what you said. I had to run -O0 though, and using the -c flag alone did not disable pruning. Can you quickly double-check it, particularly this line here https://github.com/basheersubei/swift-t/commit/c29677372e10606165f44f3d3689c079383d1af6#diff-bb2f6beae735a77d1e68424a10b821f1R366 ?
:D
— Reply to this email directly or view it on GitHub https://github.com/swift-lang/swift-t/issues/66#issuecomment-121774540.
ok, I placed function-prune in O0_opts
now. When I run stc -O0 -c f.swift
, it works as expected. However, when I use default -O2
, then the comments say that pruning is disabled, although the end result (.tic file) does not.
It's probably the other optimizations that are somehow enabling or performing pruning, like you said in FunctionInline.java.
EDIT:
confirmed it's specifically function-inline
(I tried stc -c -F function-inline f.swift
and now pruning is actually disabled)
it's right here. Boy, I'm glad you commented your code! :D
Shouldn't we remove function pruning from inside function inlining, since pruning is done separately now? Or is this a different kind of pruning (has a different effect)?
EDIT: it worked when I commented out line 187. Will this have side effects to other things?
Feel free to remove it, it would be nice to reduce the complexity of the code.
I think I added it there because it was easy to do and the PruneFunctions pass didn't exist yet.
On 15 July 2015 at 17:19, Basheer Subei notifications@github.com wrote:
Shouldn't we remove function pruning from inside function inlining, since pruning is done separately now? Or is this a different kind of pruning (has a different effect)?
— Reply to this email directly or view it on GitHub https://github.com/swift-lang/swift-t/issues/66#issuecomment-121787152.
I added a function turbine::get_globals_map in the commit I just pushed to github. This should give you a Tcl dict (https://www.tcl.tk/man/tcl/TclCmd/dict.htm) containing all of the global variables declared.
I just saw your edit. That's fine if you remove that functionality - if you're going to remove it can you complete remove usage of toRemove so that the code is a little cleaner.
I just had some thoughts about how to possibly implement the REPL: I think if you're clever with how you use Turbine's task prioritisation, targeting, and data dependencies, you can do some clever things.
So I think the key thing is that the rank 0 worker is the one that has to run the actual REPL code and interact with the user. You could actually potentially suspend execution of the REPL until data is available by creating a task targeted back to rank 0 (via turbine::rule) that is dependent on the data.
Another thing is that you're going to have to somehow push new compiled code out to workers and ensure that the new code is loaded before those workers try to evaluate any new functions - STC/Turbine freely assume that the same Tcl functions exist on all workers. You could do this by sending out a max priority task, one targeted to each worker. This would for sure run before any new functions. If you wanted to be extra-sure, you could wait to get a confirmation from each worker by creating an adlb variable with write refcount
Anyway, just some thoughts - it may not align with the approach you're taking - but Turbine/ADLB is pretty powerful now if you know some of the tricks.
I removed all the toRemove calls and uses, but I'm not sure about this if-block here. Can you double-check it?
Looks safe to remove to me.
On 19 July 2015 at 21:40, Basheer Subei notifications@github.com wrote:
I removed all the toRemove calls and uses, but I'm not sure about this if-block here https://github.com/basheersubei/swift-t/commit/4375051fce4a5e777a07071597e1466a3750f739#diff-1dba7a6fcfaa433c560d9534050c4c30L275. Can you double-check it?
— Reply to this email directly or view it on GitHub https://github.com/swift-lang/swift-t/issues/66#issuecomment-122758487.
The new globals_map looks clear enough for me to use. I'll just have to deal with the optimization for now (my REPL works for O2, not for O0. And O2 removes most of the globals for basic scripts). Once I fix that, I'll be able to use the globals_map.
The REPL breaks for O0?
yes, only because of the way I currently grep and remove the boilerplate tic code.
I'm adding a new compiler setting so that the -c
flag omits generating the boilerplate tic code (hopefully, this should work regardless of optimization levels). I think this is a much better way than grepping and removing the boilerplate tic code.
user inputs swift code --> REPL runs it through stc normally --> greps through generated tic code and removes boilerplate (turbine::start
, turbine::finalize
, turbine::defaults
, etc) --> REPL evals the tic code.
user inputs swift code --> REPL runs it through stc with -c
flag --> tic generated without boilerplate --> REPL evals the tic code
that previous toRemove commit I made had a bug in it (that's what I get for not using an IDE). I fixed it and pushed it again. This time I actually ran the tests and checked it.
It's worth learning Eclipse - it's a huge time saver with Java. I still use Vim for the C and Tcl.
On 20 July 2015 at 17:06, Basheer Subei notifications@github.com wrote:
that previous toRemove commit I made had a bug in it (that's what I get for not using an IDE). I fixed it and pushed it again. This time I actually ran the tests and checked it.
— Reply to this email directly or view it on GitHub https://github.com/swift-lang/swift-t/issues/66#issuecomment-123101177.
I modified STC so it doesn't generate the boilerplate tic code when a -c
flag is passed. It does exactly what I want (omits exactly the lines I wanted from the tic), but when I eval that code (within the REPL loop inside a leaf function in Turbine), it works but only when it's not -O0
.
I've noticed very strange behavior when the REPL tries to eval the tic code (can I call it .tric code, stands for Turbine-REPL intermediate code? xD). For example, when I use -O0
, it never terminates.
Maybe I have a fundamental misunderstanding with how the whole runtime works. All this REPL code I'm running (that evals
a bunch of modified tic code) is sitting in a Swift leaf function. That means it's inside a Turbine worker, right? Can a worker make all these Turbine calls (eval
ing tic code)?
I think it's a little tricky to get this to behave right.
The way the runtime works is that it invokes mpiexec, which invokes n copies of the Tcl interpreter, each running the .tic file. In MPI each process has a rank from 0 to n-1. There's a setup stage where everything gets initialized and everyone agrees on which rank is doing what. Rank 0 is always a worker and rank n-1 is always an ADLB server, but in-between ranks can be allocated in different ways.
After the setup stage, the ADLB servers start running their server loops, the rank 0 worker starts running swift:main and the other workers wait to receive a task.
The ADLB servers will stay in the server loop until the whole run shuts down, and the workers will sit in their own loops executing a task (i.e. a fragment of Tcl code that calls a function with some arguments), then requesting another task, and so on. So yeah, workers are the only things that can execute Tcl code.
I think one thing you need to do is make sure that the REPL code is executed on rank 0. I believe MPI generally routes STDIN to rank 0 (http://www.mcs.anl.gov/research/projects/mpi/mpich1-old/docs/mpichman-chp4mpd/node13.htm). So if your REPL executes on, say, rank 1, it will never get any input. I suspect this probably has something to do with your O0 problems.
There might also be some trickiness with evaluating the modified .tic code - you probably want to evaluate it in the global scope in all functions. There are ways in Tcl to control the scope in which code is evaled. I believe you want something like uplevel #0 $the_command
Each worker should be able to execute any Turbine code. They're symmetrical aside from the fact that rank 0 starts execution and that MPI sends input to rank 0. The exception is if you have special executors (like Coasters) which are only on certain worker ranks.
Well, the thing is, my REPL does get user input, and it compiles it in STC just fine. But when it eval
s the tic code, it just hangs (I notice 100% CPU usage for that process). I narrowed down the problem to the adlb::create_globals
call. It seems this function can't be called after startup (like by the REPL)?
Ohhhhh, I think I get it! Since create_globals
has to be called collectively by all ranks, and since this eval
ed tic code only runs on a single worker (the REPL), it sits there waiting for the other ranks to call it. (I can even see a barrier call)
Well, I'm completely stumped. There is no way to make the other ranks call create_globals with the same data. And we can't remove the collective nature of create_globals, since all the ranks must have the same globals (that's the reason for the collective call, right?)
Oh right... yeah create_globals as-is won't work. I think there are multiple possible solutions, might require some thought about which is best:
I think you could probably do something with a pair of functions like:
proc load_tic_master { tic_file } {
(execute tic_file) # this would create new global vars via adlb::create
set new_globals <some way to get new globals just created>
for each other worker {
(send a targeted task to worker with payload) "load_tic_slave (tic_file) (new_globals)"
}
adlb::worker_barrier # wait until this has executed on all workers
}
proc load_tic_slave { tic_file globals } {
(execute tic_file) # this would create new global vars via adlb::create
(load new globals)
adlb::worker_barrier
}
(phew) I thought there was no way around this problem... I think I generally understand where to go from here. But a few questions:
1- if we can create "global" variables using adlb::create
, then what's the difference between regular variables and global variables? Is it just that globals are all created at startup collectively, and we don't need to do any extra work to let all ranks know about them (such as what you're proposing)? Can we set a permanent flag so they're not garbage collected?
2- If the value of a global variable is changed by one rank, how do the other ranks find this out? (I'm assuming it's stored on one ADLB server) Or is this use case not relevant?
3- We don't need to worry about servers, right? since they won't ever be running any code (they won't need variables or function definitions).
4- I would need to modify STC -c
to use adlb::create
or adlb:multicreate
instead of adlb:create_globals
, right? This relates back to question1. What are the actual differences between variables and globals?
I'm probably going to have tons of questions about how to code certain parts of these, but I'll ask them as I get there. Thanks for all the help! :)
All good questions.
<type> [<extra for type>] [ <read_refcount> [ <write_refcount> [ <permanent> ] ] ]
To create an int with default properties (1r, 1w, not permanent), the args would be: integer 1 1 0
If you change the last arg to 1 it will be permanent.
a) how IDs are allocated (collectively between all workers versus created on-demand). The negative ID space is allocated from -1 downwards whenever create_global is called by all ranks. The positive ID space is divided between servers round-robin and then each server can allocate its next ID to a caller without any coordination with other servers. b) globals are always permanent
Looking at the generated .tic code, it seems the args used in multicreate are different:
lassign [ adlb::multicreate [ list string 1 1 1 ] [ list string 1 1 2 ] [ list string 1 1 3 ] [ list string 1 1 4 ] [ list integer 1 1 5 ] [ list string 1 1 6 ] ] t:1 t:2 t:3 t:4 t:5 t:6
Upon taking a closer look at extract_create_props
, which extracts the Tcl properties from the adlb::multicreate
and adlb::create
calls, I can see that it grabs the permanent
property from the 4th argument. The 3rd argument is called symbol
. That change was made around 2 months ago. Am I missing something? I can't figure out how to actually set the permanent flag for variables now.
You're right, I was looking at the wrong version of the code - permanent was bumped to being an optional argument after symbol.
On 21 July 2015 at 15:21, Basheer Subei notifications@github.com wrote:
Looking at the generated .tic code, it seems the args used in multicreate are different: lassign [ adlb::multicreate [ list string 1 1 1 ] [ list string 1 1 2 ] [ list string 1 1 3 ] [ list string 1 1 4 ] [ list integer 1 1 5 ] [ list string 1 1 6 ] ] t:1 t:2 t:3 t:4 t:5 t:6
Upon taking a closer look at extract_create_props https://github.com/swift-lang/swift-t/blob/master/turbine/code/src/tcl/adlb/tcl-adlb.c#L1370, which extracts the Tcl properties from the adlb::multicreate and adlb::create calls, I can see that it grabs the permanent property from the 4th argument. The 3rd argument is called symbol. That change was made around 2 months ago. I can't figure out how to actually set the permanent flag for variables now.
— Reply to this email directly or view it on GitHub https://github.com/swift-lang/swift-t/issues/66#issuecomment-123496689.
You were right! I do need to use uplevel #0
to eval my tic! I was getting a lot of problems with global variables no such variable
because the global
keyword in the tic behaves differently when eval
ing in the current scope! This solved it!
I can't figure out how to get STC to add the permanent arg in multicreate. It's in batchDeclare
in TurbineGenerator.java
, but the data type that holds the args VarDecl
only contains the name, the initReaders, and initWriters (for refcounts). I'll ignore this for now and work on properly creating variables and pushing them out to other workers.
not sure if it's a bug, but it seems that even adlb::create_globals
calls don't actually set the permanent
arg. Here's what STC generates now: lassign [ adlb::create_globals [ list integer 1 1 11 ] ] u:x
.
There's no permanent flag set. Is this a bug? (probably not since everything works)
Yeah it isn't set in the generated code, but in the C implementation of create_globals, the default value for permanent is set to true.
https://github.com/swift-lang/swift-t/blob/master/turbine/code/src/tcl/adlb/tcl-adlb.c#L1509
Just saw the earlier comment. It looks like you may need to add an additional permanent argument here: private TclList createArgs(Var var, Arg initReaders, Arg initWriters) {
I would probably have it allowed to be null and only add the extra arg on if permanent is non-null. This is just to avoid increasing the size of output code with extra parameters when not needed.
That part of the code generator is a bit messy - we intend to keep the syntactic details in Turbine.java and the higher-level logic in TurbineGenerator.java but it's not separated well here.
There's no need to pass the globals around explicitly as part of the task. The global declarations are in the tic anyways. I'm thinking something like this:
proc loadTic { tic_code } {
# eval tic code (now globals and proc defs are available to this rank)
# if rank 0 (only REPL loop)
# for all workers
# send targeted task to worker with payload "loadTic tic_code"
# adlb::worker_barrier
}
# rank 0 (REPL loop) compiles user input into tic (only contains
# proc definitions and global var assignments). Then, it calls loadTic
# on it. This spawns tasks for all other workers, so they all have the variable
# and proc definitions (nothing executed, though, on any of the ranks). Once they
# all reach the barrier, rank 0 (REPL) executes swift:main and carries on with
# any function calls, then listens to more input.
# The only problem is: user can no longer interactively start tasks and then start
# more until the other tasks (workers) are finished, unless we set the loadTic targeted
# tasks to a very high priority (in which case, the definitions from older tasks, such as
# swift:main or globals may be overwritten).
I guess we'll just assume that the user won't interactively redefine any functions or globals while a task (using those definitions) is running on a worker. Also, we can rename swift:main
as a last resort fix.
Would it make sense to append a special prefix to functions generated from the REPL so that they're unique? IIRC the Tcl function name generation is done in a single spot.
I think the background tasks idea is cool but probably quite a bit of work beyond this problem.
I think sending the tasks out maximum priority is a good idea regardless. If you did that and omitted the barrier I think it would work ok in 99.99+% of cases since in the current implementation max-priority targeted tasks will be at the head of the targeted worker's queue as soon as the task creation returns to the caller. We don't guarantee this in the API.
For the issue with the permanent arg, I'll work on that in a separate issue #67
the unique prefix sounds like a good and easy solution.
I was under the impression that the main purpose of a REPL or interactive Swift/T was so that a user could interactively issue multiple tasks and have them run in the background as they monitor and probe the variable states etc. I guess we'll find out soon enough whether it's feasible or not. Regardless, I'll be working towards that until then.
For the max priority task idea: max priority tasks (even if at the head of the queue) cannot stop what the worker is doing, right? Is there a way to notify the worker to stop the current task and move it to the back of the queue (or does it periodically poll the servers for such information)?
If not, then I guess we could just do away with the barrier, like you said. 99.99% sounds good enough for me (and for whatever basic use cases that apply to interactive Swift scripting).
You're right, having lots of background tasks would be very useful. Let's see how this solution works out, I think there are other possibilities too if that doesn't work out.
There's no way to interrupt a worker, no, since the task is just arbitrary code. The standard worker loop is here, actually: https://github.com/swift-lang/swift-t/blob/master/turbine/code/src/turbine/worker.c#L50 - it just alternates between ADLB_Get and Tcl_Eval.
It works! The REPL can actually run swift code that spawns tasks on other workers! :D
Here's a copy-paste of a shell session: https://gist.github.com/basheersubei/7b6bf9e5cb129eb8ebdb https://gist.github.com/basheersubei/76eae997d3783fac280e
But I messed up on the global variables thing. I misunderstood how it works, and now it creates a copy of the variables in each worker (and you end up with multiple copies). I think I'll make a modified version of create_globals
that can only be called by rank 0, and which uses your globals_map to register all the global variables in that Tcl dict. This way, I won't have to worry about the permanent arg in multicreate
, and I won't have to worry about passing the global variable ids from the eval'd tic code on rank 0 to the other ranks.
Awesome work :)
Also, the second link I posted above shows some errors related to unfilled subscribes (also a bunch of nulls where the variables are supposed to be in the work queue). Any idea what that unfilled subscribes could be related to?
The nulls are the usual behaviour when there isn't a debug symbol for the variable. It would be nice to have cleaner output for that case but that's how it works right now.
The //'s are used to indicate which variable the task was waiting on, e.g. this one is waiting on ID <20> [7] {6} printf (<48>:(null) ((null)) <46>:(null) ((null)) /<20>:(null) ((null))/)
The unfilled subscribe messages are closely related: subscribes were made for <20> and <30> (either a task was dependent on those IDs, or someone manually called ADLB_Subscribe for some reason), but the write refcount of <20> and <30> never dropped to zero so a notification was never sent out and the subscribe was "unfilled".
So it looks like x, the third argument to printf, was never assigned.
hmm... sounds like it's caused by my multicreate problem. I'll work on it next and see if it changes the results.
I may have found a solution for being able to compile Swift scripts into "object" files and then "linking" them, at least for global variables in multiple scripts.
The nice thing about global variables, is that STC generates the same names for them in the tic code. For example, every script that declares a global called x
will generate tic code that uses the name u:x
to reference that variable.
Example solution:
x
. STC compiles into tic as usual.x
using extern
keyword (to indicate that its definition is to come from another script). This means that STC will not generate the declare_globals
and turbine::store*
calls for that variable (already done by another script).This seems to work fine in the REPL use case, as we can implicitly add all global variable declarations (using extern) to every line of user input in the REPL.
For example (given the above solution):
int x = f();
, and turbine goes on and executes f
(say it takes a while).g(x);
before/after the first command is done. The REPL prepends a line extern int x;
, and it compiles in STC and runs in turbine.The one thing I'm worried about is adding the extern
keyword to STC. I can't even get STC to ignore undefined global variable
errors, let alone modify the language itself.
Summary
It would be great if STC could compile separate Swift script fragments and sort of "link" them together and run it. Example: f.swift contains definition for function f(), while main.swift calls f() and prints result to screen. We would like STC to be able to compile f.swift and main.swift separately into f.tic and main.tic (where f.tic doesn't have any turbine startup or shutdown boilerplate code, since it's only a function definition), and then being able to run main.tic (which calls f() defined in f.tic).
Purpose
Potential Issues
__entry:if1
). Does this also happen when STC optimization is on?Current Tasks
-c
flag so that it modifies the turbine code generating (essentially removing the boilerplate code)-O0
no optimizations, as well.adlb::create_globals
won't work since it's a collective call.extern
keyword functionality to ANTLR grammar parser and STC code generator.extern
declared variables.