jdhenke closed this issue 11 years ago.
My immediate gut response is that the dynamic aspect seems like the next logical step, provided we're happy with how our static CFG analysis tool is shaping up and we're content to just hack together the visuals. It could be as tightly integrated as passing the statically constructed CFG, along with the code, to an otherwise normal eval/apply interpreter that adds edges to the CFG for the calls it actually makes. Could that maybe transition further into profiling or detecting unreachable code?
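A minimal sketch of the "pass the static CFG along to an eval/apply interpreter" idea, written in Python rather than in our project's own language. The toy expression language, `run_with_dynamic_edges`, `unexercised_edges`, and the example program are all hypothetical. The point it illustrates is that `apply` is the single place where a call happens, so it is the natural spot to record the edge actually taken; diffing the result against the static CFG then gives a first cut at "statically possible but never exercised" edges, which is roughly where profiling or unreachable-code detection would start.

```python
import operator
from collections import defaultdict

# Hypothetical toy language: an expression is an int, a variable name (str),
# ("if", cond, then, else), or ("call", fname, *args).
PRIMITIVES = {"+": operator.add, "-": operator.sub,
              "*": operator.mul, "<": operator.lt}

def run_with_dynamic_edges(program, entry, *args):
    """Run `entry` under a toy eval/apply interpreter.

    `program` maps function names to (params, body). Returns the result and
    a dict mapping each caller to the set of callees actually invoked.
    """
    dynamic_cfg = defaultdict(set)

    def evaluate(expr, env, current_fn):
        if isinstance(expr, int):
            return expr
        if isinstance(expr, str):
            return env[expr]
        tag = expr[0]
        if tag == "if":
            _, cond, then, other = expr
            branch = then if evaluate(cond, env, current_fn) else other
            return evaluate(branch, env, current_fn)
        if tag == "call":
            _, fname, *arg_exprs = expr
            values = [evaluate(a, env, current_fn) for a in arg_exprs]
            return apply(fname, values, current_fn)
        raise ValueError(f"unknown expression: {expr!r}")

    def apply(fname, values, caller):
        if fname in PRIMITIVES:
            return PRIMITIVES[fname](*values)
        dynamic_cfg[caller].add(fname)  # record the call edge actually taken
        params, body = program[fname]
        return evaluate(body, dict(zip(params, values)), fname)

    result = apply(entry, list(args), "<toplevel>")
    return result, dict(dynamic_cfg)

def unexercised_edges(static_cfg, dynamic_cfg):
    """Static call edges that never fired during this run."""
    return {(caller, callee)
            for caller, callees in static_cfg.items()
            for callee in callees
            if callee not in dynamic_cfg.get(caller, set())}

# Example program and the call graph a static pass might have produced.
PROGRAM = {
    "main": (["x"], ("if", ("call", "<", "x", 0),
                     ("call", "report_error", "x"),
                     ("call", "fact", "x"))),
    "report_error": (["x"], 0),
    "fact": (["n"], ("if", ("call", "<", "n", 2), 1,
                     ("call", "*", "n",
                      ("call", "fact", ("call", "-", "n", 1))))),
}
STATIC_CFG = {"main": {"report_error", "fact"},
              "fact": {"fact"}, "report_error": set()}

result, dyn = run_with_dynamic_edges(PROGRAM, "main", 5)
```

With a non-negative input, `main -> report_error` never fires, so it is the only edge `unexercised_edges(STATIC_CFG, dyn)` reports. A single run proves nothing about reachability, of course; it only flags candidates.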
Otherwise, I don't think I understand how "further aggregation of code" or "heuristics" would be useful or add anything substantial; they seem a little tangential. Simplifying the graph seems like a good part of visualization, if we were to do that. Type checking could be cool, but might require as many changes as provenance tracking. Would we be inferring types, or were you assuming the code declares them?
I think we need to do documentation regardless, and might as well wait until our code/project is a bit more stable.
I vote for the dynamic extension, type checking, or visualization, in that order ;)
The dynamic extension sounds good to me.
I see two different aspects of the problem.
Does this distinction make sense to you guys? How do you feel about me tackling the second part?
That sounds fine to me. I'd say take a stab at defining the second part and running with it; then we can see where you end up.
Cool. On it.
After some thought, here are my semi-filtered ideas.
DISCLAIMER: I recognize that much of this stuff is not great, but perhaps it will inspire me or you guys now or later, so I've put it all down.
Static Stuff
Graph Visualization
Here are the options as I see them.
Further Aggregation of Code
If we could group the code base manually or automatically by file or some other heuristic, could we find the dependencies between groups?
Could we detect conflicts between packages?
Heuristics
Moving further into analysis, could we identify poor practices via some divined heuristics? This is very open-ended, and it would be hard to say "OK, I'm done."
Simplify Graph
Much of the concern with displaying a graph is its complexity. Perhaps I could work on simplifying the graph display. For instance, I could work on displaying one function and its relevant neighbors. One hop away? Two hops?
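One way to make the "one hop? two hops?" question concrete is a breadth-first search out from the chosen function that keeps only the edges among the nodes it reaches. This sketch assumes the CFG is a plain dict from function name to the set of functions it calls (not necessarily how our graph code stores it) and treats both callers and callees as neighbors; `k_hop_subgraph` and the example graph are hypothetical.

```python
from collections import deque

def k_hop_subgraph(cfg, center, hops):
    """Return the edges of `cfg` among nodes within `hops` steps of `center`.

    `cfg` maps each function to the set of functions it calls. Callers and
    callees both count as neighbors, so the view shows who calls `center`
    as well as what it calls.
    """
    # Undirected adjacency, so the search can walk edges in both directions.
    neighbors = {}
    for caller, callees in cfg.items():
        for callee in callees:
            neighbors.setdefault(caller, set()).add(callee)
            neighbors.setdefault(callee, set()).add(caller)

    # Breadth-first search that stops expanding at distance `hops`.
    distance = {center: 0}
    queue = deque([center])
    while queue:
        node = queue.popleft()
        if distance[node] == hops:
            continue
        for nxt in neighbors.get(node, ()):
            if nxt not in distance:
                distance[nxt] = distance[node] + 1
                queue.append(nxt)

    # Keep the original directed edges whose endpoints are both in view.
    kept = set(distance)
    return {(caller, callee)
            for caller, callees in cfg.items() if caller in kept
            for callee in callees if callee in kept}

# Hypothetical call graph for illustration.
CFG = {"main": {"parse", "run"}, "parse": {"tokenize"}, "run": {"eval"},
       "eval": {"apply", "eval"}, "apply": {"eval"}, "tokenize": set()}

one_hop = k_hop_subgraph(CFG, "eval", 1)
two_hops = k_hop_subgraph(CFG, "eval", 2)
```

Whether callers should count as neighbors is a design choice; following only callees would give a "what does this function reach" view instead.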
Type Checking
I could build on Pavel's code, so I wouldn't be starting from scratch.
Provenance Tracking
This would be a very large undertaking, but it could be very cool. It isn't well defined at this point, in that many design decisions would need to be made, but think of it as tracking the flow of information from variables. For instance, we could annotate some external primitives beforehand, run our code, and see which data was touched by information flowing from which source. I think limiting the number of sources in this graph would be good.
It will most likely require changes at all levels, but by relying on well-defined interfaces, I think I could work on it now.
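To pin down what "annotate some external primitives, run our code, and see what was touched" might mean, here is a minimal sketch of the idea as dynamic taint tracking: values carry the set of source labels that flowed into them, annotated primitives introduce labels, and lifted operations union them. It is written in Python for illustration only; `Tracked`, `annotate`, `lift`, and the `read_price`/`read_quantity` sources are hypothetical. It only follows explicit data flow; flow through branch conditions (implicit flow) is one of the open design decisions mentioned above.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Tracked:
    """A value paired with the set of sources whose information flowed into it."""
    value: object
    sources: frozenset = frozenset()

def annotate(label, primitive):
    """Mark an external primitive as a source: its results are tagged with `label`."""
    def wrapped(*args):
        return Tracked(primitive(*args), frozenset([label]))
    return wrapped

def lift(fn):
    """Wrap an operation so its result carries the union of its arguments' sources."""
    def wrapped(*args):
        raw = [a.value if isinstance(a, Tracked) else a for a in args]
        labels = frozenset().union(
            *(a.sources for a in args if isinstance(a, Tracked)))
        return Tracked(fn(*raw), labels)
    return wrapped

# Hypothetical external primitives (stand-ins for real I/O), annotated as sources.
read_price = annotate("read_price", lambda: 10)
read_quantity = annotate("read_quantity", lambda: 3)

add = lift(lambda a, b: a + b)
mul = lift(lambda a, b: a * b)

price = read_price()
qty = read_quantity()
shipped = add(mul(price, qty), 5)  # value 35, touched by both sources
```

Keeping the source set small, as suggested above, also keeps these label sets (and any graph drawn from them) readable.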
Dynamic Stuff
Jumping into runtime, I had some ideas.
Log Inputs and Outputs
Create an interface to interact with all of the above?
Documentation
It seems lame, but it will have to be done at some point. Do you think we'll need to document each subsystem separately? Sussman might like that. I could write a quick API doc for my graph stuff, and I could do the same for @oderby's CFG API. Maybe also an outline for our entire system?
Logistical note: I'd be very happy to use GitHub's wiki functionality for documentation purposes. If you hadn't noticed, GFM isn't my least favorite thing in the world.