omphalos / inductive.js

Inductive programming in JavaScript
MIT License
25 stars 2 forks source link

breadth first search ? #2

Open UniqueFool opened 8 years ago

UniqueFool commented 8 years ago

In the docs you said inductive.js is running a BFS (breadth-first search) - and I checked the solver and was wondering if there is any kind of scheme, or even just idea, to run a parallel search to use multiple cores, i.e. using something like parallel.js to run a bidirectional search using two threads,
what do you think ?

I have some non-trivial specs to solve and would be interested to learn about leveraging multicore hardware - for instance, I have access to a 12-core system with 48 gb of RAM and would like to use that hardware to help speed up the solver.

omphalos commented 8 years ago

It's possible that inductive.js could use parallel search. However my motivation for this library hasn't been to create something that people would have to run for a long time. In general, long-running solves, like long-running compilation, lead to an unpleasant development experience, which I'm trying to avoid. My idea was that people could create small, quickly-solved bits and compose them. The main benefit I felt wasn't so much coming up with programs the developer never thought about so much as tightly tying code to tests, in a tighter way than is done by conventional TDD. (Additionally it kind of nudges the developer into a functional style, which also leads to fewer bugs.)

Another problem with long-running programs is that the set of building blocks isn't always easy to come up with. (This probably speaks to a lack of adequate debugging tools.) It's possible when you're using inductive.js that you think you have all the building blocks needed to solve a spec, but you don't, and running a long-running solve (parallel or not) leads you waiting a long time only to discover you don't have an answer. Personally I prefer feedback that is immediate or almost immediate.

So generally, my opinion is that all solves should occur in under a second. Inductive.js does caching so re-running your test suite on consecutive solves should happen very fast.

There are some rudimentary debugging tools that inductive.js has that can help you debug specs that don't solve. If you're interested I can give you a run-down of some of those.

Alternatively maybe you could give me a description of some of your bigger specs in case I'm not understanding the problem adequately.

UniqueFool commented 8 years ago

Thank you, regarding your last point: I am currently using GP (genetic programming) to evolve neural networks (NNs) - the network setup is pretty arbitrary actually, and I merely "reward" networks that help solve a specific problem. For starters, I am using a handful of different types of networks for "seeding" my GP populations (Hopfield, Boltzmann, Convolutional etc), and connect those using different setups and mutations, I am reusing existing networks for those purposes - but I obviously need to "integrate" those with the existing code, e.g. by matching APIs, that is something where IP can actually help, because I can in theory specify a valid solution, without having to provide it myself.

I realize that this use-case may be a bit obscure given your descriptions above ;-)

So far, I have been using a vector of unit tests per NN as the fitness function to judge how well the GP-evolved network "works". Once I better understand inductive.js, that would help tremendously.

EDIT: To provide a little more context: One of the most difficult things in GP is coming up with a suitable fitness function to judge the fitness of an evolved algorithm/program - inductive programming would allow you to specify the desired outputs of the fitness function, without actually having to write the fitness function itself

https://en.wikipedia.org/wiki/Fitness_function

UniqueFool commented 8 years ago

I am not claiming that I understand how you set up the solver, but I was wondering if it'd be possible to treat each specify() block as the foundation for task based parallelism - i.e. so that specifications could be solved in parallel that way. What do you think, how much work, and what kind of work, would it be to teach inductive to solve specifications in a parallel (or even just functional-programming style ) ?

omphalos commented 8 years ago

Sorry for my late replies.

I've done a bit of machine learning in the past, although I'm no expert on the subject. The project you're working on sounds very interesting, although I'd like to understand what you're trying to do better.

It sounds like you are using a genetic algorithm to evolve neural networks. I don't understand how this is genetic programming since the output seems to be neural network topologies not computer programs - I may be missing something.

Additionally it sounds like you want to use inductive programming to specify a fitness function. In your case, it sounds like a fitness function is composed of a desired set of outputs, and way of scoring candidates against the desired outputs. Typically members of generations are allowed to reproduce depending on how they match up with the fitness function. It sounds like you want inductive.js to come up with a fitness function for this. Do I understand you correctly?

I was wondering if it'd be possible to treat each specify() block as the foundation for task based parallelism - i.e. so that specifications could be solved in parallel that way.

It's possible to do this. Probably without too much work. If the evaluation of each program takes a very long time, it's possible, without much work to do this in a parallel way as well. Your problem sounds like a very interesting application of inductive.js (although I don't understand the details) - and it's something I hadn't thought when I wrote it, which makes it even cooler in my eyes :)

My main worry is that inductive.js isn't as powerful as you might need it to be, in which case, even if I make some changes to the API to let you do what you want, it might not help you. I don't know if you could give me an idea of what a specification you might want to parallelize might look like. Maybe that can help move this conversation forward. If I can do something to help you out, I'll definitely consider making a change to the library.

UniqueFool commented 8 years ago

Thanks for getting back in touch. Yeah, I admit my responses may have been a little imprecise. I am not just involving NN topologies (which a GA would do), but GPs in the form of ASTs (syntax trees) that are composed of building blocks from different machine learning projects, including very different APIs.

Basically, imagine it like taking 3 different NN libraries and using them together - I treat their main APIs (extracted via unit tests and examples/demos) and use them for seeding ASTs that are then evolved via GP until they match a certain fitness function, i.e. the solution can be specified but the approach benefits greatly from GP/IP.

I admit it's an unusual use-case, so I fully understand if it's outside the scope of what you had/have in mind for this project - and I'll be glad to even just receive a few more pointers to research the necessary API changes myself. I am glad you think that parallelizing the breadth-first search at the specification level should work - I was thinking the same thing when I realized that your code looks very LISP-ish/functional to me, which is a good when researching parallelization opportunities.

I will check if it's possible to easily get a dependency chain of specification blocks so that parallel solving can be prioritized by looking at nested specs/dependencies.

Besides, I am not sure if I can come up with a self-contained example that doesn't require tons of installation, setup and configuration.

Regarding your concern about inductive.js not being powerful enough, I have actually looked at how to run the solvery with varying specs, basically in a loop that traverses an array in a forEach loop and which adds to the pool of APIs as long as a certain goal cannot be reached, and for that I would need to inspect some kind of output hash/object from the solver.

Your point concerning the use of inductive.js to let it evolve a fitness function is spot-on: it's something where a "good" solution can be specified but the actual function/approach is non-trivial to create manually - so it is much more straightforward to use inductive.js, provide a bunch of specifications, and custom building blocks, and let it try to re-arrange those in order to obtain the result.

Again, thanks for all your feedback, it's very much appreciated - especially in the light of this being certainly an obscure use-case ...

omphalos commented 8 years ago

Thanks for the added background.

So I'm not trying so much to get a working example I might play with so much as seeing a mock-up of how you see the interface working.

Um, so are you thinking something along these lines?

specify('someSpec',
  given(/* some arguments */,
    shouldSatisfy(function(x, callback) {
      doSomethingInParallel(function onParallelOperationComplete(testPassed) {
        callback(testPassed)
      })
    })),
  use(myBuildingBlock1, myBuildingBlock2))

If not, what would you change?

UniqueFool commented 8 years ago

yikes, unless I am missing something, that's pretty different from what I had in mind ... but ... still interesting :-)

Actually, the reason for my original question was running the BFS search itself in parallel, at least for sub-specs that don't have any other dependencies. Thus, I wasn't thinking of using inductive.js itself to evolve parallel programs - even though I like the "meta" nature of that ;-)

Frankly, I didn't even think about it yet, i.e. that there may be parallel APIs that need to run.

omphalos commented 8 years ago

Haha, ok. Well I guess we can start with the pseudocode you wrote in the parallelization issue.

UniqueFool commented 8 years ago

yeah, my use-case seems much simpler than what you are having in mind, which is probably not such a bad thing ;-)