Parallel Algorithm - Githubissues

pwightman commented 12 years ago

I think the problem with the parallel algorithm is that we're partitioning the data such that each thread has its own letter in the string, but it shouldn't just start with n threads I don't think, it should only branch off when it hits an epsilon jump, correct? So having each thread start on a different letter of the string and start running from the beginning seems like it would not give correct results. I don't know that it explains the segfault, but we might need to change some things about it. I haven't quite thought of a good solution, but I just wanted to get some conversation about it going here.

pwightman commented 12 years ago

It might help to create a traversal class that takes in an NFA, a state, and the remaining string it has left to traverse

runNfa( startString )
{
  Traversal* t = new Traversal(nfa, startString, nfa->startState)
  traverse(t)
}

traverse( currentTraversal )
{
  loop through currentTraversal->stringLeft     // just as we're already doing
    t.step()              // or something like this so it moves one state
    ...other stuff...

    if( you find epsilon jump)
      Traversal* newTraversal = new Traversal(nfa, currentTraversal->stringLeft, currentTraversal->currentState)
      startThread(traverse(newTraversal))

    ...other stuff...
}

How does that look?

pwightman commented 12 years ago

Also, further from this it might be wise to actually pass in an array of traversals instead of a single one, because it could happen that there's too many threads already started and we don't want to spin off any more, so instead of branching we add it to the array and put another loop around the current loop that loops through the array (which may just have one traversal in it)

pwightman commented 12 years ago

So I'm coding this up in a branch, and I'm running into a couple problems. The main one being that currently, the sequential version returns all the states that it was in when it ended, but if we spin off threads, we need some way of consolidating all the traversal paths at the end. We can do this by having a finalStates variables that we lock/unlock so when a thread finishes its traversal, it sends it to the finalStates variable so we have them all.

Another option is that maybe we don't care about all the ending states, we only care about whether it passes or not. In which case we could have some way of signaling the main thread when one of the worker threads reaches a final state, and kill off all the ones still working. I know how to do this with Pthreads, but I'm not sure how in OpenMP.

pwightman / nfa_evaluator

Parallel Algorithm #5