johnnyzhangzhao / monav

Automatically exported from code.google.com/p/monav
0 stars 0 forks source link

Preprocessor: Fix unpredicatable result of BinaryHeap::WasInserted() #98

Open GoogleCodeExporter opened 8 years ago

GoogleCodeExporter commented 8 years ago
Hi,

we had an uninitialized memory issue in the BinaryHeap.

From the patch:
------------------------------------------------------
The problem was that the BinaryHeap::nodeIndex
was not initialized at all. This will give
(very slightly) unpredictable contraction results.

We can't initialize with zero as it's a valid node id.

valgrind reported:
==11698== Conditional jump or move depends on uninitialised value(s)
==11698==    at 0x44D61D: BinaryHeap<unsigned int, unsigned int, unsigned int, 
Contractor::_HeapData, ArrayStorage<unsigned int, unsigned int> 
>::WasInserted(unsigned int) (binaryheap.h:138)
==11698==    by 0x44DA60: bool 
Contractor::_Contract<true>(Contractor::_ThreadData*, unsigned int, 
Contractor::_ContractionInformation*) (contractor.h:508)
==11698==    by 0x447029: Contractor::_Evaluate(Contractor::_ThreadData*, 
Contractor::_PriorityData*, unsigned int) (contractor.h:464)
==11698==    by 0x43EB57: Contractor::Run() [clone ._omp_fn.1] 
(contractor.h:262)
==11698==    by 0x446383: Contractor::Run() (contractor.h:257)
==11698==    by 0x43D1EF: ContractionHierarchies::Preprocess(IImporter*, 
QString) (contractionhierarchies.cpp:88)
==11698==    by 0x417FBB: runRouting(QString, IImporter*, IPreprocessor*, 
IPreprocessor*, PackerInfo) (pluginmanager.cpp:424)
==11698==    by 0x4193F8: PluginManager::processRoutingModule(QString, QString, 
QString, QString, bool) (pluginmanager.cpp:550)
==11698==    by 0x422AB3: main (console-main.cpp:295)
==11698== 
==11698== Use of uninitialised value of size 8
==11698==    at 0x44D638: BinaryHeap<unsigned int, unsigned int, unsigned int, 
Contractor::_HeapData, ArrayStorage<unsigned int, unsigned int> 
>::WasInserted(unsigned int) (binaryheap.h:140)
==11698==    by 0x44DA60: bool 
Contractor::_Contract<true>(Contractor::_ThreadData*, unsigned int, 
Contractor::_ContractionInformation*) (contractor.h:508)
==11698==    by 0x447029: Contractor::_Evaluate(Contractor::_ThreadData*, 
Contractor::_PriorityData*, unsigned int) (contractor.h:464)
==11698==    by 0x43EB57: Contractor::Run() [clone ._omp_fn.1] 
(contractor.h:262)
==11698==    by 0x446383: Contractor::Run() (contractor.h:257)
==11698==    by 0x43D1EF: ContractionHierarchies::Preprocess(IImporter*, 
QString) (contractionhierarchies.cpp:88)
==11698==    by 0x417FBB: runRouting(QString, IImporter*, IPreprocessor*, 
IPreprocessor*, PackerInfo) (pluginmanager.cpp:424)
==11698==    by 0x4193F8: PluginManager::processRoutingModule(QString, QString, 
QString, QString, bool) (pluginmanager.cpp:550)
==11698==    by 0x422AB3: main (console-main.cpp:295

Original issue reported on code.google.com by t...@simonv.com on 2 Jun 2013 at 12:10

Attachments:

GoogleCodeExporter commented 8 years ago
Uh, the preprocessor code is actually relying on the fact that the nodes are 
never properly reset. It acts as some kind of "cache". Otherwise the 
"Initialise Elimination PQ..." phase slows down to a crawl.

Attached is an updated patch that only initializes the memory in the 
constructor (=once) for now. This side-effect of the commented out clear() 
function needs fixing ;)

Thomas

Original comment by t...@simonv.com on 2 Jun 2013 at 7:46

Attachments:

GoogleCodeExporter commented 8 years ago
The binary heap uses a double referenced array for storage, which handles 
uninitialized memory properly: A nodeIndex is always checked whether it is 
valid (i.e., pointing a a node of the correct id), and for that reason you 
never need to reset it. As you have noticed this gives a huge performance gain 
when clearing/initializing the mapping.

I don't think that initializing in the constructor is necessary at all. I know 
that valgrind is a bit picky on this, but is should be guaranteed that no 
undefined behavior arises from this jump.

Instead, the non-deterministic contraction behavior comes from the fact, that 
edges in the graph might have a different order with each execution. The graph 
class may insert them in the end, or the beginning of an edge block.

This can be solved by either handling equally long paths differently in witness 
search, or by sorting edge buckets after each round of contraction. I opted not 
to do this since it slows down contraction a bit (5-10%). However I can see 
that reproducibility might be worth this performance loss.

Original comment by veaac.fd...@gmail.com on 3 Jun 2013 at 2:34

GoogleCodeExporter commented 8 years ago
Hi,

> I don't think that initializing in the constructor is necessary at all.
> I know that valgrind is a bit picky on this, but is should be guaranteed
> that no undefined behavior arises from this jump.

IIRC I added debug output to WasInserted() to print the node id + result of the 
index lookup. If insertedNodes[index].node contains the same number as the 
input node id by accident, it will produce the "wrong" result. This easily 
happens for the node id zero.

Probably this is a non-issue as node number zero is the root node in most 
cases, still we are accessing uninitialized memory -> the memset won't hurt and 
also won't cost noticeable performance :)

Original comment by t...@simonv.com on 3 Jun 2013 at 2:49

GoogleCodeExporter commented 8 years ago
bool WasInserted( NodeID node ) {
    const Key index = nodeIndex[node];
    if ( index >= ( Key ) insertedNodes.size() )
        return false;
    return insertedNodes[index].node == node;
}

insertedNodes should always be initialized, since only Insert() will add one. 
nodeIndex[node] might not be initialized. If it is not pointing to valid data 
(<size) we know it is invalid and not in the map. If it is actually pointing to 
valid data, we check whether the valid data is the node we expect there. This 
can only happen if it was inserted already. If there is a different node we 
know that we were reading random data and the node is not in the map.

That means that the heap (hopefully *g*) should never be mistaken about whether 
a node is in there or not, and never expose initialized data to the outside, 
handling it properly.

Original comment by veaac.fd...@gmail.com on 3 Jun 2013 at 2:58