statnet / ergm

Fit, Simulate and Diagnose Exponential-Family Models for Networks
Other
94 stars 36 forks source link

HashEL doesn't need UnsrtEL backend? #549

Open krivit opened 6 months ago

krivit commented 6 months ago

As far as I can tell, the only place where the edgelist is actually needed is when making a random draw from it. But it should be possible to make random draws from the key array by:

  1. Select a random index from 0 to table capacity.
  2. If that cell is occupied, return the corresponding key.
  3. If unoccupied, either repeat from 1, or probe around the table in some quasi-random way until an occupied cell is found.

This will save quite a bit of memory: you don't need a value array, and you don't need the unsorted edgelist. It will also save the costs of updating the edgelist.

On the other hand, the expected number of draws/probes required is 1/(load factor), where load factor is the proportion of table's cells that is occupied. This means that if the table is too sparsely populated---which can happen if the network starts out dense and then becomes sparse---it can become very inefficient. khash does support shrinking tables, but automatic shrinking is not implemented at this time.

@chad-klumb , what do you think?