As far as I can tell, the only place where the edgelist is actually needed is when making a random draw from it. But it should be possible to make random draws from the key array by:
Select a random index from 0 to table capacity.
If that cell is occupied, return the corresponding key.
If unoccupied, either repeat from 1, or probe around the table in some quasi-random way until an occupied cell is found.
This will save quite a bit of memory: you don't need a value array, and you don't need the unsorted edgelist. It will also save the costs of updating the edgelist.
On the other hand, the expected number of draws/probes required is 1/(load factor), where load factor is the proportion of table's cells that is occupied. This means that if the table is too sparsely populated---which can happen if the network starts out dense and then becomes sparse---it can become very inefficient. khash does support shrinking tables, but automatic shrinking is not implemented at this time.
As far as I can tell, the only place where the edgelist is actually needed is when making a random draw from it. But it should be possible to make random draws from the key array by:
This will save quite a bit of memory: you don't need a value array, and you don't need the unsorted edgelist. It will also save the costs of updating the edgelist.
On the other hand, the expected number of draws/probes required is 1/(load factor), where load factor is the proportion of table's cells that is occupied. This means that if the table is too sparsely populated---which can happen if the network starts out dense and then becomes sparse---it can become very inefficient.
khash
does support shrinking tables, but automatic shrinking is not implemented at this time.@chad-klumb , what do you think?