snazzy-d / sdc

The Snazzy D Compiler
MIT License
250 stars 55 forks source link

Reduce reallocs of the worklist #396

Closed schveiguy closed 1 month ago

schveiguy commented 1 month ago

For a large range (like 100MB), instead of allocating an array of 1500+ range elements, store one range with the whole thing, and split off pieces of it for each thread that wants a piece.

deadalnix commented 1 month ago

Does that make any difference? It's more complex and it seems to me that it puts more work under the lock, which is where the contention is to begin with.

schveiguy commented 1 month ago

Does that make any difference?

It prevents large allocations under the lock for large scannable segments

it seems to me that it puts more work under the lock

The previous code locked every time it was pushing segments, so there is less work done on the first part, but more done on the pop.

For a 100MB range, for instance, this would take the lock 96 times, along with a check for expansion and running a loop over 16 elements, vs. 1 lock + one check.

I can reduce the work though, by storing the calculation of how many "work items" are in the next range.

schveiguy commented 1 month ago

OK, this might be the best I can get this. If not merged, at least it gives some ideas of how to efficiently split up the work.

schveiguy commented 1 month ago

No longer relevant.