keif888 / SSISMHash

SSIS Multiple Hash makes it possible to generate many hash values from each input row. Supported hashes include MD5 and SHA1.
Microsoft Reciprocal License

Possibility for a CustomProperty to specify numOfThreads #12

Open matsremman opened 6 years ago

matsremman commented 6 years ago

Hi, I would like to be able to specify a number of threads, instead of only having 1, Auto or On (when using your component programmatically). It'd be nice to be able to give Auto-threading a cap lower than the number of processing cores: I might have 4 packages running with 8 hash components simultaneously, so on a 32-thread system I might prefer to say 4 threads max per instance of the component.

Thanks

keif888 commented 6 years ago

It is possible. It involves changes to around 6 functions, the creation of a couple of new ones, and form modifications, followed by regression testing to ensure nothing is broken.

I understand the use case of attempting to control the amount of CPU that the hashing will consume.

FYI: currently Auto will use cores - 1 threads, if there are more than 5 output columns in the component. If there are fewer than 5 output columns, the overhead of multiple threads exceeds the gains it provides.
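To make the rule above concrete, here is a minimal sketch in Python. It is not the component's actual code. The function name `auto_thread_count` and the `max_threads` parameter are hypothetical; `max_threads` only illustrates the cap requested in this issue. The thread does not say what happens with exactly 5 output columns, so the sketch assumes a single thread in that case.

```python
import os


def auto_thread_count(output_columns, cores=None, max_threads=None):
    """Sketch of the Auto rule described in this thread (hypothetical helper).

    output_columns: number of hash output columns in the component.
    cores:          logical core count; defaults to what the OS reports.
    max_threads:    hypothetical cap, illustrating the requested enhancement.
    """
    if cores is None:
        cores = os.cpu_count() or 1

    # "More than 5 output columns" -> cores - 1 threads.
    # Exactly 5 is not specified in the thread; assumed single-threaded here.
    if output_columns > 5:
        threads = max(1, cores - 1)
    else:
        threads = 1

    # Optional cap (assumption: this is how a numOfThreads property might behave).
    if max_threads is not None:
        threads = max(1, min(threads, max_threads))
    return threads
```

Under these assumptions, `auto_thread_count(15, cores=32)` gives 31, `auto_thread_count(1, cores=32)` gives 1, and `auto_thread_count(15, cores=32, max_threads=4)` gives 4.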

matsremman commented 6 years ago

Wait, so it's output columns, not the number of input columns to hash? I've misunderstood this then. If so, a single thread is just fine, as I'm just using 1 column per component anyway.

keif888 commented 6 years ago

If you have a wide input (100-odd columns, for example), and are generating 15 hash output columns (for example) from that, then On and Auto will improve the hashing performance. This is because 15 byte arrays are being created, and then hashed, for each input row. That can be done in parallel to improve performance (a lot, if you have enough cores).
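As a rough illustration of why the number of output columns matters, the following Python sketch builds one byte array per output column and hashes each one, either serially or on a thread pool. It is not the component's implementation. The name `hash_row`, the shape of `output_specs`, and the `\x1f` column separator are all assumptions made for this example. It shows the structure of the work only and makes no performance claim about Python threads.

```python
import hashlib
from concurrent.futures import ThreadPoolExecutor


def hash_row(row, output_specs, pool=None):
    """Compute every hash output column for one input row (illustrative only).

    row:          dict mapping input column name -> value.
    output_specs: dict mapping output column name -> (algorithm, [input columns]).
    pool:         optional ThreadPoolExecutor; if given, outputs are hashed in parallel.
    """
    def one_output(spec):
        algorithm, columns = spec
        # One byte array per output column: the chosen inputs, concatenated.
        # The 0x1f separator is an assumption of this sketch.
        buf = b"".join(str(row[c]).encode("utf-8") + b"\x1f" for c in columns)
        return hashlib.new(algorithm, buf).hexdigest()

    names = list(output_specs)
    specs = [output_specs[n] for n in names]
    if pool is None:
        digests = [one_output(s) for s in specs]    # single-threaded path
    else:
        digests = list(pool.map(one_output, specs))  # one task per output column
    return dict(zip(names, digests))
```

Each row is still processed on its own. With a single output column there is only one task per row, so a pool has nothing to run in parallel.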

If you put all of those columns into a single hash, then Auto will choose a single thread, as it has to put all the columns into a single byte array and then hash that result.

The component has to work row by row, as it's a synchronous, non-blocking component.

It sounds like you don't need the enhancement, and that I might need to improve the documentation on how Auto and On help performance...