yumetodo / SigContrastFastAviUtl

Sigmoidal/Logit contrast Aviutl plugin. IM is not used.
MIT License

change to use std::thread #9

Closed yumetodo closed 7 years ago

yumetodo commented 8 years ago

use std::thread.

benchmark

960*720
SDeContrast
Midtone : 50
Strength: 6
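max: maximum
min: minimum
sum: total
avg: mean
stdev: population standard deviation
se: standard error
95%CL: 95% confidence interval
C.I.max: mean plus the 95% confidence interval
C.I.min: mean minus the 95% confidence interval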
          before[ns]          after[ns]
max       70144353            90190587
min       25787378            22779376
avg.      38799114.8973277    36900188.2022989
count     2133                2175
stdev     4120405.38657474    8409477.70072626
se        89216.3637673064    180318.129087751
95%CL     174860.859815545    353417.038771636
C.I.max   38973975.7571433    37253605.2410705
C.I.min   38624254.0375122    36546771.1635272
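For readers who want to reproduce the rows of this table from raw timings, here is a minimal sketch. The posted figures are consistent with se = stdev / sqrt(count) and 95%CL = 1.96 × se (normal approximation), so that is what the sketch assumes. The names `BenchStats` and `summarize` are hypothetical and do not come from the pull request.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Summary statistics corresponding to the rows of the benchmark table.
struct BenchStats {
    double avg;     // arithmetic mean
    double stdev;   // population standard deviation (divides by n)
    double se;      // standard error = stdev / sqrt(n)
    double ci95;    // half-width of the 95% confidence interval
    double ci_max;  // avg + ci95
    double ci_min;  // avg - ci95
};

// Hypothetical helper: summarizes per-frame timings given in nanoseconds.
BenchStats summarize(const std::vector<double>& samples_ns) {
    assert(!samples_ns.empty());
    const double n = static_cast<double>(samples_ns.size());

    double sum = 0.0;
    for (double x : samples_ns) sum += x;
    const double avg = sum / n;

    double squared_dev = 0.0;
    for (double x : samples_ns) squared_dev += (x - avg) * (x - avg);
    const double stdev = std::sqrt(squared_dev / n);

    const double se = stdev / std::sqrt(n);
    const double ci95 = 1.959964 * se;  // two-sided 95% normal quantile (assumed)
    return {avg, stdev, se, ci95, avg + ci95, avg - ci95};
}
```

If the two intervals [C.I.min, C.I.max] do not overlap, the difference in means is unlikely to be noise, but a much larger stdev in one column still indicates less predictable frame times.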
MaverickTse commented 8 years ago

Sorry, nope. The change is too big for such a marginal improvement. If there is a 10 ms difference at 720p, then I'll consider it.

MaverickTse commented 8 years ago

Also you need to turn off the /Qpar option in the project file, otherwise the results cannot be compared.

MaverickTse commented 8 years ago

show me the new benchmark plz

yumetodo commented 8 years ago

@MaverickTse please wait. I'm working on #10 to investigate this.

MaverickTse commented 8 years ago

I'd advise against working on this any further. I have checked on my computer that /Qpar works correctly (using 12 out of my 13 CPU cores), and I now use a simple float[4] array on the stack, which poses no problem for row-wise parallelization. Using std::thread really makes the code a lot more complex, and I am not confident I can maintain it.
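To make the trade-off concrete, this is roughly what manual row-wise parallelization with std::thread involves. It is a minimal sketch, not the code from this pull request: `for_each_row_parallel` and its parameters are hypothetical, and it assumes each row can be processed independently (for example with a small per-row scratch buffer on the stack).

```cpp
#include <algorithm>
#include <cassert>
#include <thread>
#include <vector>

// Hypothetical helper: calls row_fn(y) for every row y in [0, height),
// splitting the rows into contiguous bands with one std::thread per band.
// row_fn must be safe to call concurrently for different rows.
template <class RowFn>
void for_each_row_parallel(int height, unsigned thread_count, RowFn row_fn) {
    assert(height >= 0);
    if (thread_count == 0) thread_count = 1;
    const int threads = static_cast<int>(thread_count);
    const int band = (height + threads - 1) / threads;  // rows per thread, rounded up

    std::vector<std::thread> workers;
    for (int begin = 0; begin < height; begin += band) {
        const int end = std::min(begin + band, height);
        workers.emplace_back([begin, end, &row_fn] {
            for (int y = begin; y < end; ++y) row_fn(y);
        });
    }
    // Every thread must be joined before the frame is handed back.
    for (auto& t : workers) t.join();
}
```

A caller would typically pass `std::thread::hardware_concurrency()` as `thread_count`. Creating and joining threads on every frame has a cost of its own, which may be one reason an auto-parallelizer such as /Qpar ends up competitive.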

yumetodo commented 8 years ago
          before[ns]          after[ns]
max       58851592            122142375
min       15095044            17856938
avg.      40697668.4706525    39822508.8662606
count     3663                3776
stdev     5228744.19346652    8064316.19743504
se        86393.0757542369    131235.567943636
95%CL     169327.316991945    257216.986660185
C.I.max   40866995.7876444    40079725.8529208
C.I.min   40528341.1536605    39565291.8796004

very small speed-up...

MaverickTse commented 8 years ago

The averages differ by only about 1 ms while the stdev is a LOT wider... so nope. I have also tried the Concurrency library available in VS2015 (ppl.h), also to no avail. /Qpar is still the most stable and the fastest.

If you want to keep playing with the multi-threading issue, try to parallelize the LUT lookup. So far I have failed to do this, and the processing time increases with the number of channels being processed.
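As a point of reference for the LUT lookup mentioned here, the following is a single-threaded sketch of a table-driven sigmoidal contrast pass. Several things are assumptions rather than facts about this plugin: 8-bit samples (`kLevels`), the sigmoidal curve in the form commonly associated with ImageMagick, a midtone given in the range 0 to 1, and the names `Lut`, `make_sigmoid_lut` and `apply_lut`.

```cpp
#include <array>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <vector>

constexpr std::size_t kLevels = 256;  // assumption: 8-bit samples
using Lut = std::array<std::uint8_t, kLevels>;

// Builds the table once per parameter change.
// midtone is in [0, 1]; strength must be positive.
Lut make_sigmoid_lut(double midtone, double strength) {
    assert(strength > 0.0);
    auto sigmoid = [=](double u) {
        return 1.0 / (1.0 + std::exp(strength * (midtone - u)));
    };
    const double lo = sigmoid(0.0);
    const double hi = sigmoid(1.0);

    Lut lut{};
    for (std::size_t i = 0; i < kLevels; ++i) {
        const double u = static_cast<double>(i) / (kLevels - 1);
        const double v = (sigmoid(u) - lo) / (hi - lo);  // rescaled to [0, 1]
        lut[i] = static_cast<std::uint8_t>(std::lround(v * (kLevels - 1)));
    }
    return lut;
}

// Per-sample lookup: each sample is independent of every other sample,
// so the buffer could be split across threads or channels.
void apply_lut(const Lut& lut, std::vector<std::uint8_t>& samples) {
    for (auto& v : samples) v = lut[v];
}
```

Because the table is read-only during the lookup, sharing one `Lut` between threads needs no locking; whether splitting the work actually pays off depends on memory bandwidth as much as on core count.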

MaverickTse commented 8 years ago

Note that I'm not going to merge unless there is enough improvement in speed.

yumetodo commented 7 years ago

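I finally managed to get rid of the dynamic memory allocation for the table. Using the test code, I confirmed that the results match the original calculation 100%.

With this I can finally move on to real-world testing in AviUtl.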

yumetodo commented 7 years ago

With AVI footage encoded by Ut Video Codec Suite something is still buggy (image), but the processing time is now down to roughly 1000 to 3000 microseconds, so the effort has paid off.

MaverickTse commented 7 years ago

I'll merge once this bug is fixed.

yumetodo commented 7 years ago

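For now, I made the out-of-range behavior the same as the original as well. I'm now starting the tests in AviUtl.

Addendum: I got the commit message wrong, so I force-pushed. That is why https://github.com/MaverickTse/SigContrastFastAviUtl/pull/9/commits/4b5b23c2d77e45640e2e91fe50586b6ad2853c88 has disappeared.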

yumetodo commented 7 years ago

Looks like it's fixed...? (image)