oyvindln opened this issue 2 years ago.
`--used_saved_levels` helps notably. It still needs some work for samples with recording cuts (e.g. it causes issues on the crafters sample after a recording cut).
`--no_resample` makes processing of captures with sample rates below 40 MHz notably faster. It seems to cause at least minor differences in hsync location on the crafters sample, though, so it needs checking too.
The concatenation in `LDDecodeCache::read` takes a notable amount of time, so we need to see whether that function can be improved in some way.
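One possible direction, sketched under assumptions: if the read currently gathers every cached block it touches and calls `np.concatenate` on them, it copies those blocks in full before slicing. Copying only the requested samples into a preallocated (optionally reusable) buffer avoids both the extra copying and a fresh allocation. The block layout (a list of equal-sized NumPy arrays), `BLOCK_SIZE` and both function names are hypothetical and not taken from the actual `LDDecodeCache` code.

```python
import numpy as np

BLOCK_SIZE = 32 * 1024  # hypothetical cache block size, in samples


def read_concat(blocks, begin, length):
    """Baseline (assumed): join every block touched by the read, then slice."""
    first = begin // BLOCK_SIZE
    last = (begin + length - 1) // BLOCK_SIZE
    # np.concatenate copies all touched blocks in full, even the parts we drop.
    joined = np.concatenate([blocks[i] for i in range(first, last + 1)])
    offset = begin - first * BLOCK_SIZE
    return joined[offset:offset + length]


def read_into(blocks, begin, length, out=None):
    """Copy only the requested samples of each block into one output array.

    Passing a reusable `out` buffer also avoids a new allocation per read.
    """
    if out is None:
        out = np.empty(length, dtype=blocks[0].dtype)
    pos, cur, end = 0, begin, begin + length
    while cur < end:
        block = cur // BLOCK_SIZE
        start = cur - block * BLOCK_SIZE
        # Take as much as possible from this block without passing the end.
        n = min(BLOCK_SIZE - start, end - cur)
        out[pos:pos + n] = blocks[block][start:start + n]
        pos += n
        cur += n
    return out[:length]
```

Whether this helps in practice depends on how the real cache stores and reuses its blocks, so it would need profiling against the existing code.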
Some relevant points are noted in upstream issue https://github.com/happycube/ld-decode/issues/802. Not all of it applies, since vhs-decode already uses rfft, but cvbs-decode still needs some fixes.
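For reference, the reason `rfft` is the cheaper choice for real-valued RF data: the negative-frequency half of a real signal's spectrum is the complex conjugate mirror of the positive half, so `rfft` only computes the `len(x) // 2 + 1` non-negative bins. This is a generic NumPy sketch, not code from vhs-decode or cvbs-decode; the function name is hypothetical.

```python
import numpy as np


def rfft_roundtrip(x):
    """Transform a real signal with rfft and back again with irfft.

    rfft returns only the non-negative frequency bins (len(x) // 2 + 1 of them).
    """
    spectrum = np.fft.rfft(x)
    # n=len(x) is required so odd-length inputs come back at the right size.
    restored = np.fft.irfft(spectrum, n=len(x))
    return spectrum, restored


rng = np.random.default_rng(0)
x = rng.standard_normal(4097)  # odd length on purpose
spectrum, restored = rfft_roundtrip(x)
full = np.fft.fft(x)  # full complex FFT, for comparison
```

Passing `n` to `irfft` matters when porting code from `fft`/`ifft`: without it, an odd-length input comes back one sample short.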
For the TBC/single-threaded parts, the level detection code in particular seems to take up a lot of processing time, especially the `filtfilt` and `argrelextrema` calls. Could some of the filters be combined, and could we, e.g., only search near the expected vsync area if the previous field was good? The filtering is also currently done both forwards and backwards, with half of each output then combined; is there some way to do that in one go using the filter padding options or similar?

The sharpness EQ and chroma trap functions also slow things down a bit when enabled; I haven't looked much into those yet.
In demodblock there are some filters, like the chroma ones, that might be more efficient to apply in the frequency domain since we already have an FFT in some cases, provided it doesn't cause phase issues and we can make them behave the same way as they do with `filtfilt` (alternatively, look into FIR filters, as they shouldn't be any slower when applied in the frequency domain). We also need to see whether we can use single precision for the chroma signal without causing issues, and whether that gives any speed benefit.
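A minimal sketch of both points, assuming a FIR filter: `filtfilt` with FIR taps has zero phase and a magnitude response of |H(f)|², so multiplying an existing spectrum by |H(f)|² reproduces it. The FFT version is circular rather than padded, so it only matches away from the block edges (real use would need overlap between blocks). `scipy.fft`, unlike older `numpy.fft`, keeps float32 input in single precision, which makes the precision question easy to test. The function name, tap count and cutoff are hypothetical.

```python
import numpy as np
import scipy.fft
from scipy.signal import filtfilt, firwin


def zero_phase_fir_fft(x, taps, dtype=np.float64):
    """Apply FIR `taps` forwards and backwards by multiplying in the frequency domain.

    Equivalent to filtfilt(taps, [1.0], x) except near the edges, because the
    FFT wraps around circularly instead of padding the signal.
    """
    x = np.asarray(x, dtype=dtype)
    n = len(x)
    # Frequency response of the taps, zero-padded to the signal length.
    h = scipy.fft.rfft(np.asarray(taps, dtype=dtype), n)
    # |H|^2: forward pass times backward (conjugate) pass, so zero phase.
    power = h.real ** 2 + h.imag ** 2
    return scipy.fft.irfft(scipy.fft.rfft(x) * power, n)


rng = np.random.default_rng(2)
x = rng.standard_normal(8192)
taps = firwin(63, 0.2)  # hypothetical low-pass FIR
reference = filtfilt(taps, [1.0], x)
y64 = zero_phase_fir_fft(x, taps)
y32 = zero_phase_fir_fft(x, taps, dtype=np.float32)
```

Comparing `y32` against `reference` in the interior gives a quick measure of the single-precision error; whether float32 is actually faster for the real chroma path would need benchmarking.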
Otherwise, there are probably areas that could benefit, to varying degrees, from Cython optimization.