xiaowan3 / xy-vsfilter

Automatically exported from code.google.com/p/xy-vsfilter

Custom interface for rendering subtitles on an RGBA texture(s) | (XySubFilter for madVR) [Part 1] #40

Closed GoogleCodeExporter closed 9 years ago

GoogleCodeExporter commented 9 years ago
This is a kind of feature request. For the info, you'd better see the link below, 
since I don't feel like pasting the whole conversation here.
http://forum.doom9.org/showthread.php?p=1535165#post1535165

Original issue reported on code.google.com by yakits...@gmail.com on 30 Oct 2011 at 6:08

GoogleCodeExporter commented 9 years ago
YuZhuoHuang wrote:

> In interface ISubRenderOptions:
> I want video file name to search for corresponding
> external subtitles. Should I get it from
> ISubRenderOptions?

DirectShow does not inform the video renderer about the video file name. I 
think most renderers won't know the video file name, so asking them won't be 
much use. I think xy-vsfilter should enumerate the DirectShow filters in the 
graph and ask each for IFileSourceFilter. That's your best bet to get the video 
file name.
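For illustration, a minimal sketch of that enumeration (standard DirectShow calls; error handling trimmed):

// Walk the graph and ask each filter for IFileSourceFilter; the filter
// that answers is usually the source/splitter.
HRESULT GetSourceFileName(IFilterGraph *pGraph, LPOLESTR *ppFileName)
{
  IEnumFilters *pEnum = NULL;
  if (FAILED(pGraph->EnumFilters(&pEnum))) return E_FAIL;
  IBaseFilter *pFilter = NULL;
  HRESULT hr = E_FAIL;
  while (pEnum->Next(1, &pFilter, NULL) == S_OK) {
    IFileSourceFilter *pSource = NULL;
    if (SUCCEEDED(pFilter->QueryInterface(IID_IFileSourceFilter, (void**)&pSource))) {
      hr = pSource->GetCurFile(ppFileName, NULL);  // caller frees with CoTaskMemFree
      pSource->Release();
      pFilter->Release();
      break;
    }
    pFilter->Release();
  }
  pEnum->Release();
  return hr;
}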

> I'm not sure if random access is necessary,
> but I can go with it.

Random access wasn't really my goal. I just wanted to make the interface as 
clean as possible, which as a side effect allows random access. But is it 
really a problem at all? If I ask you to render a subtitle frame, you'll 
probably create an array of RGBA bitmaps and store that array somewhere. So the 
random access would just fetch different indexes from the already existing 
array. So it should be no problem, should it?

> The ID can stay identical if only the *placement*
> changes?

The key is whether the bitmap content changes. If the bitmap content changes, 
the bitmap should get a new ID. If the content does not change then why would 
the width/height of the bitmap change? That makes no sense to me. So if the 
placement width/height changes, a new ID should be used. Only the position of 
the bitmap should be allowed to change while reusing the same ID.

Is there a situation where the width/height of the bitmap can change without 
the content changing? I don't see one right now.

Maybe it would be clearer if we split "RECT placement" into "SIZE position" and 
"SIZE size"?

> And the callee should guarantee not to modify pixels,
> if using
>   LPCVOID *pixels
> instead of
>   LPVOID *pixels
> makes sense?

That's just fine with me.

> I'd prefer a SetMaxBitmapCountPerFrame
> in ISubRenderServices.
> Knowing the setting before RenderFrame calls
> may help me decide what to cache or prebuffer.

I've added a new option called "maxNumBitmaps" and removed the 
"GetCombinedBitmap" method. Hope that's ok with you?

Hendrik wrote:

> > who should initiate the connection

> I vote the subtitle renderer. Its easier for
> the sub renderer to support multiple interfaces
> (ie. madVRs new interface, EVRs old interface,
> or falling back to drawing onto the plain image)
> if it doesn't have to "wait" if a renderer offers
> an interface. Instead i can be sure if there is
> an interface, or not.

I understand. But at which point in time would you go looking for the video 
renderer's interface? The problem is that you can't be sure even in 
CompleteConnect that the video renderer was already added to the graph. Some 
media players might add filter by filter, connecting them right away, and only 
add the next filter to the graph once the others are connected. So if the video 
renderer is not in the graph yet when the subtitle renderer's input pin runs 
through CompleteConnect, how/when will you try to connect to the video 
renderer? Because of this problem I thought it was better to let the video 
renderer do the connection. Or do you think it's not a problem? How would you 
solve the problem?

@YuZhuoHuang: You voted for letting the video renderer establish the 
connection. Is there a specific reason for your vote? Or would it be ok the 
other way round, too?

(I've not changed this yet, needs a bit more discussion.)

> What really bugs me about the interface is the
> way memory management is done for the options.
> Handing out memory pointers with a rule that
> they should stay valid is rather obscure, imho.
> Instead, i would follow the MS interfaces, and
> just let the callee allocate the data using a
> pre-determined function (ie. CoTaskMemAlloc),
> and make the caller responsible for freeing
> with the matching function (ie. CoTaskMemFree).
> At least for the options i would prefer it this way.

That's fine with me. I thought that most of these options would use string 
constants, anyway, so I thought it would be simpler to do it the way I 
suggested. But I've changed the options to use LocalAlloc now. Is that ok with 
you? I can also use CoTaskMemAlloc, if you prefer that.
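For the record, the contract would look roughly like this (a sketch; GetString is illustrative, not necessarily the final method name in SubRenderIntf.h):

// Callee allocates with LocalAlloc, caller frees with LocalFree.
STDMETHODIMP CSubRenderer::GetString(LPCSTR field, LPWSTR *value, int *chars)
{
  LPCWSTR src = L"xy-vsfilter";  // e.g. the value of a "name" option
  int len = (int)wcslen(src);
  *value = (LPWSTR)LocalAlloc(LPTR, (len + 1) * sizeof(WCHAR));
  if (!*value) return E_OUTOFMEMORY;
  wcscpy_s(*value, len + 1, src);
  if (chars) *chars = len;
  return S_OK;
}

// Caller side:
//   LPWSTR name; int chars;
//   if (SUCCEEDED(options->GetString("name", &name, &chars))) {
//     /* use name */
//     LocalFree(name);  // the matching free function
//   }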

> For the ISubRenderFrame, i guess its ok'ish to
> hand out fixed pointers, because its a object
> which actually holds the subtitle data, and
> those functions are just "getters" to expose
> the internal data

Yeah, my main goal here was to avoid having to do an additional mem copy. If we 
used allocation for the pixels, we'd need one extra allocation/free and one 
extra memcopy for each bitmap.

> I would however adjust the comment, and instead
> say something like this:
> // The memory pointed to by the "pixels" variable
> is only valid until the next call of GetBitmap,
> GetCombinedBitmap, or Release

Done, slightly changed. Is it ok that way?

> Otherwise, i guess its ok

Any more things we could change to make it better than just "ok"?  :-)  I'm 
open to any changes or even completely different approaches.

Here's the updated header:

http://madshi.net/SubRenderIntf.h

Original comment by mad...@gmail.com on 27 Nov 2011 at 8:26

GoogleCodeExporter commented 9 years ago
> I understand. But at which point in time would you go looking for the video 
renderer's interface?

When the filter goes through the Stopped -> Paused transition, just before 
playback starts, I suppose. You can be sure the graph is finished then, but no 
data has flowed yet.
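As a sketch, that hook could look like this in a filter built on the DirectShow base classes (FindConsumerInGraph and m_pConsumer are hypothetical names):

// Stopped -> Paused: the graph is complete, but no samples have flowed yet.
STDMETHODIMP CXySubFilter::Pause()
{
  CAutoLock lock(m_pLock);
  if (m_State == State_Stopped && !m_pConsumer)
    FindConsumerInGraph();  // enumerate the graph, QI each filter for the consumer interface
  return CBaseFilter::Pause();
}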

I'll ponder on the overall design a bit later today.

Original comment by h.lepp...@gmail.com on 27 Nov 2011 at 8:31

GoogleCodeExporter commented 9 years ago
> Random access wasn't really my goal. I just wanted to make
> the interface as clean as possible, which as a side effect
> allows random access. But is it really a problem at all? 
> If I ask you to render a subtitle frame, you'll probably 
> create an array of RGBA bitmaps and store that array somewhere.
> So the random access would just fetch different indexes 
> from the already existing array. So it should be no problem,
> should it?

Libass outputs a linked list and I am using a linked list internally in xy-vsfilter 
too. But converting that list to an array is not a big deal. And I like a 
cleaner interface too. Anyway, I can live with either random access or sequential 
access. 

> Is there a situation where the width/height of the bitmap 
> can change without the content? I don't see how right now.

When I asked the question, I was thinking of a long line of text moving into 
the video from outside, or moving out from inside: the bitmap content stays 
unchanged in such cases, but the size of the content to be displayed changes. 
But I just realized that the width/height could be kept unchanged if "RECT 
placement" were allowed to extend beyond the video rect. 

> Maybe it would be clearer if we split "RECT placement" 
> into "SIZE position" and "SIZE size"?

Yes. And "POINT position" and "SIZE size" seems a little bit better. Plus some 
extra comment like:
"POINT position may be negative. The content of the bitmap may go outside the 
video rect; the consumer should do an extra clip"
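Put together, the per-bitmap getter could then look roughly like this (a sketch, not the final header), with the consumer clipping against the video rect:

// "id" stays stable while only "position" moves; a "size" change implies new content.
STDMETHOD(GetBitmap)(int index, ULONGLONG *id, POINT *position, SIZE *size,
                     LPCVOID *pixels, int *pitch);

// Consumer side: position may be negative or extend past the video rect,
// so blit only the intersection.
POINT position; SIZE size;  // filled in by GetBitmap
RECT dst = { position.x, position.y, position.x + size.cx, position.y + size.cy };
RECT clipped;
IntersectRect(&clipped, &dst, &videoRect);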

> I've added a new option called "maxNumBitmaps" and removed
> the "GetCombinedBitmap" method. Hope that's ok with you?

Ok.

> @YuZhuoHuang: You voted for letting the video renderer 
> establish the connection. Is there a specific reason for 
> your vote? Or would it be ok the other way round, too?

Because I don't know when to connect to the video renderer. I'm not a DShow 
expert like Nev, and it seems a bit complicated to me. Also, "falling back to 
drawing onto the plain image" is impossible for a subtitle renderer without 
video input/output pins, isn't it?

Indeed I still have a question about:
    "Instead madVR could just load and add xy-vsfilter to the graph manually".
I've always believed that for a filter to be added to the graph, there must be 
a pin in the graph that the filter can connect to. Now given that the subtitle 
renderer has subtitle input pins only, and if there's no subtitle output pin on 
the splitter, how can it be *added* to the graph after the video renderer loads 
it? Or was I wrong: can there be a filter in the graph that is not connected to 
any other filters?

Original comment by YuZhuoHu...@gmail.com on 27 Nov 2011 at 1:03

GoogleCodeExporter commented 9 years ago
> So if the subtitle renderer works at a framerate (very) different
> from actual playback, animated effects, e.g. moving/rotation/fading
> in/fading out, won't be smooth. 

I firmly believe the issue above shouldn't be ignored, and that VFR with 
pre-buffering should be explicitly supported within the new interface, even if 
it were made an optional feature: something along the lines of adding a way for 
VFR-with-pre-buffering support to be negotiated via SubRenderIntf, with related 
variables to describe the support where available.

Since madVR and even EVR-CP have a decoder pre-buffer, how difficult would it be 
to instantly report any framerate changes for X future frames, where X is the 
renderer's decoder pre-buffer size? Would something like this make more sense to 
be handled by the splitter instead? If, say, madVR or a splitter buffered/parsed 
the next 24 decoded/source video frames, it would give the subtitle filter a 
guarantee of the video framerate within the next 24 frames. If a change in 
frame rate occurs (VFR), the timestamp at which the change happens, along with 
the new frame rate, would be reported.

Actually, thinking about this another way, how about just having the subtitle 
filter render subtitles at an integer fraction of the monitor refresh rate 
which is greater than or equal to the video frame rate? Some care would need to 
be taken to ensure frame-accurate timing (scene-timing) isn't broken, but that 
would be one way to resolve any VFR-with-pre-buffering concerns. In other 
words, the subtitle filter would potentially render at a frame rate higher than 
the video frame rate. The renderer would instead show subtitles based on VSYNC 
rather than the video framerate, in order to offer optimal smoothness, even in 
the face of VFR. Make sense?

Original comment by cyber.sp...@gmail.com on 27 Nov 2011 at 2:18

GoogleCodeExporter commented 9 years ago
Since madVR is already buffering so many frames, the question is whether the 
sub renderer really needs to pre-buffer at all, provided the communication with 
the subtitle renderer is designed properly.

If madVR requests the subtitle as soon as it gets the frame from the decoder, 
but then gives the sub renderer time until the last step when it needs to blit 
them, the subtitle renderer has quite a lot of time to render the subtitles, 
can internally run everything multi-threaded, and can use the time madVR gives 
us due to its queues.

Now, I don't know how EVR-CP handles queueing; I assume it doesn't have quite 
such extensive queues - but that also means you wouldn't be able to trust its 
fps information for pre-buffering either.

The option to base it on the screen refresh is pretty good; however, it would 
require quite a bit more CPU to render them at e.g. 60 fps instead of 24/30.
Luckily, this feature does not require interface support other than the 
renderer telling us the refresh rate - which it already does.

If i were to implement this now, i would probably go with frame-rate based 
pre-buffering by default, with an option/fallback to screen refresh based 
pre-buffering.

Original comment by h.lepp...@gmail.com on 28 Nov 2011 at 9:16

GoogleCodeExporter commented 9 years ago
Hendrik wrote:

> When the Filter goes through the Stopped -> Paused
> transition, just before playback starts, i suppose.

Yeah, I guess that should work.

> I'll ponder on the overall design a bit later today.

So?  :-)

YuZhuoHuang wrote:

> But I just realized that width/height could
> be kept unchanged, if "RECT placement" were
> allowed to extend beyond the video rect. 
> And "POINT position" and "SIZE size" seems
> a little bit better. Plus some
> extra comment like:
> "POINT position may be negative. The content
> of the bitmap may go outside the video rect;
> the consumer should do an extra clip"

Yeah, makes sense, I've applied the suggested changes.

> Because I don't know when to connect to the
> video renderer. I'm not a DShow expert like
> Nev, and it seems a bit complicated to me.
> Also, "falling back to drawing onto the plain
> image" is impossible for a subtitle renderer
> without video input/output pins, isn't it?

I'm not sure about that, either. Of course it would be possible to dynamically 
add an output pin if the subtitle renderer finds no renderer supporting the 
new interface. But at that point in time all the connections are already made, 
the subtitle renderer would also need a new input pin for the raw video 
stream, and the raw video would need to run through this new input pin. So 
all connections would have to be remade, etc. I don't really think this would 
be a good solution.

I think it would make more sense to offer 2 different versions of the subtitle 
renderer. One which is based on the new interface (maybe also supporting the 
old "ISubRender" interface used by MPC-HC and PotPlayer's internal subtitle 
renderer). And another one which requires the raw video stream to run through 
the subtitle renderer's input/output. I'm not sure if it is easily possible to 
combine both into just one filter.

@Hendrik, your opinion on that?

> I've always believed that for a filter to be
> added to the graph, there must be a pin in
> the graph that the filter can connect to.
> Now given that the subtitle renderer has subtitle
> input pins only, and if there's no subtitle
> output pin on the splitter, how can it be
> *added* to the graph after the video renderer
> loads it? Or was I wrong: can there be a filter
> in the graph that is not connected to any
> other filters?

You can try it with GraphEdit. It's no problem to add a number of filters to 
the graph which aren't connected at all. The one thing I don't know is whether 
all those "lonely" filters get graph events like Start/Stop/Pause etc, 
which would be necessary for our planned approach to work, but I rather think 
it should work. If all else fails, we can add a dummy subtitle renderer output 
pin and a dummy video renderer input pin with a private FOURCC, which would 
have no other purpose than to give the subtitle renderer something to connect 
to. But I don't think it will be necessary.
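The same test is trivial to do in code; a graph happily accepts a filter without any pin connections (sketch):

// The filter graph manager distributes Pause/Run/Stop state changes to
// every filter it contains, connected or not.
HRESULT AddLonelyFilter(IGraphBuilder *pGraph, IBaseFilter *pSubRenderer)
{
  return pGraph->AddFilter(pSubRenderer, L"XySubFilter");
}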

cyberbeing wrote:

> I firmly believe the issue above shouldn't be
> ignored, and that VFR with pre-buffering should
> be explicitly supported within the new interface,
> even if it were made an optional feature: something
> along the lines of adding a way for VFR-with-
> pre-buffering support to be negotiated via
> SubRenderIntf, with related variables to describe
> the support where available.

Fully agree with Hendrik here.

You're expecting the video renderer to provide the subtitle renderer with 
information the video renderer itself does not have. The video renderer can 
only guess future frame start/stop times. And it can't guess any better than 
the subtitle renderer could. The only way to stop the guessing is to use a 
really big decoder queue and to render the subtitles for the whole decoder 
queue in advance. This would practically replace explicit pre-buffering in the 
subtitle renderer. And this would automatically happen with madVR, if you set 
the CPU/decoder queue to high values. But this probably wouldn't work for other 
renderers because they (to my best knowledge) don't have a big decoder queue. 
You can't expect these other renderers to help with VFR content, because they 
don't know any better than the subtitle renderer knows on its own.

You could require the splitter to somehow parse many frames in advance and 
pre-report the future frame start/stop times to the subtitle renderer, but this 
would require another private interface between splitter and subtitle renderer, 
and I'm not sure if that's really worth it, especially because we would 
probably have a hard time convincing splitter developers to add support for 
that, just for improving subtitle pre-buffering support for some renderers. 
Maybe Hendrik would be willing to do this for the LAV Splitter, don't know. 
Don't know how much work it would be, either. And again, I don't know if it 
would be worth it. Hendrik, your opinion on that?

> Actually, thinking about this another way, how
> about just having the subtitle filter render
> subtitles at an integer fraction of the monitor
> refresh rate which is greater than or equal to
> the video frame rate?

Possible, but you do know that some people are running at 120Hz? The CPU power 
required for rendering fading/moving/rotating subtitles at 120Hz instead of 
just 24Hz would be quite big, I would expect.

> The renderer would instead show subtitles
> based on VSYNC rather than video framerate,
> in order to offer optimal smoothness, even
> in the face of VFR. Make sense?

Yes, but this would also require the video renderer to render at display 
refresh rate, which costs more CPU and GPU power than rendering at only the 
movie frame rate.

Hendrik wrote:

> If madVR requests the subtitle as soon as
> it gets the frame from the decoder

That will be the case.

> If i were to implement this now, i would
> probably go with frame-rate based pre-buffering
> by default, with an option/fallback to screen
> refresh based pre-buffering.

Another option would be for the splitter to analyze the past start/stop times 
and make an educated guess about future start/stop times. This should work for 
99% of the content out there. For VFR content it would fail every time the 
frame rate changes. If it fails, the pre-buffered subtitle frames could be 
discarded and re-rendered to avoid any visual problems. The cost would be 
higher CPU consumption (and a lack of valid pre-buffering) every time the frame 
rate changes unexpectedly in VFR content.
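A sketch of such a guess (illustrative only): average the recent frame durations and treat a prediction miss as the signal to discard and re-render the pre-buffer:

#include <deque>

// Durations are in 100ns units (REFERENCE_TIME).
class FrameRateGuesser {
  std::deque<REFERENCE_TIME> m_durations;  // last few frame durations
public:
  void AddFrame(REFERENCE_TIME start, REFERENCE_TIME stop) {
    m_durations.push_back(stop - start);
    if (m_durations.size() > 8) m_durations.pop_front();
  }
  REFERENCE_TIME PredictNextDuration() const {
    if (m_durations.empty()) return 10000000 / 24;  // assume ~24fps until known
    REFERENCE_TIME sum = 0;
    for (REFERENCE_TIME d : m_durations) sum += d;
    return sum / (REFERENCE_TIME)m_durations.size();
  }
};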

Original comment by mad...@gmail.com on 28 Nov 2011 at 9:58

GoogleCodeExporter commented 9 years ago
> The option to base it on the screen refresh is pretty good,
> however it would require quite a bit more CPU to render them 
> at eg. 60 fps instead of 24/30.

You wouldn't necessarily need to render them at the refresh rate, just at an 
even fraction of it. So if you had a 60Hz monitor, subtitles could be 
rendered at 30fps if 60fps was too slow, since the video framerate (i.e. 
24/30/60 fps VFR) shouldn't matter. The only tricky part would be making sure 
the start/end times are strictly obeyed, and that the first/last subtitle frame 
of a line actually falls on the correct frame of the video. 

There is no particular reason the subtitles need to be rendered at VFR with 
this video-renderer method, is there? Even if the subtitles are being presented 
at a different rate than the video, they should still appear smooth as long as 
that rate divides evenly into the refresh rate. After all, the video is also 
being synced to the refresh rate by the renderer.

I'm still trying to wrap my head around the feasibility of this idea. madshi, 
do you have any comments? Something like this would require a bit of work on 
the renderer side to keep everything in sync while presenting the subtitles and 
video at different frame rates, particularly subtitle start/end times as 
mentioned above. Would that be too much of a headache to implement on your end 
madshi?

Original comment by cyber.sp...@gmail.com on 28 Nov 2011 at 10:09

GoogleCodeExporter commented 9 years ago
Just noticed madshi replied while I was posting, but my questions still stand. 
Why would you need to render subtitles at 60Hz, 120Hz, or some other really 
high frame rate, instead of just an even fraction of the refresh rate?

Original comment by cyber.sp...@gmail.com on 28 Nov 2011 at 10:14

GoogleCodeExporter commented 9 years ago
I'm reluctant to have madVR render at a different rate than the movie frame 
rate. That would be quite a big change in the presentation logic and what 
benefit would it have? None at all for madVR users, because you can already get 
pre-buffered subtitles with perfect start/stop times by setting the decoder 
queue to a high value in madVR. So why should I invest many hours to implement 
such a feature when it has no real benefit for madVR users? Makes no sense to 
me.

Ok, other renderers are a different topic, but I can't really speak for what 
would be best for VMR/EVR. The current VMR/EVR logic is most probably to render 
& present at movie frame rate, too. I kinda doubt you'll find a developer who 
is willing to change the whole VMR/EVR logic with the single benefit of 
improving subtitle pre-buffering for VFR content. That's a very specific and 
limited benefit, and would require a lot of work on the VMR/EVR render & 
presentation logic, I think. And such a change would also introduce a 
relatively high risk of new bugs and problems.

IMHO the pre-buffering logic of the subtitle renderer shouldn't require video 
renderers to totally change the way they render and present frames.

Original comment by mad...@gmail.com on 28 Nov 2011 at 11:05

GoogleCodeExporter commented 9 years ago
I understood the higher-rate rendering to apply only to the subtitles, without 
changing the video renderer. But I thought about the whole VFR issue some more, 
and I'm afraid it won't really work either, unless you know which frame rates 
are involved in a VFR file, or the video renderer changes its rendering logic 
(like madshi explained).

If you know which frame rates are involved, you need to find the least 
common multiple of all those frame rates, and only then would you be able to 
render at a rate that always works (on 24/30 mixed material, that's 120 fps) - 
otherwise, if you always render at 30 fps, you still end up with the problem of 
matching video frames to subtitle frames; they just won't "fit". Obviously 
rendering at 120 fps is not a working solution either, because it's just too 
much, and you would discard most of the frames.

I'm still stuck, and I think the only good solution is for the video renderer 
to buffer enough data ahead, so you get told early enough which frames are 
needed - up to a second with good queueing.

We could possibly think about the source also doing a floating framerate 
estimation and trying to communicate that to the sub renderer, but that's 
really some complex interaction.

Original comment by h.lepp...@gmail.com on 28 Nov 2011 at 1:05

GoogleCodeExporter commented 9 years ago
Even if you ignore pre-buffering, I do think there would be some benefits to 
syncing the subtitle framerate to the display refresh rate. From a quality 
standpoint you would have smoother subtitle motion. You would also 
theoretically have the ability to lower the rendering rate to any arbitrary 
frame rate to save performance. Adaptive VSYNC logic similar to 3D games could 
also potentially be added, so instead of dropping frames when slowdowns occur, 
the subtitle renderer could temporarily cut the frame rate in half.

> (on 24/30 mixed material, that's 120 fps) - otherwise 
> if you always render at 30 fps, you still end up with 
> the problem of matching video frames to subtitle frames; 
> they just won't "fit".

Only the start/end times would need to fit. It matters little whether subtitle 
movement between two points matches actual video frames. It would definitely 
require some complex new video renderer logic, and judging from what madshi 
said, it sounds like it may not be worth the effort to implement. If 
everybody involved thinks it's useless and/or impractical to ever implement in 
the future, I'll just scrap the idea.

> because you can already get pre-buffered subtitles 
> with perfect start/stop times by setting the decoder queue
> to a high value in madVR

Is that a typo? Why would subtitle input using this new interface end up in the 
madVR decoder queue? As of right now, it seems that neither madVR nor the 
subtitle renderer can implement any sort of pre-buffering, or we would end up 
with stuttering subtitle motion with VFR content. Should we just require the 
video renderer to use motion-compensated interpolation of subtitle data to 
create new fake frames when the frame rate changes? That may help smooth out 
motion, but I still see it as a sub-optimal solution. I'm half-joking, since I 
expect that may be even more of a pain in the ass for a video renderer to 
implement than the other idea.

This also reminds me of another madVR specific problem. If gamut/gamma 
correction via shaders or 3DLUT is being done in madVR, subtitles also need to 
have gamut/gamma correction performed. Did you ever figure out a solution to 
that problem? Is there anything which could be done with the new interface to 
make things easier?

Original comment by cyber.sp...@gmail.com on 28 Nov 2011 at 2:03

GoogleCodeExporter commented 9 years ago
> Only the start/end times would need to fit. It matters little whether
> subtitle movement between two points matches actual video frames.

Didn't you say earlier you wanted smooth movement? If you don't, then why are 
we discussing this?
If it doesn't match the video frames, then movement will not be smooth. You 
can't render subtitles at 30 fps, match them onto a video running at 24 fps, 
and expect 100% smooth movement. 

> Is that a typo? Why would subtitle input using this new interface end up
> in the madVR decoder queue?
> As of right now, it seems that neither madVR nor the subtitle renderer can
> implement any sort of pre-buffering, or we would end up with stuttering
> subtitle motion with VFR content.

That's not what he meant. The point is, with madVR (or any renderer with big 
queues) you don't need pre-buffering, because madVR internally buffers so many 
video frames. That means it will ask the subtitle renderer to render the 
subtitles quite a bit in advance, giving it enough time to do so before they 
actually need to be displayed. If this process is multi-threaded, it could as 
well work on different frames in parallel, if the subtitle request logic allows 
that (and the subtitle renderer is thread-safe).

To be honest, VFR is a crappy design to begin with. If you mix 24/30 content, 
it will never be fully smooth, because your screen will either run at (a 
multiple of) 24 or 30 (unless you have a screen that runs at 120 Hz, but 
honestly, who does). If you run at 30/60, the 24fps parts will need 3:2 pull-up; 
if you run the screen at 24, some frames need to be dropped - all in all, it 
will never be perfectly smooth.

What I've seen are movies that run at 24p and, in very low-motion parts, drop to 
12fps to save bandwidth. As long as the VFR differences are integer multiples, 
the problem is much simpler, because you just render at the higher of the 
rates and everything falls into place.

Trying to match subtitles 100% perfectly onto something that will never be 
perfectly smooth on the screen seems like a lot of work for very little gain (if 
any).

Original comment by h.lepp...@gmail.com on 28 Nov 2011 at 2:18

GoogleCodeExporter commented 9 years ago
@cyberbeing, I've been trying to explain to you that the madVR queue system 
will automatically do subtitle pre-buffering, including perfect support of VFR 
content. There's not even anything special I have to do, or the subtitle 
renderer has to do, to make it work. It will just work by itself. If you set 
the madVR decoder queue to 32 frames, subtitles will also automatically be 
pre-buffered for 32 frames (provided the CPU is fast enough to fill the queues).

The only reason why we still keep discussing this is because of other video 
renderers. Or maybe 32 frames pre-buffering is not large enough? I don't know...

Original comment by mad...@gmail.com on 28 Nov 2011 at 2:31

GoogleCodeExporter commented 9 years ago
With 32 frames, on a 24p movie, you have more than a second for every subtitle 
frame to render. Obviously in that time you need to render the subtitles for 
more than just one frame (for 32 frames, in fact) - so while it's not really 
"pre-buffering", it does allow for proper multi-threaded rendering (if your 
rendering code allows that; I should check that in libass), and compensates for 
performance spikes.

The important aspect of the pre-buffering is that the rendering is already 
finished when the video renderer wants to blit the image, otherwise you might 
delay the presentation of the frame, causing major glitches. With madVR, this 
would never be a problem, unless your PC is seriously too slow to render the 
subs (but nothing can save it then).

Original comment by h.lepp...@gmail.com on 28 Nov 2011 at 2:41

GoogleCodeExporter commented 9 years ago
One problem, though: madVR will likely call the subtitle renderer in the 
context of only one thread. If the rendering of one subtitle frame cannot be 
multithreaded in itself, this design might not allow the subtitle renderer to 
run multi-threaded. Two possible solutions:

(1) We could allow the subtitle renderer to deliver requested subtitle frames 
asynchronously in whatever thread context it likes.

(2) We could add an option to allow the consumer to query whether the subtitle 
renderer supports multi-threaded rendering. And if it does, madVR could create 
one thread per CPU to ask for multiple subtitle rendered frames in parallel.

I don't know if and in which form vsfilter / libass support multi-threaded 
rendering, though.

Maybe (1) would be a better solution because it would allow the subtitle 
renderer to decide for itself which threading approach to use (if any)?

Original comment by mad...@gmail.com on 28 Nov 2011 at 2:51

GoogleCodeExporter commented 9 years ago
While (1) is certainly more flexible, it adds quite a bit of complexity on both 
sides that's basically mandatory to support.

I just briefly checked libass, and rendering subtitles for different times 
simultaneously is not supported, because it's a stateful system, and one 
subtitle may depend on the previous one. I would imagine vsfilter is similar.

Not sure multi-threading is required..

Original comment by h.lepp...@gmail.com on 28 Nov 2011 at 2:56

GoogleCodeExporter commented 9 years ago
I disliked (1) while I wrote it. But after thinking about it, I do start to 
like it. The "consumer" could simply call "ISubRenderService.NeedFrame()", 
providing nothing but the start/stop time. Then the subtitle renderer could 
call "ISubRenderConsumer.DeliverFrame()" to pass the rendered subtitle frame to 
the consumer. It would be no effort at all to implement it this way in madVR. 
And if the subtitle renderer itself can't do multithreading, it could simply do 
this internally:

void ISubRenderService::NeedFrame(REFERENCE_TIME start, REFERENCE_TIME stop)
{
  // render synchronously, then hand the finished frame straight back
  InternalSingleThreadedAndBlocking_RenderSubtitleFrame(start, stop, ...);
  ConsumerInterface->DeliverFrame(start, stop, ...);
}

So it wouldn't be any more difficult for the subtitle renderer, either. Plus, 
if there is ever a subtitle renderer which can truly do multithreading, the 
interface would already support it.
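On the consumer side the asynchronous variant stays equally small; a hedged sketch (the callback signature follows the discussion, not a finalized header):

// Fire-and-forget request; NeedFrame may return before rendering is done.
void CVideoRenderer::RequestSubtitles(REFERENCE_TIME start, REFERENCE_TIME stop)
{
  m_pSubRenderer->NeedFrame(start, stop);
}

// Callback, possibly invoked from whatever thread the subtitle renderer likes.
STDMETHODIMP CVideoRenderer::DeliverFrame(REFERENCE_TIME start, REFERENCE_TIME stop,
                                          ISubRenderFrame *frame)
{
  CAutoLock lock(&m_queueLock);
  frame->AddRef();
  m_subtitleQueue[start] = frame;  // Release() after the frame has been blitted
  return S_OK;
}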

That said, I'm not sure if multi-threaded subtitle rendering is really needed, 
either. But then it wouldn't hurt to allow the interface to support it?

@YuZhuoHuang, you're the vsfilter expert here. What's your view on 
multi-threading?

Original comment by mad...@gmail.com on 28 Nov 2011 at 3:12

GoogleCodeExporter commented 9 years ago
> Didn't you say earlier you wanted smooth movement? 
> If you don't, then why are we discussing?
> If it doesn't match the video frames, then movement will 
> not be smooth. You can't render subtitles at 30 fps and 
> then match them onto a video which is running at 24 fps 
> and expect 100% smooth movement. 

The point is you wouldn't be matching subtitles at 30 fps to a video running at 
24 fps on a 60Hz monitor. The two would run completely independently of each 
other. The 30 fps subtitles could be synced to the refresh rate and always be 
smooth. The 24 fps video would have judder with uneven numbers of duplicate 
frames, since it doesn't cleanly sync with 60Hz. The effect would be smooth 
subtitles on top of juddering video. 24 fps @ 60Hz will always judder, so that 
in itself is a non-issue. 

I believe madshi already understood what I was talking about, but the entire 
implementation of such a thing would be rather complex. Basically the renderer 
would be presenting the 24 fps video @ 60Hz, while at the same time mixing in 
30 fps of unique subtitle data with some of the duplicate frames created by 
the 24fps-to-60Hz conversion. Only the first and last frames of a line would 
need to be synced with the correct 24fps frame. As soon as that first frame is 
presented, the 24fps->60Hz unique and duplicate frames would be presented at 
30fps->60Hz until the end time, where it would be synced once again with a 
unique frame (discarding overflow subtitle data). Implementation gets even more 
complex, since adaptive logic would be needed which is capable of dealing 
with arbitrary refresh rates. The bottom line is it would require madshi to 
rewrite a large portion of the presentation logic in madVR, so it's 
understandable that he wouldn't want to invest the time to do so.

> madVR queue system will automatically do subtitle pre-buffering,
> including perfect support of VFR content

Thanks for making that clear. Somehow I missed that madVR pre-buffering would 
automatically request subtitle data at the correct framerate when using this 
new interface and deal with VFR properly.

Original comment by cyber.sp...@gmail.com on 28 Nov 2011 at 3:27

GoogleCodeExporter commented 9 years ago
I think subs rendered to match actual video frames would be better than subs 
synced with the display refresh rate. Many typesetters and karaokers like to 
split some transforming and moving effects into one script line per frame, 
e.g., 24 lines per second when the fps is 24/1.001 and 30 lines per second when 
the fps is 30/1.001. For some effects which imitate the original effects in the 
video clip, it is important to match every frame exactly. Syncing with the 
display refresh rate might be less smooth in those cases.

Original comment by astrat...@gmail.com on 28 Nov 2011 at 10:23

GoogleCodeExporter commented 9 years ago
For prebuffering:
I don't see any feasible solution for VFR. And as long as the current framerate 
is available, prebuffering works perfectly (just as if you weren't prebuffering) 
for most content. So I can live with it.
But the refresh rate makes me a bit curious: how does the video renderer sync a 
24 fps video to a 60Hz display? Or is it all done automatically by hardware?

> The "consumer" could simply call "ISubRenderService.NeedFrame()", 
> providing nothing but the start/stop time. Then the subtitle 
> renderer could call "ISubRenderConsumer.DeliverFrame()" to pass 
> the rendered subtitle frame to the consumer.

>  it wouldn't hurt to allow the interface to support it?

Hmmm, it looks simple enough. Then a ref-count on the bitmaps, or something 
similar to a ReleaseBitmap call, would be necessary, so that the subtitle 
renderer has a way to know when the video renderer is done with the bitmaps. 
Even in the original interface, I would like such a mechanism. I don't see what 
we'd lose. Then why not?

For multi-threading:
I don't see any benefit a subtitle renderer can get from such multi-threading 
in the context of playback. Two adjacent subpics are usually similar, and that 
similar part can be cached to save redundant calculation. Take moving text as 
an example once more: if the subtitle renderer works single-threaded, it 
creates the bitmap the first time and caches it; for every requested frame that 
follows, it only needs to update the position. But if the subtitle renderer 
works multi-threaded, some complicated sync mechanism would be needed to avoid 
unnecessary repeated work, or else it would only hurt performance. 
I think threading the creation of one single subpic would be better, e.g. 
multi-threading the blur effect as gommorah@cccp did in threaded-vsfilter. 
But that has nothing to do with the interface.

> Many typesetters and karaokers like to split some transforming 
> and moving effects into one script line per frame

This would not be a problem, since it can be guaranteed that prebuffering won't 
skip any subtitle line (so no major change of the subtitle during playback 
would be ignored), no matter whether the prebuffer framerate is 24 fps or 10 
fps. 
Here comes the moving text example again (-_-!). Assume you use the \move tag 
in a subtitle line to get moving text from 1 sec to 2 sec, and in the following 
24 lines, by manually changing the position of the text every 42ms, you get 
another moving text from 2 sec to 3 sec. Say the subtitle renderer prebuffers 
at 10 fps. For the first subtitle line, which has a \move tag lasting for 1 
sec, the prebuffer fps will be applied and only 10 subpics will be prebuffered. 
For the following subtitle lines, the prebuffer fps won't be used, in order to 
avoid skipping any subtitle lines. So there will be at least one subpic per 
line: 24 subpics will be prebuffered for 2 sec to 3 sec in this example. It's 
just as if the subtitle renderer were forced to work at 24 fps.

Original comment by YuZhuoHu...@gmail.com on 29 Nov 2011 at 4:16

GoogleCodeExporter commented 9 years ago
> This would not be a problem, since it can be guaranteed that prebuffering
> won't skip any subtitle line (so no major change of the subtitle during
> playback would be ignored), no matter whether the prebuffer framerate is
> 24 fps or 10 fps.
> Here comes the moving text example again (-_-!). Assume you use the \move
> tag in a subtitle line to get moving text from 1 sec to 2 sec, and in the
> following 24 lines, by manually changing the position of the text every
> 42ms, you get another moving text from 2 sec to 3 sec. Say the subtitle
> renderer prebuffers at 10 fps. For the first subtitle line, which has a
> \move tag lasting for 1 sec, the prebuffer fps will be applied and only 10
> subpics will be prebuffered. For the following subtitle lines, the
> prebuffer fps won't be used, in order to avoid skipping any subtitle lines.
> So there will be at least one subpic per line: 24 subpics will be
> prebuffered for 2 sec to 3 sec in this example. It's just as if the
> subtitle renderer were forced to work at 24 fps.

It is not a problem for pre-buffering, but it is an issue for rendering at the 
display refresh rate.

Original comment by astrat...@gmail.com on 29 Nov 2011 at 4:48

GoogleCodeExporter commented 9 years ago
> It is not a problem for pre-buffering, but it is an issue 
> for rendering at the display refresh rate.

It shouldn't be an issue for rendering at the display refresh rate either, 
which is the entire reason I made the point that the renderer would need to 
ignore the rate at which it receives subtitle data and perform a correction, so 
that the subtitle start and end frames always fall on the correct frame of 
the video. Any 24 fps frame-by-frame subtitles would still end up with 24 
unique frames per second, identical to regular VSFilter (it would be the 
renderer's job to perform this correction). The 30 fps, or whatever the 
pre-buffer framerate ended up being, would only apply to subtitle tags which 
produce movement.

I realize this is all rather confusing, but theoretically it should work just 
fine as long as the renderer implemented this complex start/end time logic 
correctly. The entire concept of implementing such a thing is rather 
interesting, but this has all been a thinking-outside-the-box idea. If 
we ever go this route, lots of documentation and some open source code showing 
an example of a correct implementation would be needed. Nonetheless, I am 
quickly realizing the entire idea is somewhat impractical and hackish, even if 
it would work.

Original comment by cyber.sp...@gmail.com on 29 Nov 2011 at 5:57

GoogleCodeExporter commented 9 years ago
Thinking about it a bit more, there may be an easier way that avoids start/end 
timing issues, by having the subtitle renderer pre-compensate for VFR.

Non-movement lines would be rendered at 24fps with the subtitle renderer 
performing a 3:2 pull-up to 30fps.

Movement lines would be rendered at 30fps.

The movement and non-movement lines would then be combined into a single 30fps 
subtitle data stream and passed to the video renderer to present as-is at 30fps.

This would still require the video renderer to present subtitles at a 
potentially faster rate than the video, but it should eliminate the need for 
the video renderer to compensate for subtitle start/end times. The only 
requirement would be that the subtitle renderer perform a pull-up identical to 
the video renderer's.

Original comment by cyber.sp...@gmail.com on 29 Nov 2011 at 6:34

GoogleCodeExporter commented 9 years ago
I still agree with madshi though - asking the video renderer to drastically 
change its presentation logic just to accommodate a subtitle renderer is quite 
a lot to ask (and unlikely to happen) - especially because it adds a 
performance overhead that would also affect people who do not use subtitles.

In addition, you really cannot be sure a movie is VFR until a frame rate change 
happens, so how do you render the subtitles? Always at 30fps, even if the movie 
might be completely 24p? That sounds suboptimal.

You could do it if you had all the information and control over the complete 
playback chain - but you never have all the required information unless you 
scan the whole file (which is very impractical), and if you want this interface 
to actually be adopted, you need to avoid requirements which demand drastic 
changes.

Original comment by h.lepp...@gmail.com on 29 Nov 2011 at 6:46

GoogleCodeExporter commented 9 years ago
... Something I forgot to write:

If a renderer wants to support rendering the subtitles at screen refresh (or 
refresh/2), it could just do that with the current interfaces - all it has to 
do is "lie" to the subtitle renderer and claim the stream's frame rate is 30/60 
fps, and the subtitle renderer could then use that for pre-buffering.

Original comment by h.lepp...@gmail.com on 29 Nov 2011 at 6:49

GoogleCodeExporter commented 9 years ago
> If a renderer wants to support rendering 
> the subtitles at screen refresh (or 
> refresh/2), it could just do that with the 
> current interfaces - all it has to do is 
> "lie" to the subtitle renderer and claim 
> the stream's frame rate is 30/60 fps
Ah! 

Original comment by YuZhuoHu...@gmail.com on 29 Nov 2011 at 6:58

GoogleCodeExporter commented 9 years ago
> In addition, you really cannot be sure a movie is VFR
> until a frame rate change happens, so how do you render
> the subtitles? Always at 30fps, even if the movie might 
> be completely 24p? That sounds suboptimal.

Yes, always at 30fps. Rendered @ 24fps with 3:2 pull-up to 30fps for 
non-movement plus start/end frames, and rendered @ true 30fps for movement. 
There wouldn't be any significant downside to this. Movement rarely lasts more 
than a few seconds, so the overhead over an entire video would likely be less 
than 1%. The benefit is you would have judder-less subtitle motion since it 
would be in sync with your refresh rate, and the subtitle renderer could 
prebuffer as much as it wanted.

> I still agree with madshi though - asking the video renderer to 
> drastically change its presentation logic just to accommodate a 
> subtitle renderer is quite a lot to ask (and unlikely to happen)
> - especially because it adds a performance overhead that would 
> also affect people that do not use subtitles.

I still agree with madshi as well. At this point I'm basically just responding 
to criticisms about feasibility and practicality of the idea. I'm ready to stop 
talking about it whenever everybody else stops commenting on it ;)

Original comment by cyber.sp...@gmail.com on 29 Nov 2011 at 7:08

GoogleCodeExporter commented 9 years ago
YuZhuoHuang wrote:

> Hmmm, it looks simple enough. Then a ref-count
> on the bitmaps, or something similar to a
> ReleaseBitmap call, would be necessary, so that
> the subtitle renderer has a way to know when the
> video renderer is done with the bitmaps. Even in
> the original interface, I would like such a
> mechanism.

ReleaseBitmap should not be necessary, neither in the original nor in the 
asynchronous interface. Basically the subtitle renderer creates an 
ISubRenderFrame instance and AddRef()s it once. The consumer will call 
Release() on the interface when he's done with it. That's the usual practice 
when dealing with IUnknown interfaces. So the freeing will be realized through 
the IUnknown logic: if you get a Release() call and the reference count drops 
to 0, the object should be destroyed, with all bitmaps etc. within it. I think 
one of the MS base classes already handles the interface reference counting for 
you. So if you base your ISubRenderFrame implementation on the MS base classes, 
your destructor will be called automatically when the consumer Release()s it.
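A sketch of that pattern on top of the DirectShow base classes (CUnknown does the reference counting; the destructor runs when the consumer's final Release() drops the count to 0):

class CSubRenderFrame : public CUnknown, public ISubRenderFrame
{
public:
  DECLARE_IUNKNOWN;  // routes AddRef/Release/QueryInterface to CUnknown

  CSubRenderFrame() : CUnknown(NAME("CSubRenderFrame"), NULL) {}
  ~CSubRenderFrame() { /* free the RGBA bitmaps here */ }

  STDMETHODIMP NonDelegatingQueryInterface(REFIID riid, void **ppv)
  {
    if (riid == __uuidof(ISubRenderFrame))  // assumes the interface has a declared IID
      return GetInterface((ISubRenderFrame*)this, ppv);
    return CUnknown::NonDelegatingQueryInterface(riid, ppv);
  }

  // ... ISubRenderFrame getters and bitmap storage ...
};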

> I don't see what we'll lost. Then why not?

@Hendrik? You ok with that, or you still prefer the forced synchronous 
approach? I'm also still waiting for your pondering about the overall 
concept...  #:-D  I've not changed the connection initiator in the interface 
yet, wanted to wait until you've spent some more thoughts on whether you like 
the overall approach, or whether you have a totally different idea.

Original comment by mad...@gmail.com on 29 Nov 2011 at 8:30

GoogleCodeExporter commented 9 years ago
> @Hendrik? You ok with that, or you still prefer the forced synchronous 
approach?

The new approach seems fine; it gives us the flexibility to do multi-threading 
in the future, even if it's not a viable solution right now (without really 
adding any complexity). The documentation should make clear that there needs to 
be exactly one call of DeliverFrame for every NeedFrame, no more and no less - 
and preferably in the same order.

Something else I've been thinking about is if/how to manage multiple subtitle 
consumers in one graph.
I've been considering implementing the subtitle interface in LAV Video, which 
would allow you to have a subtitle renderer in a graph without a compatible 
renderer - without requiring the image to be piped through the subtitle 
renderer. The problem is how to properly handle the case when both LAV Video 
and a compatible renderer are present.

Maybe we could define a "priority" attribute of sorts, with a good renderer 
having a high priority and a video decoder like LAV Video having a lower 
priority, so it would only be used as a fallback?

Granted, it does add a bit of complexity to the whole connection between the 
consumer and producer (you need to scan the whole graph for the highest-priority 
filter, instead of just using the first one you find), but I could see a use case.

Original comment by h.lepp...@gmail.com on 29 Nov 2011 at 9:10

GoogleCodeExporter commented 9 years ago
Yeah, you're right, having this in LAV Video might be beneficial for some 
users. There is a reason I named the interface "consumer" and not 
"videoRenderer": I already anticipated that non-video-renderers might 
eventually be interested in becoming "consumers", too.

I see 2 possible solutions to the problem:

(1) The subtitle filter chooses the filter which is most downstream. That 
should normally be the video renderer. If both video decoder and video renderer 
expose the new interface, the video renderer would automatically be chosen.

(2) As you suggest, we could introduce some kind of priority/merit system, with 
predefined values for video renderers (e.g. 0x1000) and video decoders 
(e.g. 0x500).

In both cases it would make more sense to let the subtitle renderer be the one 
who initiates the connection, so I guess I have to agree with you that that 
is the better approach. It would probably make sense for the subtitle renderer 
to only make one connection at a time, so it won't connect to both the video 
renderer *and* the video decoder. Agreed?
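A sketch of option (2); GetMerit is a hypothetical accessor, and the merit values are the placeholders from above:

// Pick the consumer with the highest merit, e.g. 0x1000 for a video
// renderer and 0x500 for a decoder. Caller Release()s the result.
ISubRenderConsumer* FindBestConsumer(IFilterGraph *pGraph)
{
  ISubRenderConsumer *best = NULL;
  ULONG bestMerit = 0;
  IEnumFilters *pEnum = NULL;
  if (FAILED(pGraph->EnumFilters(&pEnum))) return NULL;
  IBaseFilter *pFilter = NULL;
  while (pEnum->Next(1, &pFilter, NULL) == S_OK) {
    ISubRenderConsumer *pConsumer = NULL;
    if (SUCCEEDED(pFilter->QueryInterface(__uuidof(ISubRenderConsumer),
                                          (void**)&pConsumer))) {
      ULONG merit = pConsumer->GetMerit();  // hypothetical
      if (merit > bestMerit) {
        if (best) best->Release();
        best = pConsumer;
        bestMerit = merit;
      } else {
        pConsumer->Release();
      }
    }
    pFilter->Release();
  }
  pEnum->Release();
  return best;
}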

Hmmmm... Are you planning to output DVD subtitles to xy-vsfilter and then get 
them back via the new interface, so you don't have to implement your own 
subtitle drawing for DVD playback?  :-P  This brings me to a new discussion 
point:

Are multiple xy-vsfilter instances per graph allowed? If so, which one is 
responsible for auto-loading external subtitle files? And which one would make 
a connection to the video renderer? Or do we only allow one xy-vsfilter 
instance per graph? But then it would probably need to support multiple 
subtitle input pins. Would it then render only one of those input pins? Or 
would it render all of them and combine/blend them? Ouch, this is getting ugly. 
As far as I remember, some splitters output multiple subtitle pins instead of 
implementing IAMStreamSelect? How would we solve that with xy-vsfilter?

Original comment by mad...@gmail.com on 29 Nov 2011 at 9:27

GoogleCodeExporter commented 9 years ago
 > Non-movement lines would be rendered at 24fps with the subtitle renderer performing a 3:2 pull-up to 30fps.
 > 
 > Movement lines would be rendered at 30fps.
 > 
 > The movement and non-movement lines would then be combined into a single 30fps subtitle data stream and passed to the video renderer to present as-is at 30fps.

Think about a 24fps clip, with a simple \move effect to trace a moving item in 
the original video clip.
Simply consider rendering 4 frames. The item in the video clip moves from:
(x,y)->(x+4,y)->(x+8,y)->(x+12,y)
And the typesetter intends the moving effect to hit these positions:
(x,y+10)->(x+4,y+10)->(x+8,y+10)->(x+12,y+10)
and uses \move(x,y+10,x+12,y+10)

However, if subtitles are rendered at 30fps, the positions in the 5 rendered 
frames turn out to be:
(x,y+10)->(x+3,y+10)->(x+6,y+10)->(x+9,y+10)->(x+12,y+10)
which will not accurately sync with the moving item as the typesetter 
originally intended. 

Original comment by astrat...@gmail.com on 29 Nov 2011 at 9:32

GoogleCodeExporter commented 9 years ago
PLEASE stop discussion about 24fps vs 30fps. We've already decided we won't do 
this. So there's really no need to discuss it any further. Thanks.

Original comment by mad...@gmail.com on 29 Nov 2011 at 9:34

GoogleCodeExporter commented 9 years ago
> Hmmmm... Are you planning to output DVD subtitles to xy-vsfilter and then
> get them back via the new interface, so you don't have to implement your
> own subtitle drawing for DVD playback?  :-P

I'll most likely implement my own drawing in this case, because DVDs are 
special - the DVD navigator actually provides a palette, as well as commands 
about which areas should be highlighted in a different color (menu 
navigation); I don't think I can translate all those things for vsfilter to 
understand.

IMHO, only one subtitle renderer should be allowed. If the source does not 
offer IAMStreamSelect (or vsfilter loads an external subtitle script), it 
should offer IAMStreamSelect itself (I thought it already did that, though!), 
and just accept multiple subtitle input pins.

Original comment by h.lepp...@gmail.com on 29 Nov 2011 at 9:39

GoogleCodeExporter commented 9 years ago
@YuZhuoHuang, how does xy-vsfilter work if there is a splitter which outputs 
multiple subtitle pins? Does xy-vsfilter accept multiple subtitle input pins 
and then implement IAMStreamSelect to let the user choose? Or is there some 
other logic in place to handle this situation?

Original comment by mad...@gmail.com on 29 Nov 2011 at 9:47

GoogleCodeExporter commented 9 years ago
An important issue: the alpha format.
For the CPU to do alpha blending, the easiest way would be using pre-multiplied 
alpha. That is, the video renderer should use the following formula 
    videoRGB * subAlpha + subRGB
to combine the series of bitmaps onto the video frame (subRGB being 
pre-multiplied and subAlpha being the transmission factor, i.e. 1 minus the 
opacity).
If using source alpha (blending with videoRGB*(1-subAlpha) + 
subRGB*subAlpha) or target alpha (blending with videoRGB*subAlpha + 
subRGB*(1-subAlpha)), then the subtitle renderer has to do a division to 
produce correct bitmaps when it combines some of the bitmaps.
I am not sure if an option is needed to allow different alpha formats, or 
whether we should just force both renderers to use pre-multiplied alpha. But at 
the least, pre-multiplied alpha should be recommended in the comment if an 
option is provided.
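Spelled out per channel, the pre-multiplied blend above is just one multiply-add (a sketch; subAlpha is the transmission factor, 255 = fully transparent, 0 = opaque):

// videoRGB * subAlpha + subRGB, all values 8 bit, subRGB pre-multiplied.
BYTE BlendChannel(BYTE video, BYTE subPremult, BYTE subAlpha)
{
  int v = (video * subAlpha + 127) / 255 + subPremult;
  return (BYTE)(v > 255 ? 255 : v);  // clamp against rounding overflow
}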

Original comment by YuZhuoHu...@gmail.com on 29 Nov 2011 at 9:51

GoogleCodeExporter commented 9 years ago
> how does xy-vsfilter work if there is a splitter which 
> outputs multiple subtitle pins? Does xy-vsfilter accept
> multiple subtitle input pins and then implement 
> IAMStreamSelect to let the user choose? Or is there 
> some other logic in place to handle this situation?

I am not very sure about this part. As far as I know, it accepts multiple 
subtitle pins and implements IAMStreamSelect.

Original comment by YuZhuoHu...@gmail.com on 29 Nov 2011 at 10:12

GoogleCodeExporter commented 9 years ago
> And important issue: alpha format.

Hmmmm... Are all of these different formats still 8 bits per RGB channel and 8 
bits per alpha channel? If so, I guess we could switch between different alpha 
formats by defining a new "option".

I would guess that CPU based consumers (like LAV Video would be) would probably 
prefer pre-multiplied alpha (Hendrik?). However, GPU based consumers like madVR 
will do the work via Direct3D, and there it will probably be easier to get a 
standard compliant RGBA image. Meaning the subtitle RGB values should not be 
pre-multiplied. And the alpha channel should be 0 to indicate a transparent 
pixel and 255 to indicate an opaque pixel.

The question is: Which should be the default format? I would vote for standard 
compliant RGBA, because that's the cleaner approach. And the primary consumers 
for the new interface should be GPU based video renderers, so standard 
compliant RGBA also makes more sense, I think.

Other opinions?

> I am not very sure about this part. As far as I
> know, it accepts multiple subtitle pins and
> implements IAMStreamSelect.

That would be good.

Original comment by mad...@gmail.com on 29 Nov 2011 at 10:34

GoogleCodeExporter commented 9 years ago
> (1) The subtitle filter chooses the filter which
> is most downstream. 

> (2) As you suggest, we could introduce some kind 
> of priority/merit system

(3) Anyone who wants to be a consumer tries to connect to the subtitle filter. 
Then the subtitle filter accepts the first one, or the last one, or the one it 
likes most (disconnecting the previously accepted one), or accepts several if 
it can?

Original comment by YuZhuoHu...@gmail.com on 29 Nov 2011 at 10:37

GoogleCodeExporter commented 9 years ago
I don't think it really matters whether the subtitle renderer pre-multiplies 
the values, or whether the consumer does that itself if needed (unless vsfilter 
outputs them like that for some reason?).

I would vote for standard RGBA as well; converting the RGBA image to 
pre-multiplied is easy enough to do yourself if that offers a serious advantage 
to the consumer.
In LAV Video, it is also quite possible that I would need to blit onto a 
YUV image, which means I would need to convert the RGBA to YUVA - if it's still 
plain RGBA, I can just send it through my existing setup for conversion to YUV.

Original comment by h.lepp...@gmail.com on 29 Nov 2011 at 10:39

GoogleCodeExporter commented 9 years ago
> (3) Anyone who wants to be a consumer tries to connect to the subtitle
> filter. Then the subtitle filter accepts the first one, or the last one,
> or the one it likes most (disconnecting the previously accepted one),
> or accepts several if it can?

I really don't like that. It seems too random. There should only be one active 
consumer; more than one doesn't make sense. Considering that, it does make 
sense that the subtitle renderer picks which consumer it works with; however, 
it should not be random like "first" or "last".

(1) would be an OK solution,
(2) is the more complete solution.

Original comment by h.lepp...@gmail.com on 29 Nov 2011 at 10:43

GoogleCodeExporter commented 9 years ago
> Are all of these different formats still 8bit per 
> RGB channel and 8bit per alpha channel? 

Yes.

> GPU based consumers like madVR will do the work 
> via Direct3D, and there it will probably be 
> easier to get a standard compliant RGBA image.

> I don't think it really matters if 
> the subtitle renderer pre-multiplies 
> the values, or if the consumer then 
> does that if its needed (unless 
> vsfilter outputs them like that for some reason?)

E.g. we now have a series of two bitmaps RGBA1 and RGBA2 (in standard compliant 
RGBA). Blending them onto the video vRGB, we get
    vRGB*(1-A1)*(1-A2) + RGB1*A1*(1-A2) + RGB2*A2
Let Ac = 1 - (1-A1)*(1-A2); then the above equals
    vRGB*(1-Ac) + ((RGB1*A1*(1-A2) + RGB2*A2)/Ac) * Ac
That is, if the combining of the two bitmaps is done by the subtitle renderer, 
it must return a bitmap with RGB value (RGB1*A1*(1-A2) + RGB2*A2)/Ac and alpha 
value 1 - (1-A1)*(1-A2), if using source alpha. It's complicated, isn't it? 
(The division by Ac also introduces a minor rounding error.)

However, if using pre-multiplied alpha (each RGB value already multiplied by 
its A), blending them onto the video vRGB gives
    vRGB*(1-A1)*(1-A2) + RGB1*(1-A2) + RGB2
Now if the combining of the two bitmaps is done by the subtitle renderer, it 
can simply apply the same operation
    RGB1*(1-A2) + RGB2
and output a bitmap with RGB value RGB1*(1-A2) + RGB2 and alpha value 
1 - (1-A1)*(1-A2). No division is needed.

Original comment by YuZhuoHu...@gmail.com on 29 Nov 2011 at 11:05
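
To make the contrast concrete, here is a per-pixel sketch of both conventions 
(one channel, values normalized to [0,1]; the struct and function names are 
made up for illustration). Blending either combined result onto a video pixel 
produces the same output, but only the straight-alpha path needs the division:

  #include <cstdio>

  struct Straight { float rgb, a; };  // standard compliant RGBA
  struct Premul   { float rgb, a; };  // rgb already multiplied by a

  // Straight alpha: combining b2 over b1 needs a division by the
  // combined alpha - the source of the rounding error mentioned above.
  Straight combineStraight(Straight b1, Straight b2) {
      float ac = 1.0f - (1.0f - b1.a) * (1.0f - b2.a);
      float rgb = ac > 0.0f
          ? (b1.rgb * b1.a * (1.0f - b2.a) + b2.rgb * b2.a) / ac
          : 0.0f;
      return { rgb, ac };
  }

  // Pre-multiplied alpha: the same "over" operation is a plain multiply-add.
  Premul combinePremul(Premul b1, Premul b2) {
      return { b1.rgb * (1.0f - b2.a) + b2.rgb,
               1.0f - (1.0f - b1.a) * (1.0f - b2.a) };
  }

  int main() {
      Straight s1 = { 0.8f, 0.5f }, s2 = { 0.2f, 0.25f };
      Premul   p1 = { s1.rgb * s1.a, s1.a }, p2 = { s2.rgb * s2.a, s2.a };
      Straight sc = combineStraight(s1, s2);
      Premul   pc = combinePremul(p1, p2);
      float v = 0.4f;  // video pixel
      // both print 0.500000: the two conventions agree on the final video
      printf("%f %f\n", v * (1.0f - sc.a) + sc.rgb * sc.a,
                        v * (1.0f - pc.a) + pc.rgb);
      return 0;
  }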

GoogleCodeExporter commented 9 years ago
> I really don't like that. It seems too random. 
> There should only be one active consumer, more 
> then one doesn't make sense. Considering that, 
> it does make sense that the subtitle renderer 
> picks which consumer it works with, however it 
> should not be random like "first" or "last".

Just realized the downmost filter may not be the last one asking for a 
connection. But if some kind of priority/merit system is introduced, we can of 
course let the subtitle filter select the consumer according to that.

Original comment by YuZhuoHu...@gmail.com on 29 Nov 2011 at 11:16

GoogleCodeExporter commented 9 years ago
Trying to combine all of the smaller bitmaps in one step is futile anyway; 
there could be hundreds of them.

I would just do something like this:
  foreach(Subtitle)
     vRGB = vRGB * (1-A) + RGB * A
  end

Doing this, there is no big advantage in pre-multiplication, unless I'm missing 
something about why my loop wouldn't work?

Original comment by h.lepp...@gmail.com on 29 Nov 2011 at 11:21

GoogleCodeExporter commented 9 years ago
> will not accurately sync with the moving item as the typesetter originally 
intended

@astrataro 

I've sent you an email explaining why that shouldn't be an issue. If you want 
to debate it we can do so via email, but as madshi already made clear, the idea 
has already been scrapped and is NOT going to be implemented, so it no longer 
even needs to be discussed. Any further comments in Issue #40 about this 
scrapped idea will be deleted.

> PLEASE stop discussion about 24fps vs 30fps.
> We've already decided we won't do this. 
> So there's really no need to discuss it any further. Thanks.

Original comment by cyber.sp...@gmail.com on 29 Nov 2011 at 11:22

GoogleCodeExporter commented 9 years ago
> Doing this, there is no big advantage 
> in pre-multiplication, unless i'm 
> missing something why my loop wouldn't work?

The problem is, we now have a maxNumBitmaps option with which a consumer who 
wants one big subpic can set it to 1 to force the subtitle renderer to do the 
alpha blending.
And the following two approaches:
1.
  //done by consumer
  foreach(Subtitle)
     video = video alphablend subtitle 
  end
2.
  //done by subtitle renderer 
  big_subpic = empty
  foreach(Subtitle)
     big_subpic = big_subpic alphablend subtitle 
  end
  //done by consumer
  video = video alphablend big_subpic

give different video output if using source alpha.

And for performance on CPU
In comparison to
  big_subpic = first Subtitle
  for(int i=2nd Subtitle; i<=last Subtitle; i++ )
     big_subpic = big_subpic alphablend i
  end
using source alpha,
  big_subpic = last Subtitle
  for(int i=2nd last Subtitle; i>= 1st Subtitle; i-- )
     big_subpic = i alphablend big_subpic
  end
using pre-multiplied alpha saves one multiplication.

Original comment by YuZhuoHu...@gmail.com on 29 Nov 2011 at 11:39

GoogleCodeExporter commented 9 years ago
> And for performance on CPU
> In comparison to
>   big_subpic = first Subtitle
>   for(int i=2nd Subtitle; i<=last Subtitle; i++ )
>      big_subpic = big_subpic alphablend i
>   end
> using source alpha,
>   big_subpic = last Subtitle
>   for(int i=2nd last Subtitle; i>= 1st Subtitle; i-- )
>      big_subpic = i alphablend big_subpic
>   end
> using pre-multiplied alpha saves one multiplication.

Found that I was wrong on this point. Since there is an extra 
pre-multiplication step, the multiplication is not really saved. (But if you 
cache the pre-multiplied result and reuse it, then it really is saved.)

Original comment by YuZhuoHu...@gmail.com on 29 Nov 2011 at 11:57

GoogleCodeExporter commented 9 years ago
I assume the D3D way would also work like that, simply applying all the small 
bitmaps one after another.
You cannot just blend them "normally" onto a black surface; you have to account 
for the alpha values, considering that not every pixel is 100% opaque.

You would have to create the combined RGBA image to be compatible with that; 
the following pseudo code should be capable of it.

 Abig = RGBbig = 0
 foreach(subpic)
   Abig_old = Abig
   Abig = 1 - ((1-Abig) * (1-Asub))
   RGBbig = (RGBbig * Abig_old + RGBsub * Asub) / Abig
 end

Creating the combined RGBA image like that, the result is exactly the same as 
when blending all the images one after another. BTW, I also confirmed that my 
first loop is equivalent to the code you posted before, just unrolled into a 
loop instead of combined into one line.

I still think we should stick to one format, that being default RGBA, because 
it doesn't really cost much more. The alpha blending is really easy, and there 
are good SIMD operations to facilitate operations like (A1 * B1) + (A2 * B2), 
so MMX/SSE2 optimizations would greatly speed up the process.

Original comment by h.lepp...@gmail.com on 29 Nov 2011 at 12:45
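
The (A1 * B1) + (A2 * B2) pattern maps directly onto the SSE2 pmaddwd 
instruction. A rough sketch of how a blend v' = (v*(255-a) + s*a)/255 could use 
it (illustrative only; a real implementation would process whole rows of packed 
pixels, and the shift by 8 is a cheap approximation of dividing by 255):

  #include <emmintrin.h>  // SSE2 intrinsics
  #include <cstdint>

  // Blend four 8-bit channel values: out = (v*(255-a) + s*a) >> 8.
  // _mm_madd_epi16 computes exactly the (A1*B1)+(A2*B2) pairs from above.
  void BlendFour(const uint8_t v[4], const uint8_t s[4],
                 const uint8_t a[4], uint8_t out[4]) {
      int16_t vals[8], wts[8];
      for (int i = 0; i < 4; i++) {
          vals[2*i]   = v[i];          // interleave (v0, s0, v1, s1, ...)
          vals[2*i+1] = s[i];
          wts[2*i]    = 255 - a[i];    // and the matching weights
          wts[2*i+1]  = a[i];
      }
      __m128i x   = _mm_loadu_si128((const __m128i*)vals);
      __m128i w   = _mm_loadu_si128((const __m128i*)wts);
      __m128i sum = _mm_madd_epi16(x, w);   // v*(255-a) + s*a per 32-bit lane
      __m128i res = _mm_srli_epi32(sum, 8); // approximate division by 255
      int32_t r[4];
      _mm_storeu_si128((__m128i*)r, res);
      for (int i = 0; i < 4; i++)
          out[i] = (uint8_t)r[i];
  }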

GoogleCodeExporter commented 9 years ago
Maybe we should remove "maxNumBitmaps" and instead introduce "combineBitmaps = 
bool". So there would only be 2 options: Either deliver all bitmaps separately, 
or combine them all into one bitmap. I don't really see any purpose of doing 
something in between.

Would that make the alpha topic easier? I mean this way the alpha blending 
would always be done by either the subtitle renderer or by the consumer, but 
never by both.

Original comment by mad...@gmail.com on 29 Nov 2011 at 12:48

GoogleCodeExporter commented 9 years ago
It doesn't make the alpha topic easier, but it does make more sense. I don't 
think it's really useful to allow a certain number of bitmaps but not more.

PS:
My code above is flawed; going to fix it.

Original comment by h.lepp...@gmail.com on 29 Nov 2011 at 1:01
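
Applying the two-bitmap formula from earlier in the thread iteratively gives 
what the corrected loop presumably looks like: the accumulated term needs an 
extra (1-Asub) factor, and the division must be guarded while the combined 
alpha is still zero. A per-pixel sketch (one channel, floats in [0,1]; the 
Subpic type is made up for illustration):

  #include <vector>

  struct Subpic { float rgb, a; };  // straight (non-pre-multiplied) alpha

  // Combine a list of straight-alpha subpics into one straight-alpha bitmap.
  Subpic Combine(const std::vector<Subpic>& subpics) {
      float Abig = 0.0f, RGBbig = 0.0f;
      for (const Subpic& s : subpics) {
          float AbigOld = Abig;
          Abig = 1.0f - (1.0f - AbigOld) * (1.0f - s.a);
          if (Abig > 0.0f)  // everything so far was fully transparent otherwise
              RGBbig = (RGBbig * AbigOld * (1.0f - s.a) + s.rgb * s.a) / Abig;
      }
      return { RGBbig, Abig };
  }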

GoogleCodeExporter commented 9 years ago
> You cannot just blend them "normally" onto a black surface

When using pre-multiplied alpha, you can.

>    RGBbig = (RGBbig * Abig_old + RGBsub * Asub) / Abig

This division is the point. Using source alpha or target alpha, you need to do 
that, while using pre-multiplied alpha, you don't.

> SIMD operations to facilitate operations 
> like (A1 * B1) + (A2 * B2), so MMX/SSE2 
> optimizations would greatly speed up the 
> process

I have done no serious testing, but I am quite positive that computing 
A1*B1+A2 using MMX/SSE2 would be faster than A1*B1+A2*B2.

> "maxNumBitmaps" and instead introduce 
> "combineBitmaps = bool"

Indeed, it makes no difference for the alpha topic. The point is this:

>    RGBbig = (RGBbig * Abig_old + RGBsub * Asub) / Abig

If using source alpha, then it's better that the consumer doesn't set 
combineBitmaps to true (or doesn't set maxNumBitmaps to a small value). 
Otherwise, it's better that the subtitle renderer implements a correct alpha 
blending with the conversion above.
So the reason for using a maxNumBitmaps option instead of combineBitmaps is 
that we don't lose anything by using maxNumBitmaps.
Indeed, I don't think it is a really serious issue, as long as the consumer 
knows what the alpha format option (if given) means. So just add some more 
comment like "if using source alpha/target alpha, maxNumBitmaps is recommended 
to be set to infinite"?

Original comment by YuZhuoHu...@gmail.com on 29 Nov 2011 at 1:24