pimanttr / nixysa

Automatically exported from code.google.com/p/nixysa
Apache License 2.0
0 stars 0 forks source link

[async] callbacks not working (FF3.5.3/Vista) #2

Closed GoogleCodeExporter closed 9 years ago

GoogleCodeExporter commented 9 years ago
What steps will reproduce the problem?
1. Declare an [async] callback in your IDL file and include the template 
code (eg. Closure or CallbackN from O3D's callbacks.h)
2. Set a breakpoint at the NPN_CreateObject() line in your glue .cc file.
3. Trigger the callback from your worker thread.
4. Watch as NPN_CreateObject() returns null due to being called from a 
worker thread.

What is the expected output? What do you see instead?

Event successfully fired in the browser.

What version of the product are you using? On what operating system?
FF3.5.3, Windows Vista SP1

Please provide any additional information below.

The problem is that NPN_CreateObject() is being used to allocate memory for 
a standard C++ class instance (NPCallback). This might work on some 
platforms but for at least my version of Firefox, thread safety checks seem 
to be in place. I've attached a complete patch to fix this.

My solution is a simplification of the NPCallback class and use of the 
standard new/delete allocator. 

When allocating the NPCallback the function and arguments are NPN_Retained 
and strings copied. When freed, the objects are NPN_Released and strings 
deleted. The NPCallback object is freed automatically after the UI thread 
worker function completes. 

I haven't used any locking semantics but I have avoided doing anything 
besides returning "true" after enqueing the UI thread worker function so I 
can't see any potential race problems.

I hope I haven't violated any coding conventions with the patch. I'm more 
than happy to make further changes if need be. 

Original issue reported on code.google.com by aaron...@gmail.com on 29 Sep 2009 at 5:37

Attachments:

GoogleCodeExporter commented 9 years ago
I looked into whether this would be feasible.

The asynchronous callbacks are not designed to be a mechanism for invoking 
callbacks
from a worker thread. Nixysa is not thread safe and may only be called from the 
main
plugin (UI) thread. Asynchronous are intended only to be a means of deferring 
the
execution of the callback from the main plugin thread.

It would be difficult to allow callbacks to be invoked from another thread. 
Most of
NPAPI is not thread safe. NPN_RetainObject cannot be called from a worker 
thread,
which prevents it from retaining the callback function itself and any object 
arguments.

Original comment by apatr...@google.com on 12 Nov 2009 at 12:22

GoogleCodeExporter commented 9 years ago
If you still wish to use this patch, there's 2 small typos on one line that 
prevents
Closures from working.

in npapi_generator.py
-      NPCallback* callback = NPCallback::create(npp, npobject, 
std::vector<NPVariant
*>());
+      NPCallback* callback = NPCallback::Create(npp, npobject,
std::vector<NPVariant>());

Original comment by mart...@gmail.com on 12 Nov 2009 at 3:50

GoogleCodeExporter commented 9 years ago
Just an addendum to this:

I realize this is outside of the original design scope so I understand it being 
closed but I think it should be trivial to achieve and would be an ultimate 
plus for 
the nixysa community.

The callback shouldn't need retaining from the worker thread if it is done on 
object 
creation (which is always going to be in the main thread), right?

I've been short on time and haven't yet gotten around to submitting a patch via 
gcl 
(I posted a second one on the discussion list) but my latest version relies on 
the 
existence of C++ assignment operators. 

It simply takes a copy of all the arguments to the callback and stores them in 
a 
structure allocated via new/delete from any thread. It then triggers the actual 
callback via NPN_PluginThreadAsyncCall which takes care of marshaling, invoking 
and 
releasing. The last thing this function does is delete the structure used to 
pass the 
NPN_PluginThreadAsyncCall'd function its arguments. 

This should be thread safe as far as I can tell. The biggest dangers are:
  1. Async Calls are not supported - here we either do nothing or risk a crash.
  2. The browser ignores the RetainObject and frees the callback before call takes 
place, leading to a small memory leak. (This might happen on a page refresh?)

I've been using my patch without any issues but at this stage I'm still just 
running 
my plugin in a javascript test harness so it hasn't seen any real-world use 
just yet.

Original comment by aaron...@gmail.com on 13 Nov 2009 at 12:57

GoogleCodeExporter commented 9 years ago
I have run into the same issue. The callback works in chrome but not in FF. Is 
there 
another way of registering callbacks?

Original comment by amat...@google.com on 6 Jan 2010 at 3:48

GoogleCodeExporter commented 9 years ago
You need a reference to the NPP instance to do an NPN_PluginThreadAsyncCall so 
you 
probably need to edit the static glue. I'm still running my patch without any 
issues. 
I haven't used it in any production environment yet but haven't had a single 
problem 
and I've been developing my plug-in daily. The basic idea of my patch is to:

1. Do native C++ copies of variables you want to pass to the callback function, 
2. Trigger the async all from the browser thread
3. Safely call the desired callback from the browser thread.
4. Free the copied variables from step 1 (from the browser thread).

I haven't hit any issues so far with Chrome, FF or Safari but I'm not a browser 
engineer so I'm not claiming its 100% kosher on all platforms and browsers. I 
haven't 
tried the IE ActiveX wrapper yet.

As far as I can tell, all other ways of doing this involve manually editing the 
static glue code which is not very elegant.

Original comment by aaron...@gmail.com on 6 Jan 2010 at 7:38

GoogleCodeExporter commented 9 years ago
Hello Aaron,
Thanks for the reply. Manually editing the generated glue code is exactly what 
I 
would want to avoid.
I am not a browser engineer either and I am quite new to gecko codebase and 
NPAPI 
plugins, which is why I do not understand the problem/fix completely.
I have not tested your patch yet but will try it out next week. Which versions 
of 
Chrome/FF have you tested your patch on?
Thank you,
Apurva

Original comment by amat...@google.com on 7 Jan 2010 at 6:08

GoogleCodeExporter commented 9 years ago
I think I first ran it on FF3.0.x and Chrome 3.x but I've been continually 
upgrading my 
browsers since then. It also works on Minefield (Firefox nightly builds). I 
haven't 
found a browser it hasn't worked with but I have yet to Try IE6, IE7, IE8, 
Opera or Mac 
Safari.

Best of luck!

- Aaron

Original comment by aaron...@gmail.com on 7 Jan 2010 at 6:40

GoogleCodeExporter commented 9 years ago
Ok, I will give it a shot and will update you on the results (good or bad).
Apurva

Original comment by amat...@google.com on 7 Jan 2010 at 6:51

GoogleCodeExporter commented 9 years ago
I have also run into this problem now.  My plugin development was going fine 
under 
chrome, but when I try and run the plugin under Firefox 3.5.6 I get the same 
crash as 
described above.

What is the current status of this issue?  Can I use the patch (which does not 
cleanly apply to rev 67) in the mean time, or is there an officially sanctioned 
fix 
in the pipeline?

Cheers

Matt

Original comment by mb5p3nc...@googlemail.com on 16 Feb 2010 at 5:09

GoogleCodeExporter commented 9 years ago
Are you invoking a callback from a thread you create yourself? If so that's a
problem. The only NPAPI function that can safely be called off the main thread 
is
NPN_ThreadAsyncCall. We need to call some other NPAPI functions in the 
implementation
of the invocation of the async callback, I think NPN_RetainObject. The patch 
might
work for you but I'm not confident it will work reliably. That's why I didn't
integrate it. 

Original comment by apatr...@google.com on 16 Feb 2010 at 5:39

GoogleCodeExporter commented 9 years ago
The callback is being executed from a thread created in a client library.

The problem comes around as I have a need to create an object to pass to the
callback.  This invokes a ::CreateNPObject, which is causing the crash.

As the callback has been declared as async, is it not possible to defer all 
calls to
<class>::CreateNPObject into the callback code that is executed as a part of
NPN_PluginThreadAsyncCall?

I would suggest a function pointer be passed into NPCallback::Create that gets
executed on the plugin thread that manages creation of all NPVariant types, 
something
like (and I'm thinking aloud here...)

void (*createCallbackArgs)(int argc,NPVariant[] outArgs,void *inArgs[]);

This would populate outArgs from the set of inArgs with managed code created by
codegen.py.

Does this sound sensible?  I think this should then work on all platforms that
suuport async callbacks?

Original comment by mb5p3nc...@googlemail.com on 18 Feb 2010 at 12:06

GoogleCodeExporter commented 9 years ago
This is exactly what I've been doing for a while now with no negative 
repercussions 
so far. 

I modified the nixysa glue code to create a copy of any callback arguments, 
allocated 
with the default "new" allocator. A pointer to a structure containing the 
copied 
arguments is passed to a function on the plugin thread that creates the 
NPObject's 
and executes the actual call. The last thing this function does in the plugin 
thread 
is to free the copied arguments. Works like a dream for me! 

I guess the main requirement is that any copied arguments are "deep" copies. 
There is 
no guarentee that pointers will still be valid when the plugin thread 
eventually gets 
around to calling the function.

Original comment by aaron...@gmail.com on 18 Feb 2010 at 2:47

GoogleCodeExporter commented 9 years ago
I agree on the pointers, but my assumption is that you pass ownership of 
pointers to
the callback functions.  The code I have written uses a boost::shared_ptr to 
manage
ownership of the pointer in the code I write to wrap the .idl implementation.

Does the patch you submitted take this approach (I have not looked at it yet, 
sorry)
and is there any chance we can change the status of this defect from WontFix as 
this
is a real showstopper for me!

Original comment by mb5p3nc...@googlemail.com on 18 Feb 2010 at 3:12

GoogleCodeExporter commented 9 years ago
I didn't want to introduce any ownership issues so the process for me is:

Private thread calls a nixysa callback function.
A struct is allocated and the arguments are copied here.
NPN_PluginThreadAsyncCall is called with userdata pointing at this struct and a 
pointer to a function that copies the arguments to NPObjects, executes the 
callback, 
then frees everything. 

So long as a pointer doesn't make it through the copy process, there's no real 
issues. That pointer problem though might be why my patch was rejected. I'm not 
sure... 

It was a show stopper for me too but I got sick of applying the patch after 
rebasing 
to the latest nixysa version so I just threw nixysa into my repository. The 
patch I 
posted on here is a little old I think. I'll produce a more recent version when 
I get 
a chance and post it here.

Original comment by aaron...@gmail.com on 18 Feb 2010 at 3:22

GoogleCodeExporter commented 9 years ago
I also have nixysa in my repository now (possibly for the same reasons!).

I will have a go at applying your patch as soon as I possibly can.

I still think this issue needs fixing

Original comment by mb5p3nc...@googlemail.com on 18 Feb 2010 at 4:12

GoogleCodeExporter commented 9 years ago
Ok, I am looking at rewriting the callback code to make it plugin thread safe, a
second try to get the code integrated into nixysa codebase?

I have a question:

Why is NPCallback derived from NPObject?  From my quick analysis of the code,
NPCallback is a helper class to manage the callback in an async mode of 
operation. 
So surely the instance is short lived until the callback has been executed.  I 
can
understand that the reference counting nature of NPObject is being used to 
ensure
that the object is deleted at the correct time, but by using the reference count
functions of NPObject, we are mandating construction of NPCallback objects on 
the
pluging thread, which seems to be fundamentally against the idea of the async 
helpers
in the first place?

If I refactor the code to make NPCallback a simple helper class with internal
reference counting (possibly using boost::shared_ptr if there are no 
objections),
would this cause major problems?

I think there is a relatively elegant solution here that will solve all of the
problems we are having.

Original comment by mb5p3nc...@googlemail.com on 23 Feb 2010 at 1:26

GoogleCodeExporter commented 9 years ago
I had the same question. Perhaps it came from the idea that callbacks would 
always be 
triggered from some other plugin call in the render thread so using the NPAPI 
allocator 
was somehow more correct? Not sure... 

Anyway, I have thrown together what I think is the complete patch. I copied 
some files 
from my distro to a clean SVN copy and manually merged. It needs testing 
though. Hope 
it helps. I'd love to see this added to nixysa.

Original comment by aaron...@gmail.com on 23 Feb 2010 at 1:44

Attachments:

GoogleCodeExporter commented 9 years ago
Quick review (from patch file, not applied to source yet).  It looks very 
similar to
the approach I am taking, the one issue I have is the NPCallback is derived from
NPObject still in your code.  I am not sure this is necessary as NPCallback is 
only
created on an async call in the current code.

Also there does not seem to be an implementation of NPCallback_DoAsyncCall in 
the patch.

Finally I think there is still the chance that NPN_ReleaseObject may be called 
on an
illegal thread?

Original comment by mb5p3nc...@googlemail.com on 23 Feb 2010 at 1:52

GoogleCodeExporter commented 9 years ago
Sorry. I had a merge conflict that I somehow messed up. Here's a version with 
NPCallback_DoAsyncCall.

I just did a quick visual check and I think you're right about not needing to 
derive 
from NPObject.

I can't see a scenario whereby NPN_ReleaseObject might be called from the wrong 
thread though. The NPCallback object can be created on any thread but will only 
be 
destroyed from the NPCallback_DoAsyncCall function.

Original comment by aaron...@gmail.com on 23 Feb 2010 at 3:15

Attachments:

GoogleCodeExporter commented 9 years ago
Ok, I have tried this patch and it sort of works.  If the callback has no args, 
it
works fine, if args are used in the callback, there are still calls to
glue::class_<Class name>::CreateNPObject
on the thread that is issuing the callback (ie not the plugin thread).  All 
object
creation needs to be moved to the plugin thread.
I will have a go at implementing this scheme on top of the patch you have 
submitted.
 It was what I was originally working towards anyway.  I take it that you are not
passing arguments in you callbacks then?
Should have something available soon.

Original comment by mb5p3nc...@googlemail.com on 23 Feb 2010 at 4:27

GoogleCodeExporter commented 9 years ago
Let me clarify a bit.  Firefox crashes if I have IDL generated types in the 
callback.
 This is due to the call to CreateNPObject (I think), if the type is inbuilt (ie a
string), the creation works fine.

Original comment by mb5p3nc...@googlemail.com on 23 Feb 2010 at 4:35

GoogleCodeExporter commented 9 years ago
Hmm.. I have it working with arguments which makes me think I did a bad job of 
building 
that patch. I'll have a better look it tomorrow. I just quickly threw that 
patch 
together. Apologies that its not clean. 

I haven't kept up to date with fixes and such since the patch was rejected. I 
basically 
just forked the codebase for my own needs at the time and have recently 
abandoned 
nixysa in place of the Juce framework. 

Original comment by aaron...@gmail.com on 23 Feb 2010 at 4:37

GoogleCodeExporter commented 9 years ago
NPCallback does not need to be derived from NPObject. I was just being lazy 
when I
originally wrote it. The problem is the reference counting on any arguments 
that have
object type. They need to be retained between the callback object being created 
and
the callback being invoked, otherwise the object arguments might be freed in the
interim. That's the problem I don't see an easy solution to.

Original comment by apatr...@google.com on 23 Feb 2010 at 7:15

GoogleCodeExporter commented 9 years ago
Isn't this an issue not only for callbacks but for any javascript-instantiated 
objects that a plugin keeps references to?

To address the issue of the browser freeing my resources, I ended up adding my 
own 
pointer-based reference-counted binding based on the Poco framework's 
Poco::RefCountedObject and Poco::AutoPtr classes. The NPObject claims only a 
single 
Poco reference. If I have other references to my objects within my plugin, they 
stay 
allocated even if the NPObject is destroyed.

Just my 0.02c but I think Nixysa would really benefit from something like this. 

Original comment by aaron...@gmail.com on 23 Feb 2010 at 9:13

GoogleCodeExporter commented 9 years ago
You are correct. One cannot use NPAPI reference counting (or any NPAPI function
except NPN_ThreadAsyncCall) on any kind of NPObject on threads other than the
original plugin thread. That's the problem I can't think of a solution to. Any 
object
typed arguments to the callback need to be retained in case the callback object 
ends
up having the one and only reference to them; there are no guarantees at all 
about
how long it will be until the callback is invoked.

It is certainly possible to avoid calling NPN_CreateObject when a callback 
object is
created because it doesn't need to be an NPObject. The issue is with the 
arguments.

Original comment by apatr...@google.com on 23 Feb 2010 at 10:44

GoogleCodeExporter commented 9 years ago
Ok, I have something that is working now.  But I want to understand the issues 
of 
ownership and NP resource management.  I have attached the patch, if you could 
review 
and let me know.  Its a bit messy at the moment, but I think makes for a 
sensible 
starting point.

These are the assumptions I have made.  They may not be correct, because my use 
cases 
are very constrained at the moment, so please put me right if any of the 
following 
assumptions are incorrect:

1, The callback is assumed to be created on the plugin thread - ie from running 
javascript code.  Therefore, the call to NPN_RetainObject within the NPCallback 
constructor in common.cc is valid.  If this is not true, the design is 
fundamentally 
broken.

2, The creation of NP args is deferred to a callback from 
NPN_PluginThreadAsyncCall.  
I think this is also fine as all args are passed by value.  This allows for use 
of 
copy constructors to ensure validity of data through the callback.  If complex 
types 
need to be passed, then they can be managed by pointer through a construct such 
as 
boost::shared_ptr.

3, The only time that NPN_PluginThreadAsyncCall is used is when the callback is 
flagged as async in the idl.  The only time I can see this as being needed is 
when 
events are being generated from non-plugin threads.  This would also mean that 
the 
arguments for the callback would not be NPObject types?  This would also mean 
that 
NPN_RetainObject would not need to be called on any of the arguments passed to 
the 
callback.

Original comment by mb5p3nc...@googlemail.com on 1 Mar 2010 at 9:51

Attachments:

GoogleCodeExporter commented 9 years ago
Has anyone had a chance to look at the patch?  I have been using this for a 
while now
and it seems to be holding up.

Original comment by mb5p3nc...@googlemail.com on 8 Mar 2010 at 9:23

GoogleCodeExporter commented 9 years ago
Sorry for being slow... The patch didn't take for me. I tried patching a few
different revisions of nixysa and no luck. Did you maybe change the original 
before
applying the patch? To get this checked in, you'll need to use our code review
system. The instructions are at 
http://code.google.com/p/nixysa/wiki/ContributingCode
and a link to the Google style guide is there too.

I see from eyeballing the diff file directly, you are deferring creating the the
arguments until the main thread invokes the callback. How are you managing the
lifetime of arguments prior to that happening? Something app specific? This will
hopefully be clearer once I can apply the patch or look at it in the code 
review tool.

Also, say you have a callback called FooCallback in your application, is it 
true that
you don't call FooCallback_glue::Run() off the main thread? That would lead to 
NPAPI
functions being called. How do you go about invoking async callbacks off the 
main thread?

Thanks

Al

Original comment by apatr...@google.com on 8 Mar 2010 at 6:14

GoogleCodeExporter commented 9 years ago
The start point of the patch may not have been clean, so I will check out the 
latest,
reapply and generate a new patch.

As for the quesions:

FooCallback_glue::Run is assumed to be called off the main plugin thread, it is 
a
wrapper to the RunCallback function with the async boolean set to true.  In the
static glue function NPCallback::Call, if the async bool is true, then it calls
NPN_PluginThreadAsyncCall, which will defer all callback arg generation (and
therefore call calls to NPN functions) until the async function is called.  If
'async' is false, NPN_InvokeDefault is called immediately, and my understanding 
is
that the NPN_InvokeDefault call itself increments the ref count on all NP 
arguments
passed to it.

I think that there is a missing use case (as you point out) of calling the async
callback from the main thread (ie from the browser context itself).  If we can 
detect
this, and I am sure it must be possible, the the FooCallback_flue::Run function 
api
will need to be augmented to indicate that it is being called from within the 
browser
context.  The only time reference counting would be a problem here is if we are
passing native npobjects in the argument list defined in the .idl file, but if 
this
were the case, the reference count should be incremented when the callback is
generated (and this will happen on the plugin thread), and so in the
FooCallback::Run() function, there should be no need to call NPN_RetainObject 
as the
callback wrapper would already hold a reference to the object?

I am not sure I have explained that very well, and I am sure that I don't 
understand
the problem space as well as you guys do, but my gut feeling is that the
implementation should be valid (with a buit of tweaking).

Matt

Original comment by mb5p3nc...@googlemail.com on 12 Mar 2010 at 10:16

GoogleCodeExporter commented 9 years ago
Ok, here is the patch again, cleanly applied to r67

Original comment by mb5p3nc...@googlemail.com on 12 Mar 2010 at 11:30

Attachments:

GoogleCodeExporter commented 9 years ago
why was this patch never applied to the mainline?

Original comment by and...@sideramota.com on 27 Jun 2011 at 4:25

GoogleCodeExporter commented 9 years ago
Not that I am aware of.  My gut feeling on this was that the owners of the 
project are only really interested in support for Chromium where this is not a 
problem.  I have not done any more work in this area for a long time now, but 
you are welcome to take the patch and modify it to work on the latest release 
of the project.

If you need any help, just ask - it may take me a little while to get back up 
to speed in this space, but I will help if I can.

Regards

Matt

Original comment by mb5p3nc...@googlemail.com on 15 Jul 2011 at 1:02