stompgem / stomp

A ruby gem for sending and receiving messages from a Stomp protocol compliant message queue. Includes: failover logic, ssl support.
http://stomp.github.com
Apache License 2.0
152 stars 80 forks source link

Losing Connection #17

Closed solars closed 13 years ago

solars commented 13 years ago

Hey,

after switching to the latest stomp version, and ruby 1.9.2, I'm getting a lot of disconnects from my queue (openmq with stomp bridge).

The daemon is still running though (I'm using stomp with daemon-kit). What I wanted to ask for is, does anyone have a suggestion how I could debug this? if it's stomps fault, if I can reconnect somehow, and how to enable debug logging for this?

Might also be the fault of openmq, although I have another daemon, running on ruby 1.8.7

In the logs I only have the following: https://gist.github.com/c746181c75a454deaa5a

Thanks in advance for any pointers, Christoph

morellon commented 13 years ago

Hi Christoph,

Your log doesn't give us much information to help with the problem. It simply tells us that the stomp client disconnected from server. Have you tried adding a log as in the example provided in https://github.com/morellon/stomp/blob/master/examples/logexamp.rb ?

Can you also check your config to initialize stomp client in both ruby versions? Does the 1.8.7 version disconnect as well?

solars commented 13 years ago

Hi, sorry for the late reply but I didn't have time before

I right now have the following in my daemon (its a daemon-kit app) for logging: https://gist.github.com/b76bd1169080215e0079

shouldn't that be enough? I don't see any additional debug output...

morellon commented 13 years ago

Hi Solars, How does your DaemonKit is implemented? Does it respond to these methods: on_connecting, on_connected, on_connectfail, on_disconnect, on_miscerr ? The on_miscerr may be the most helpful for you. How long does you stomp client stay stale, without receiving any messages? Another thing you could to to debug this, is testing stomp with ruby 1.9 in another message broker for trying to discard openmq as part of the problem.

solars commented 13 years ago

again a snippet of the openmq log which I cannot really explain, did anything regarding the protocol change on the ruby stomp side?

https://gist.github.com/2eee4c3835b67771ab66

I can't get any logging information out of stomp ruby

morellon commented 13 years ago

Again, regarding the broken pipe error, the only thing I can tell by the log is that your client closed connection before the server being able to send a message. The "No subscription header specified" WARN followed by the error, might be the same issue in https://github.com/morellon/stomp/issues/15 And as far as I've noticed, we didn't change anything in the protocol specification.

How about the things I said in my previous comment?

solars commented 13 years ago

ah sorry, I didn't see the previous reply!

here is the complete daemon kit file, daemon kit basically just wraps the whole thing, at the bottom you can see the actual loop for stomp: https://gist.github.com/70f4cbe0377be44a3c08

regarding your questions:

but whats the reason I don't get any debug log from stomp?

Thanks a lot for your help and time! Christoph

morellon commented 13 years ago

For logging stuff inside stomp you should implement a logging object similar to this one: https://github.com/morellon/stomp/blob/master/examples/slogger.rb Otherwise, you won't get any log outputs from stomp itself.

Since message frequency is much more sparse in 1.8.7 and it works, let's forget about this. I was thinking it might be some keepalive issue, but doesn't seem the case.

What I meant is to try with other broker in a test environment. By the way, can you reproduce this issue in a test environment (same version of stomp, ruby and openmq)?

Which OS are you running the stomp client on?

solars commented 13 years ago

Ah, I missed this logging object, sorry I will add it this evening and collect output

I don't have a test env currently where I can reproduce it, but I could probably set up a machine for it. I think using a different broker only makes sense if this fails as well then... What I'm unsure about are these SEVERE messages in the openmq log though.

Both clients (1.8.7 and 1.9.2) are running on the same linux machine (debian stable)

solars commented 13 years ago

I've added the slogger.rb now:

but all I get in stdout is the normal daemon-kit output: https://gist.github.com/fc752a611de42925d239

[daemon-kit]: DaemonKit (0.1.8.1) booting in development mode [daemon-kit]: Setting up trap for USR1 [daemon-kit]: Setting up trap for USR2 [daemon-kit]: Setting up trap for HUP [daemon-kit]: Setting up trap for INT [daemon-kit]: Setting up trap for TERM [daemon-kit]: DaemonKit (0.1.8.1) booted, now running HistoryDaemon [daemon-kit]: Setting up trap for KILL Logger initialization complete. HistoryDaemon-daemon.rb:42: Processing OTA_HotelRatePlanNotifRQ for WP_HOLGER from 2011-09-12T15:58:40Z} HistoryDaemon.rb:54: Processing OTA_HotelRatePlanNotifRQ for WP_HOLGER from 2011-09-12T15:58:40Z} HistoryDaemon-daemon.rb:44: Update processed HistoryDaemon-daemon.rb:42: Processing OTA_HotelResNotifRQ for DEMO_HOLGER1977 from 2011-09-12T16:01:11Z} HistoryDaemon.rb:54: Processing OTA_HotelResNotifRQ for DEMO_HOLGER1977 from 2011-09-12T16:01:11Z} HistoryDaemon-daemon.rb:44: Update processed

No stomp logging except for the init? Why is that?

morellon commented 13 years ago

Well, "Logger initialization complete" is in info level, the rest is in debug level. It may be related to that, but supposing you didn't change this logger initializer it should be logging debug level. Another thing that might be happening is that your daemon is capturing stdout output in some way. Can you try logging it to a file instead?

solars commented 13 years ago

I've added a logger.debug statement in the initializer which works... however, I've added file output and puts etc to the methods, and it seems that they are not called - e.g. connecting, but I assume with my given code/client they should be?

solars commented 13 years ago

it seems as if after this exception:

SEVERE: ProtocolChain exception java.lang.IllegalArgumentException at java.nio.Buffer.position(Buffer.java:218) at com.sun.messaging.bridge.service.stomp.StompProtocolParser.releaseBuffer(StompProtocolParser.java:380) at com.sun.grizzly.filter.ParserProtocolFilter.postExecute(ParserProtocolFilter.java:187) at com.sun.grizzly.DefaultProtocolChain.postExecuteProtocolFilter(DefaultProtocolChain.java:164) at com.sun.grizzly.DefaultProtocolChain.execute(DefaultProtocolChain.java:103) at com.sun.grizzly.DefaultProtocolChain.execute(DefaultProtocolChain.java:88) at com.sun.grizzly.ProtocolChainContextTask.doCall(ProtocolChainContextTask.java:53) at com.sun.grizzly.SelectionKeyContextTask.call(SelectionKeyContextTask.java:57) at com.sun.grizzly.ContextTask.run(ContextTask.java:69) at com.sun.grizzly.util.FixedThreadPool$BasicWorker.dowork(FixedThreadPool.java:379) at com.sun.grizzly.util.FixedThreadPool$BasicWorker.run(FixedThreadPool.java:360) at java.lang.Thread.run(Thread.java:662)

the messages are not acknowledge anymore - but processed...

solars commented 13 years ago

another thing - if I use the client in this block mode, and have client.join if I stop the daemon-kit - should I handle the queue client additionally? or does it work if its stopped like that?

morellon commented 13 years ago

Hi solars. I don't have any more ideas to help you any further. Please consider running a test environment for reproducing this issue with openmq and with another broker as I said before.

When stopping your daemon, always remember to call disconnect (or close in the client class).

gmallard commented 13 years ago

solars - if you have a small test case:

let me know.

I will offer to open up a firewall briefly, and give you another server to test against.

I will say that given the Java stack trace above, my hunch is you have a problem with the Stomp bridge that openmq uses.

solars commented 13 years ago

gmallard, thanks for your offer! The problem is still that I cannot reproduce it 100% sure

the main problem for me right now is that I cannot get any kind of output from stomp, the logging simply does not work, and seeing what is sent back and forth would be very helpful I've used this: https://gist.github.com/b886f94d0f6a0dc1dc0e

getting only the output in the initializer... then nothing more

is there any other place I can hook in for logs?

solars commented 13 years ago

I think its the openmq stomp bridge causing the problems, it seems that the acks are received on the server side

However, why am I still getting:

WARNING: [BSS2006]: No subscription header specified, ACK first non-transacted subscriptions ids with prefix /subscription-to/ for message ID:59850-178.63.244.192(da:1d:a4:97:4:e4)-44087-1316075722499

if I use: client.acknowledge(msg, "subscription" => msg.headers["subscription"]) and msg.headers.inspect contains: {"subscription"=>"/subscription-to//queue/switchhistory", "destination"=>"/queue/switchhistory", "message-id"=>"ID:59850-178.63.244.192(da:1d:a4:97:4:e4)-44087-1316075722499", "expires"=>"0", "redelivered"=>"true", "priority"=>"4", "timestamp"=>"1316075722499", "content-length"=>"418306"}

gmallard commented 13 years ago

The logger is implemented to show connection events, disconnect events, and physical socket errors. It does not show details of messages sent and received.

You could modify the gem to do that. You can hook in anywhere you want. Look at how the logger callbacks are implemented in connection.rb.

Of course it is the bridge that is causing this. The Java stack trace above tells me that.

You should not be using a 'subscription' header on the ACK. Only brokers use that header on MESSAGE frames. A quote from the STOMP spec:

If an id header is supplied (on SUBSCRIBE) then Stomp brokers should 
append a subscription header to any MESSAGE commands which are sent 
to the client ....

openmq appears to be way out of spec to me.

gmallard commented 13 years ago

Was that 'inspect' after you did the 'acknowledge' ? Show me an 'inspect' before the 'acknowledge'.

gmallard commented 13 years ago

I want to run an experiment.

Back this gem down to 1.1.8 and retest please.

solars commented 13 years ago

gmallard, I will do so at the weekend, I'm a bit busy right now

another question in the meantime: how can I handle ERROR messages from the stomp protocol with ruby stomp?

solars commented 13 years ago

before and after acknowledge inspects: https://gist.github.com/c881d124c5cda56c7444

solars commented 13 years ago

I switched it to 1.1.8 now, will report if I notice any differences... why do you think it helps?

btw. on this other client which did not have problems so far, but far less messages per hour, it is also 1.1.8

gmallard commented 13 years ago

1) You handle ERROR messages ( from the server ) as in any STOMP client. Your code recognizes them and deals with them properly. This is not the purpose or business of the Ruby client gem. Or any other language's client library for that matter. We just pass the ERROR messages to you, the client. You deal with them. They are generated by the server not the client. Read the spec.

2) Your before and after ack tells me nothing. I want to see the ACK right before it goes on the wire. What you show me is wrong for 1.1.9. It should have a 'id' header with a SHA1 as the 'id' value.

3) I do not think 1.1.8 helps. Why do you think that? I never said anything like that. I said I want to run an experiment.

gmallard commented 13 years ago

Please update your profile on github to have proper http server links.

http://tachium.at

does not respond. 'tachium.at' resolves with 'nslookup' but there is no web server up there, and I get 'connection refused'' for port 80. Drop your firewall on that port.

Same with 'textmode.at'

Also, make sure you private e-mail address is correct.

gmallard commented 13 years ago

Experiment #2:

Back out all of the modifications and whatever fiddling around you have on your github project. Revert to a pure 1.1.9 gem as we deliver it.

Give me all relevant logs:

a) client side b) server side

solars commented 13 years ago

I removed the webpages from my profile, its not related to any firewalls or similar. Email address is correct, or did you encounter something different? Not relevant in this context here though I guess.

1) What I meant if 'ERROR' is set somewhere in the message headers as I couldn't find it. I don't know the details of this protocol and I'm usually not interested if I need to get things done, that's why I use a third party library - so I don't know this by heart. It's clear that I have to handle errors myself, I just meant if ERROR is set somewhere as opposed to message so I can detect them

2) It was 1.1.8 as you asked for, can you please tell me where to hook in explicitly? With inspect as written above? Then I will add that. What should I dump, the message headers?

3) I was just asking for the reason for this experiment, out of interest to learn about the differences.

I am using the gem in this project here all the time, I am not using any other code - and I didn't mention I am doing so, what makes you think that? I switched back to 1.1.9 of this repository, before it was 1.1.8.

What logs do you need exactly? On the client side the only thing I can get is the connection log that slogger.rb provides,or the above inspects if you tell me what exactly is interesting for you

On the server side its the bridge logs - are these enough together with the inspects or do you want it on the protocol level with ngrep?

gmallard commented 13 years ago

Read the specification please:

http://stomp.github.com/stomp-specification-1.0.html

it is short. These conversations are approaching the size of the spec.

You wound not see error indicators in MESSAGE headers. That is not how the protocol works. Servers may return ERROR frames rather than MESSAGE frames. ERROR frames are server specific. Consult your server documentation for what they are.

You can detect them in code:

client.subscribe(config.queue.name, :ack => 'client') do |msg|  
    if msg.command == "ERROR"
        raise "kaboom: #{msg.inspect}"
    end
    # Normal process here
end

The request for you to try 1.1.8 is an experiment. You said you started having connection difficulties after a 1.1.9 upgrade. Does reverting to 1.1.8 cure the connection problems, yes or no?

Do you have an open problem ticket with the stomp bridge team? If not, I advise you to open one.

Bridge logs and stdout/stderr from your client.

solars commented 13 years ago

I think I will upgrade openmq, and continue this on the openmq mailing list if it still shows up

Thanks a lot for your help!

gmallard commented 13 years ago

I think that is a wise decision because I believe that is where the real problem is.

I will make some changes to the logger API as a result of this discussion, adding a couple of much more dangerous callbacks for transmit and receive. With a strong notation that they should be used only for debugging purposes, and not in production.