Closed dandunckelman closed 7 years ago
It looks like race condition, which usually means we're not eating all prompts, so it's kind of chance which prompt we read to determine end of given command.
Usually this is caused by using send("blaah\n")
in model, which does not expect prompt, but if prompt really does follow that, we don't eat it, and then cmd "show something"
might terminate immediately, because it saw that prompt, and we might end up completely in de-sync.
https://github.com/ytti/oxidized/blob/master/lib/oxidized/model/ftos.rb#L41
This should be changed to cmd(), from send(). Try that and report back.
Thanks @ytti
After making the change, I stopped oxidized and re-ran it. That produced this diff:
Stopping and re-running led to no change as expected. I'll let it run for a while and report back if anything happens. But at first glance, that change seems to have fixed it.
@ytti After 17 hours of no changes, it started to produce these changes:
then a few minutes later:
and I'd wager it happens again in the next 24 hours.
Thanks for any help. Please let me know if there's something more I can do to help you.
Look at the diff with some editor which shows unprintables, like vim or something. I'm guessing entirely different issue, potentially sometimes it inserts ^M in the beginning of line or similar, which would be easy to filter out.
I ran a diff locally and found no CR. The 2 lines are being added/removed repeatedly. Also, the diff in our Gitlab would show ^M if a carriage return was present.
As a note: our Gitlab server is configured with autocrlf set to input, so it doesn't auto-convert line-endings. Reference: https://git-scm.com/docs/git-config
Thanks for the help. Please let me know if there's something more I can try.
I think you should just remove \r characters in the model for the command where they may or may not appears. And send pull request.
This is what I've added:
cmd 'show inventory media' do |cfg|
+ cfg.gsub! /\r//, ''
comment cfg
end
I'll let that run a while to see if new changes occur.
I only added it there, since that's where the latest diffs occurred. Do you think I should add it elsewhere? Or do something different entirely?
I think that's probably fine. I'd lean towards being opportunist and fix issue when it happens.
About 30 minutes ago, one of our 10 switches did the same thing.
I'm going to alter the interval to hourly (it's currently at 5 minutes).
Lemme know if you have any other suggestions.
Was this after your fix? In same command?
Yes. We opted to not alter the interval for now. We're trying a few other changes.
I'll post after some time has passed.
I would first like to understand why the change didn't help. You still saw ^M in 'show inventory media' after removing \r from the text?
Do you now have both changes the \r removal and the fix for https://github.com/ytti/oxidized/blob/master/lib/oxidized/model/ftos.rb#L41
I never saw ^M before/after the change.
And yes, I still have both changes.
Oh I misread you. What did you see then? If you see git reporting line change, then something has changed. It's probably unprintable character, so just view it with vim or something to see difference.
^M is \r which is carriage feed, I've seen it in few boxes being injected in start of line (incorrectly). Clearly this is different, but as you have the diffs, you should be able to see the actual bytes that differ.
No worries. I repeatedly see two lines added/removed (not newline characters). These lines:
! 6615-S60-01#show inventory media
! Slot Port Type Media Serial Number F10Qualified
I think your original suspicion could be right: this may be some sort of race condition.
Yeah the 'show inventory media' definitely shouldn't be visible. Is there chance we're seeing prompt somewhere in the data?
If cmd/send that you fixed is clearly causing prompt detection problem, but perhaps the prompt regex is too relaxed and is finding prompt somewhere in data.
Makes sense. Here's the SSH log from one of the switches:
6615-S60-01>terminal length 0
6615-S60-01>terminal width 0
^
% Error: Invalid input at "^" marker.
6615-S60-01>enable
Password:
6615-S60-01#
6615-S60-01#show inventory
System Type : S60
System Mode : 1.0
Software Version : 8.3.3.10
Unit Type Serial Number Part Number Rev Piece Part ID Rev Svc Tag Exprs Svc Code
--------------------------------------------------------------------------------------------------------------
* 0 S60-01-GE-44T-AC-R SHFM124600920 7520058801 E N/A N/A N/A N/A
0 S60-10GE-2SFP+ SFFM114300838 7520044401 07 MY-0R17GD-74880-143-0838 A00 N/A N/A
0 S60-10GE-2SFP+ SFFM114300801 7520044401 07 MY-0R17GD-74880-143-0801 A00 N/A N/A
0 S60-PWR-AC-R ARFM120200781 7520047702 A N/A N/A N/A N/A
0 S60-PWR-AC-R ARFM124701034 7520059401 D N/A N/A N/A N/A
* - Management Unit
Software Protocol Configured
--------------------------------------------------------------
NTP
SNMP
LLDP
6615-S60-01#show inventory media
Slot Port Type Media Serial Number F10Qualified
------------------------------------------------------------------------------
0 44 Media not present or accessible
0 45 Media not present or accessible
0 46 Media not present or accessible
0 47 Media not present or accessible
0 48 SFP+ 10GBASE-SR AQ11KM3 Yes
0 49 SFP+ 10GBASE-SR AQ11ML8 Yes
0 50 SFP+ 10GBASE-SR AQ11KVT Yes
0 51 SFP+ 10GBASE-SR AQ11KLX Yes
6615-S60-01#show running-config
Current Configuration ...
! Version 8.3.3.10
! Last configuration change at Thu Jun 9 14:36:52 2016 by admin
! Startup-config last updated at Thu Jun 9 14:34:30 2016 by admin
!
...
REDACTED INFO
...
!
end
6615-S60-01#exit
After we saw this, we amended the model to not run terminal width 0
.
Please let me know if there's something more I can do to help answer your question
I don't think the terminal width 0
can break it. If it has sufficiently often, maybe write each commands output to a file, to verify if the outputs actually match the commands output or if it's incorrect (prompt issue).
Something like File.write 'filename', cfg
or maybe add timestamp to filename.
What's the status on this issue?
Well, I couldn't get it to stop creating changes so I disabled the container (I'll update to the latest and restart it today).
One thing that helped a bit was switching the interval from 5 minutes to 1 hour. I was still getting changes but much less often (days apart instead of hours and in extreme cases < 1 hour).
I'll post back after having it run for a few days.
Thanks for pinging me on this.
Sincerely, Dan L. Dunckel
On Mon, Aug 29, 2016 at 8:43 AM, ytti notifications@github.com wrote:
What's the status on this issue?
— You are receiving this because you authored the thread. Reply to this email directly, view it on GitHub https://github.com/ytti/oxidized/issues/468#issuecomment-243163101, or mute the thread https://github.com/notifications/unsubscribe-auth/ABGapriG1tg24D9AftnK9ckVhOOMhkG1ks5qkv4bgaJpZM4IzPVB .
@dandunckelman How did you get on with this? Any further info on what you are seeing?
@laf No. I gave up months ago.
I'll close for now.
We're seeing unexpected changes recorded to our Gitlab project hosting the config of our Force10 switches. We're running a 5 minute interval, and here are the diffs:
On the next interval:
As you can see, a change on the switch isn't being recorded...just a change of the config file's structure.
Please let me know if more info is needed. Thanks