ytti / oxidized

Oxidized is a network device configuration backup tool. It's a RANCID replacement!
Apache License 2.0
2.8k stars 925 forks source link

Strange changes #468

Closed dandunckelman closed 7 years ago

dandunckelman commented 8 years ago

We're seeing unexpected changes recorded to our Gitlab project hosting the config of our Force10 switches. We're running a 5 minute interval, and here are the diffs:

diff0

On the next interval: diff1

As you can see, a change on the switch isn't being recorded...just a change of the config file's structure.

Please let me know if more info is needed. Thanks

ytti commented 8 years ago

It looks like race condition, which usually means we're not eating all prompts, so it's kind of chance which prompt we read to determine end of given command.

Usually this is caused by using send("blaah\n") in model, which does not expect prompt, but if prompt really does follow that, we don't eat it, and then cmd "show something" might terminate immediately, because it saw that prompt, and we might end up completely in de-sync.

https://github.com/ytti/oxidized/blob/master/lib/oxidized/model/ftos.rb#L41

This should be changed to cmd(), from send(). Try that and report back.

dandunckelman commented 8 years ago

Thanks @ytti

After making the change, I stopped oxidized and re-ran it. That produced this diff:

diff

Stopping and re-running led to no change as expected. I'll let it run for a while and report back if anything happens. But at first glance, that change seems to have fixed it.

dandunckelman commented 8 years ago

@ytti After 17 hours of no changes, it started to produce these changes:

diff1

then a few minutes later: diff0

and I'd wager it happens again in the next 24 hours.

Thanks for any help. Please let me know if there's something more I can do to help you.

ytti commented 8 years ago

Look at the diff with some editor which shows unprintables, like vim or something. I'm guessing entirely different issue, potentially sometimes it inserts ^M in the beginning of line or similar, which would be easy to filter out.

dandunckelman commented 8 years ago

I ran a diff locally and found no CR. The 2 lines are being added/removed repeatedly. Also, the diff in our Gitlab would show ^M if a carriage return was present.

As a note: our Gitlab server is configured with autocrlf set to input, so it doesn't auto-convert line-endings. Reference: https://git-scm.com/docs/git-config

Thanks for the help. Please let me know if there's something more I can try.

ytti commented 8 years ago

I think you should just remove \r characters in the model for the command where they may or may not appears. And send pull request.

dandunckelman commented 8 years ago

This is what I've added:

   cmd 'show inventory media' do |cfg|
+    cfg.gsub! /\r//, ''
     comment cfg
   end

I'll let that run a while to see if new changes occur.

I only added it there, since that's where the latest diffs occurred. Do you think I should add it elsewhere? Or do something different entirely?

ytti commented 8 years ago

I think that's probably fine. I'd lean towards being opportunist and fix issue when it happens.

dandunckelman commented 8 years ago

About 30 minutes ago, one of our 10 switches did the same thing.

I'm going to alter the interval to hourly (it's currently at 5 minutes).

Lemme know if you have any other suggestions.

ytti commented 8 years ago

Was this after your fix? In same command?

dandunckelman commented 8 years ago

Yes. We opted to not alter the interval for now. We're trying a few other changes.

I'll post after some time has passed.

ytti commented 8 years ago

I would first like to understand why the change didn't help. You still saw ^M in 'show inventory media' after removing \r from the text?

ytti commented 8 years ago

Do you now have both changes the \r removal and the fix for https://github.com/ytti/oxidized/blob/master/lib/oxidized/model/ftos.rb#L41

dandunckelman commented 8 years ago

I never saw ^M before/after the change.

And yes, I still have both changes.

ytti commented 8 years ago

Oh I misread you. What did you see then? If you see git reporting line change, then something has changed. It's probably unprintable character, so just view it with vim or something to see difference.

^M is \r which is carriage feed, I've seen it in few boxes being injected in start of line (incorrectly). Clearly this is different, but as you have the diffs, you should be able to see the actual bytes that differ.

dandunckelman commented 8 years ago

No worries. I repeatedly see two lines added/removed (not newline characters). These lines:

! 6615-S60-01#show inventory media
! Slot   Port     Type        Media              Serial Number        F10Qualified

I think your original suspicion could be right: this may be some sort of race condition.

ytti commented 8 years ago

Yeah the 'show inventory media' definitely shouldn't be visible. Is there chance we're seeing prompt somewhere in the data?

If cmd/send that you fixed is clearly causing prompt detection problem, but perhaps the prompt regex is too relaxed and is finding prompt somewhere in data.

dandunckelman commented 8 years ago

Makes sense. Here's the SSH log from one of the switches:

6615-S60-01>terminal length 0
6615-S60-01>terminal width 0
                     ^
% Error: Invalid input at "^" marker.
6615-S60-01>enable
Password: 
6615-S60-01#
6615-S60-01#show inventory
System Type            : S60        
System Mode            : 1.0       
Software Version       : 8.3.3.10

Unit Type               Serial Number  Part Number  Rev  Piece Part ID            Rev  Svc Tag  Exprs Svc Code
--------------------------------------------------------------------------------------------------------------
* 0  S60-01-GE-44T-AC-R SHFM124600920  7520058801   E    N/A                      N/A  N/A      N/A           
  0  S60-10GE-2SFP+     SFFM114300838  7520044401   07   MY-0R17GD-74880-143-0838 A00  N/A      N/A           
  0  S60-10GE-2SFP+     SFFM114300801  7520044401   07   MY-0R17GD-74880-143-0801 A00  N/A      N/A           
  0  S60-PWR-AC-R       ARFM120200781  7520047702   A    N/A                      N/A  N/A      N/A           
  0  S60-PWR-AC-R       ARFM124701034  7520059401   D    N/A                      N/A  N/A      N/A           

 * - Management Unit 

Software Protocol Configured 
--------------------------------------------------------------
  NTP 
  SNMP 
  LLDP 

6615-S60-01#show inventory media
Slot   Port     Type        Media               Serial Number        F10Qualified
------------------------------------------------------------------------------
   0     44              Media not present or accessible
   0     45              Media not present or accessible
   0     46              Media not present or accessible
   0     47              Media not present or accessible
   0     48     SFP+        10GBASE-SR          AQ11KM3                  Yes
   0     49     SFP+        10GBASE-SR          AQ11ML8                  Yes
   0     50     SFP+        10GBASE-SR          AQ11KVT                  Yes
   0     51     SFP+        10GBASE-SR          AQ11KLX                  Yes

6615-S60-01#show running-config
Current Configuration ...
! Version 8.3.3.10
! Last configuration change at Thu Jun  9 14:36:52 2016 by admin
! Startup-config last updated at Thu Jun  9 14:34:30 2016 by admin
!
...
REDACTED INFO
...
!
end
6615-S60-01#exit

After we saw this, we amended the model to not run terminal width 0.

dandunckelman commented 8 years ago

Please let me know if there's something more I can do to help answer your question

ytti commented 8 years ago

I don't think the terminal width 0can break it. If it has sufficiently often, maybe write each commands output to a file, to verify if the outputs actually match the commands output or if it's incorrect (prompt issue).

Something like File.write 'filename', cfg or maybe add timestamp to filename.

ytti commented 8 years ago

What's the status on this issue?

dandunckelman commented 8 years ago

Well, I couldn't get it to stop creating changes so I disabled the container (I'll update to the latest and restart it today).

One thing that helped a bit was switching the interval from 5 minutes to 1 hour. I was still getting changes but much less often (days apart instead of hours and in extreme cases < 1 hour).

I'll post back after having it run for a few days.

Thanks for pinging me on this.

Sincerely, Dan L. Dunckel

On Mon, Aug 29, 2016 at 8:43 AM, ytti notifications@github.com wrote:

What's the status on this issue?

— You are receiving this because you authored the thread. Reply to this email directly, view it on GitHub https://github.com/ytti/oxidized/issues/468#issuecomment-243163101, or mute the thread https://github.com/notifications/unsubscribe-auth/ABGapriG1tg24D9AftnK9ckVhOOMhkG1ks5qkv4bgaJpZM4IzPVB .

laf commented 7 years ago

@dandunckelman How did you get on with this? Any further info on what you are seeing?

dandunckelman commented 7 years ago

@laf No. I gave up months ago.

I'll close for now.