Closed GoogleCodeExporter closed 8 years ago
Hi,
If you've already got use_dns(no) set, then the easy stuff is out. Is the
timing of the DNS requests such that it's exactly when the sync happens that
the DNS lookups occur, or can you pinpoint that it's when syslog-ng and/or ELSA
begin parsing them that the DNS lookups occur? One ugly way of narrowing it
down would be to find the pid of syslog-ng, then do
strace -p <pid> -e trace=network
as you do the rsync to see exactly what syslog-ng is doing. Keep in mind this
will print a ton of info to the screen, so if possible, use a small test file
to have chance to see what it's doing.
Once you confirm it's indeed syslog-ng doing the lookups, can you tell me what
version of syslog-ng you're on? Also, if possible, can you attach the
syslog-ng.conf file (with anything sensitive redacted, of course) to this
ticket?
Original comment by mchol...@gmail.com
on 15 Feb 2013 at 11:12
The requests start coming in in almost immediately as the rsync finishes - I
did have syslog-ng have a pretty small follow_freq() set on those logs so its
hard for me to say absolutely though. It did appear that with syslog-ng
disabled, or with d_elsa set to simply dump to a file rather than pass to Elsa
that the requests didn't show up - Perhaps the use_dns() in the option block is
somehow not applying further down the line.
I'll attach more detailed logs/confs and anything I can find out with the
strace asap, probably I will have to wait to Monday to get them
Original comment by neome...@gmail.com
on 17 Feb 2013 at 12:31
Attached syslog-ng config: Possibly worth noting there are several more web
logs, configured the same way other than the paths of course; also I was
changing follow freq. up and down at various points to see what effect it would
have on the amount of time it would take to become queryable in ELSA which
might have impacted the number of dns requests I suppose.
Going to see if I can turn up any further details via strace
Original comment by neome...@gmail.com
on 18 Feb 2013 at 3:15
Attachments:
Begin rsync, pretty much right away:
DNS queries in syslog:
Feb 18 10:34:24 uselsa01 rsyslogd-2177: imuxsock lost 12277 messages from pid
28593 due to rate-limiting
Feb 18 10:34:24 uselsa01 dnsmasq[28593]: query[A] . from 127.0.0.1
Feb 18 10:34:24 dnsmasq[28593]: last message repeated 199 times
Feb 18 10:34:24 usaelsa01 rsyslogd-2177: imuxsock begins to drop messages from
pid 28593 due to rate-limiting
Feb 18 10:35:01 usaelsa01 CRON[30704]: (root) CMD (perl
/usr/local/elsa/web/cron.pl -c /etc/elsa_web.conf > /dev/null 2>&1)
strace with the -e trace=network option for some reason doesn't give me
anything, general strace seems to be pretty much entirely file I/O.
Original comment by neome...@gmail.com
on 18 Feb 2013 at 3:51
Did a:
ps auxw | grep elsa.pl | awk '{print "-p " $2}' | xargs strace -e trace=network
Which shows up the requests:
[pid 29607] connect(58, {sa_family=AF_INET, sin_port=htons(53),
sin_addr=inet_addr("127.0.0.1")}, 16) = 0
[pid 29607] sendto(58, "\r5\1\0\0\1\0\0\0\0\0\0\0\0\1\0\1", 17, MSG_NOSIGNAL,
NULL, 0) = 17
[pid 29607] recvfrom(58, "\r5\201\205\0\1\0\0\0\0\0\0\0\0\1\0\1", 1024, 0,
{sa_family=AF_INET, sin_port=htons(53), sin_addr=inet_addr("127.0.0.1")}, [16])
= 17
[pid 29607] socket(PF_INET, SOCK_DGRAM|SOCK_NONBLOCK, IPPROTO_IP) = 58
Only many many more times. Not sure what to look for from here
Original comment by neome...@gmail.com
on 18 Feb 2013 at 3:56
Ah, that's very helpful! Ok, so it's definitely elsa.pl. I think I know where
it's doing it. If the hostname after parsing isn't an IP, then ELSA does a DNS
lookup (this is a property of Socket's inet_aton() function). So, can you run
syslog-ng with the d_elsa destination swapped for a test destination where it
just writes out to a file? I need to know what the third tab-delimited field
looks like, as my hypothesis is that it's just a single dot where the hostname
is expected to be. If that's the case, then it's an issue with the log
formatting as they come in. It might be a case where you need to add
"flags(expect-hostname)" to the "s_web09_site1" source. Also, you should
declare those sources above the main ELSA lines in an external .conf file, then
have them included by adding "LOCAL_SYSLOG_CONF="/path/to/syslog/conf/file"" in
/etc/elsa_vars.sh, then run sh install.sh node update again, and it will
automatically include your custom syslog-ng include file in the right place.
Your custom file should look like this:
source s_network {
# no-multi-line means parse message as-is, newlines included, to allow for multi-line messages
tcp(flags(no-multi-line));
udp(flags(no-multi-line));
file("/home/it/web09/site1.com-access.log" flags(no-parse) program_override("url") follow_freq(30) ) ;
file("/home/it/web09/site2.com-access.log" flags(no-parse) program_override("url") follow_freq(30) ) ;
file("/home/it/web09/site3.com-access.log" flags(no-parse) program_override("url") follow_freq(30) ) ;
};
That way, we override the default ELSA network source to include your custom
local sources. This might not fix it, but then everything will be in the right
spot. Fixing the DNS issue itself will be a matter of finding the right
combination of flags on these file sources, so it might mean removing
"no-parse" and adding "expect-hostname", or just adding "expect-hostname."
Original comment by mchol...@gmail.com
on 18 Feb 2013 at 11:15
Parsed line from output to file -
'1361284967 uselsa01 url 7 11.111.11.111 - - [19/Feb/2013:06:44:49 -0500] "GET
/uk/downloads.php HTTP/1.1" 200 12912 "http://www.google.com/" "Mozilla/5.0
(Windows NT 6.1; WOW64; rv:14.0) Gecko/20100101
Firefox/14.0.1" 92.105.24.254 200 12912 GET /uk/downloads.php http://www.goo
gle.com/ Mozilla/5.0 (Windows NT 6.1; WOW64; rv:14.0) Gecko/20100101
Firefox/14.0.1 -'
Seems that its not a . but also not an expected IP.
Tentatively -
Adding expect hostname seems not to change things
Adding expect hostname + removing no parse seems to have prevented the DNS
lookups; And parsing still looks good when running a search on the last
imported class=URL's!
I haven't rearranged the configs yet, not sure if that would change the impact
of the flags? Is the idea of structuring the config file that way just to
utilize the existing s_network rather than custom sources?
Thanks for the help !
Original comment by neome...@gmail.com
on 19 Feb 2013 at 3:26
Unfortunate retraction, missed some overlap, the changed flags do clear the dns
queries buy also strip out the nice parsing. Hmm, I'll try adjusting the config
structure I guess and see what happens
Original comment by neome...@gmail.com
on 19 Feb 2013 at 3:50
Ok, good, so we've basically identified the source of the problem, now we're
just narrowing the solution. The config structure is because an ELSA update
will overwrite the syslog-ng.conf, so in order to preserve your local
customizations, we need to have those in an external file. Can you show me
what the logs look like now when you have expect-hostname and remove no-parse,
but the patterns aren't parsing it correctly?
Original comment by mchol...@gmail.com
on 19 Feb 2013 at 4:31
Ahh understood.
Line with the flags switched:
'1361291836 uselsa01 url unknown - - [19/Feb/2013:07:23:53 -0500] "GET
/uk/index.php HTTP/1.1" 200 11820 "-" "Mozilla/4.0 (compatible; MSIE 7.0;
Windows NT 6.0; BTRS124342; SLCC1; .NET CLR 2.0.50727; InfoPath.2; .NET CLR
3.5.30729; .NET CLR 3.0.30618; .NET4.0C)" '
[Which in ELSA results with program=url class=NONE and host=ip of the ELSA
server being the only separated fields ]
Original comment by neome...@gmail.com
on 19 Feb 2013 at 4:40
Ok, the fields look out of order. It should go timestamp, host, program,
class, message. What does it look like in the web console? I'm thinking we
may need the no-parse included, but have to modify how the files are coming in.
Original comment by mchol...@gmail.com
on 19 Feb 2013 at 5:12
In the ELSA web frontend those lines look like
'- - [19/Feb/2013:06:48:33 -0500] "GET /uk/index.php HTTP/1.1" 200 11820 "-"
"Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1)"
host=172.1.1.1 program=url class=NONE'
Original comment by neome...@gmail.com
on 19 Feb 2013 at 5:19
Ok, so there's no program given because no-parse is on (we can set one using
program_override() if necessary). What does the original log line look like?
Original comment by mchol...@gmail.com
on 19 Feb 2013 at 5:22
Sorry, getting a little convoluted between conf switches so just to clarify:
Those examples are of the lines with no-parse removed and expect-hostname set
instead. Also I have a program override(url) set as in the original conf.
So now within the custom s_network source which is included I have multiple
lines looking like:
file("/home/it/web09/site1.com-access.log" program_override("url")
flags(expect-hostname) ) ;
Which looks like the example in post #10 if dumped to a file, or post #12 if
looked at in the ELSA web console. The original log-lines look like:
111.11.11.11 - - [19/Feb/2013:12:32:48 -0500] "GET /uk/index.php HTTP/1.1" 200
31817 "http://google.com" "Mozilla/5.0 (Windows NT 6.1; WOW64; rv:18.0)
Gecko/20100101 Firefox/18.0"
If I add the no-parse flag back in, I get a properly parsed log in the ELSA
web console, but the DNS lookups trigger again
Original comment by neome...@gmail.com
on 19 Feb 2013 at 5:38
Ok, I think what's happening is that the r_host syslog-ng rewriter isn't
working on file-based sources, so host comes in as a null field. Then
Reader.pm line 395 does the inet_aton() on an empty value, which performs the
DNS request. I'd like you to try something for me, and if it works, I'll put
it in the source for good. Go to Reader.pm line 395 and replace it with this:
$line[FIELD_HOST] = $line[FIELD_HOST] ? inet_aton($line[FIELD_HOST])) :
2130706433;
and restart syslog-ng. If I'm right, then the DNS lookups should stop and the
host should show up as 127.0.0.1.
Original comment by mchol...@gmail.com
on 19 Feb 2013 at 6:54
Ok, I replaced that line and restarted syslog-ng, and ran with no-parse added
again.
The DNS lookups still came in, but now I have host=0.0.0.0 in the web console,
instead of the IP of the ELSA machine
Original comment by neome...@gmail.com
on 19 Feb 2013 at 7:28
Ok, let's stop guessing at what the value is there and have it log, stick this
in the line just above and let me know what the debug output is:
$self->log->debug("FIELD_HOST: " . $line[FIELD_HOST]);
If it turns out that $line[FIELD_HOST] really does evaluate to a literal "."
then I will be confused, because I don't see where in the message it would be
coming from. The fact that you're getting 0.0.0.0 as the host means that a
string, not an IP is being evaluated by inet_aton().
Original comment by mchol...@gmail.com
on 19 Feb 2013 at 10:15
It seems to be properly pulling the localhost IP at that point [I don't know
enough about Perl to know if it would have quotes or something in the debut
output were it a string, but at least its not just a dot]:
* DEBUG [2013/02/19 17:27:50] /usr/local/elsa/node/Reader.pm (396)
Reader::parse_line 10790 FIELD_HOST: 127.0.0.1
* DEBUG [2013/02/19 17:27:50] /usr/local/elsa/node/Reader.pm (396)
Reader::parse_line 10790 FIELD_HOST: 127.0.0.1
ad infinitum
Very strange
Original comment by neome...@gmail.com
on 19 Feb 2013 at 10:32
[deleted comment]
Some strategic commenting and uncommenting...I tried running and with both your
new line and 395 enabled one at a time to the same effect. I then tried
commenting out the inet_aton conversion a few lines down, and it seemed that
this killed the dns requests regardless of which line is used for the initial
inet_aton() , so I'm guessing this inet_aton is the source. I added a debug
statement here but all the IP addresses I'm seeing logged look normal
if ($line[FIELD_CLASS_ID] ne 1){ #skip default since there aren't any fields
# Convert any IP fields as necessary
foreach my $field_order (keys %{ $self->class_info->{field_conversions}->{ $line[FIELD_CLASS_ID] }->{'IPv4'} }){
$self->log->debug("IPFIELD: " . $line[$field_order]);
$line[$field_order] = unpack('N', inet_aton($line[$field_order]));
}
Maybe a change similar to your replacement line for the other inet_aton() but
here instead?
[deleted and reposted for grammar/clarity edit]
Original comment by neome...@gmail.com
on 20 Feb 2013 at 4:43
Ah, yes, that could certainly be it. Wherever inet_aton() gets called, there's
the potential for a DNS lookup if it doesn't receive a valid IP to lookup. I'm
still confused as to why you're seeing normal IP's in the debug lines but
seeing DNS lookups anyway. I've done some testing to confirm the only way to
do a lookup for "." is for an undefined value (an empty string) to be looked up
in inet_aton(). So, let's try to put in the conditional lookup at the second
lookup area, and that should take care of the lookup.
Original comment by mchol...@gmail.com
on 20 Feb 2013 at 5:47
With that loop changed to:
##########################################
# Convert any IP fields as necessary
foreach my $field_order (keys %{ $self->class_info->{field_conversions}->{ $line[FIELD_CLASS_ID] }->{'IPv4'} }){
$self->log->debug("IPFIELD: " . $line[$field_order]);
# $line[$field_order] = unpack('N', inet_aton($line[$field_order]));
$line[$field_order] = $line[$field_order] ? inet_aton($line[$field_order]) : 2130706433;
}
#############################
I seem to avoid the dns requests. Every source in that batch seems to be
0.0.0.0 , destination 127.0.0.1, and host 127.0.0.1.
I wonder whether there could be whitespace that I'm not seeing in one of the IP
fields causing the problem? Or are these strings already stripped/trimmed
earlier on?
Original comment by neome...@gmail.com
on 20 Feb 2013 at 7:41
Sorry, this is my bad, the line:
$line[$field_order] = $line[$field_order] ? inet_aton($line[$field_order]) :
2130706433;
Should be:
$line[$field_order] = $line[$field_order] ? unpack('N',
inet_aton($line[$field_order])) : 2130706433;
Original comment by mchol...@gmail.com
on 20 Feb 2013 at 10:35
Ahhh Ok, swapped it.
So far this seems to have worked!
Looking at a fresh batch of logs from the web console, host and destination
127.0.0.1 , correct srcip. I'll let it run and post back if anything crops back
up.
Thanks again for the help resolving this!
Original comment by neome...@gmail.com
on 21 Feb 2013 at 3:01
Comitted under rev 654.
Original comment by mchol...@gmail.com
on 21 Feb 2013 at 8:28
Original issue reported on code.google.com by
neome...@gmail.com
on 15 Feb 2013 at 9:55