Closed: suyograo closed this issue 8 years ago
Just a thought. Do your files end in a newline?
I saw something before where the XML files did not end in a newline and so Logstash Forwarder, Log Courier and Logstash all ignored it - they would see it as "unfinished write" and wait for the newline to appear.
There is a problem with line-ending handling. I have hit this when using NXLog: lines end with \r\n, but the codec splits lines using only \n.
I modified the code to use a BufferedTokenizer, as in the line codec (https://github.com/elastic/logstash/blob/master/lib/logstash/util/buftok.rb).
I'll send the code tomorrow.
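A minimal Ruby sketch of the symptom (not the codec's actual code): splitting CRLF input on "\n" alone leaves a trailing "\r" on every line, which can then break pattern matches downstream.

```ruby
# NXLog-style input: lines terminated with \r\n
raw = "line one\r\nline two\r\n"

# Splitting on "\n" only leaves a dangling \r on each line
naive = raw.split("\n")

# Chomping the \r restores clean lines (what a CRLF-aware tokenizer would give)
cleaned = naive.map { |l| l.chomp("\r") }
```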
@doubret are you sure that is related here? It would just result in events with dangling \r and I don't see how that would affect OP's problem? Maybe needs a separate issue?
The \r is probably stripped somewhere in the pipeline; I need to check. In the codec the event doesn't exist yet, it's just raw buffers.
The OP's problem is fixed with the inclusion of auto_flush (v2.0.9), or possibly close_older if using the file input (v2.2.1).
I am including an explanation for readers from the future.
When trying to match multiline records that have distinct begin and end patterns, it's better to negate-match on the begin pattern, as the OP has done. For the XML in the OP's gist:
<?xml version="1.0"?>
<taxfile id="1692376550">
...
</taxfile>
codec => multiline {
  pattern => "^<\?xml .*\?>"
  negate => true
  what => "previous"
}
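The grouping logic can be sketched in plain Ruby (a hypothetical simplification, not the codec's source): a line matching the begin pattern flushes whatever is buffered as one event, and every other line accumulates. The sketch also shows why the final document needs an explicit flush.

```ruby
BEGIN_RE = /^<\?xml .*\?>/

# Group line-oriented input into one event per XML document, negating on the
# begin pattern (i.e. negate => true, what => "previous").
def group_events(lines)
  events, buffer = [], []
  lines.each do |line|
    if line =~ BEGIN_RE && !buffer.empty?
      events << buffer.join("\n")  # a new doc starts: emit the buffered one
      buffer = []
    end
    buffer << line
  end
  # Without this final flush (what auto_flush provides), the last document
  # would sit in the buffer forever.
  events << buffer.join("\n") unless buffer.empty?
  events
end

docs = group_events([
  '<?xml version="1.0"?>', '<taxfile id="1"/>',
  '<?xml version="1.0"?>', '<taxfile id="2"/>'
])
```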
If there are one or more XML documents per file, then without auto_flush or close_older (file input), all the lines of the last XML document stay buffered, and because a new matching line never arrives, an event is never emitted until Logstash is stopped.
This is why we brought in auto_flush.
But let me explain the file input behaviour first. The introduction of close_older allows the user to set a value in seconds, e.g. 10 seconds. When a file is read, we track the time of each read; 10 seconds after the last read, the file input closes the file and flushes its codec, which generates the event from the last XML doc stuck in the buffer. If new content later appears in the file, it is reprocessed from the last read position. There is no need to use auto_flush in this case. The two cases are file tailing and file reading.
However, if the multiline codec is used with a different input, one would use the auto_flush config setting. It performs the same flush as before, but in this case the multiline codec tracks when the last line was buffered: if no more lines are seen for auto_flush_interval seconds, the buffered lines are flushed and an event is generated.
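As a sketch, a file-input configuration using close_older might look like this (path and timing values are illustrative):

```
input {
  file {
    path => "/var/data/*.xml"
    close_older => 10            # close idle files after 10s and flush the codec
    codec => multiline {
      pattern => "^<\?xml .*\?>"
      negate => true
      what => "previous"
      # for non-file inputs, set auto_flush_interval here instead
    }
  }
}
```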
Hi All,
I am using Logstash 5.2 and facing the same problem. The last line of the XML file never gets read, producing an XML parsing error when using the xml filter plugin. I did not understand from the previous comments whether this problem is fixed or not.
Input XML file:
<?xml version="1.0"?>
<taxfile id="1692376550">
...
</taxfile>
Logstash conf:
file {
  path => "C:/Input/*.xml"
  start_position => beginning
  codec => multiline {
    pattern => "^<\?xml .*\?>"
    negate => true
    what => "previous"
    auto_flush_interval => 1
  }
  type => "xml_type"
}
filter {
  xml {
    source => "message"
    target => "content"
  }
}
I have a couple of questions,
Could you please let me know whether reading an XML file (a well-formed file without a \n on the last line) and parsing it with the xml filter is possible or not?
I know that manually adding a newline at the end of the file makes it fully readable by Logstash. But in my use case I can't afford to update the files manually. So, is there a way Logstash can add a newline at the end of the file before reading it as multiline?
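Logstash itself won't modify the input file, but a small external script run before ingestion can append the missing newline. A hedged sketch (the glob path is taken from the config above; run it from a scheduled task or similar):

```ruby
# Append a trailing newline to any non-empty file that lacks one, so the
# file input can see the final line. Not part of Logstash itself.
def ensure_trailing_newline(path)
  return if File.zero?(path)
  File.open(path, "a+b") do |f|
    f.seek(-1, IO::SEEK_END)              # inspect the last byte
    f.write("\n") unless f.read(1) == "\n"
  end
end

Dir.glob("C:/Input/*.xml").each { |p| ensure_trailing_newline(p) }
```

The check before writing makes the script safe to run repeatedly: files that already end in a newline are left untouched.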
Thanks Razik
This looks like a bug in the multiline codec, but it is not: the file input has buffered the last piece of text while waiting for the newline, so the multiline codec never receives it.
Please use filebeat to solve this. See the close_eof option. https://www.elastic.co/guide/en/beats/filebeat/current/index.html
You can still use the xml filter to decode the XML.
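A sketch of such a Filebeat setup (exact key names vary across Filebeat versions; paths and pattern are taken from the config above, and close_eof is the relevant option):

```yaml
filebeat.prospectors:            # "filebeat.inputs" in newer versions
  - paths:
      - C:/Input/*.xml
    close_eof: true              # close the file at EOF instead of waiting for more data
    multiline.pattern: '^<\?xml .*\?>'
    multiline.negate: true
    multiline.match: after       # Filebeat's equivalent of what => "previous"
```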
Hi @guyboertje, I am facing the same problem as razik29. Here is my config:
codec => multiline {
  pattern => "^%{TIME}"
  negate => true
  what => "previous"
  auto_flush_interval => 1
}
Whether I read data from Kafka or stdin, the last log entry is missing, even if I append a blank line after it. It seems auto_flush_interval does not work.
In our production environment Logstash will get data from Kafka, so how can I solve this problem?
My Logstash version is 5.2.2.
@ragingCow Please note that the multiline codec should only be used with inputs that supply line-oriented data. The kafka input builds events directly from the JSON received, so the multiline codec does not work with the kafka input.
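For reference, a kafka input would normally be paired with a codec that matches the payload, e.g. json; any multiline joining has to happen before the messages reach Kafka. A minimal sketch (broker address and topic name are placeholders):

```
input {
  kafka {
    bootstrap_servers => "localhost:9092"
    topics => ["app-logs"]
    codec => json        # one Kafka message == one event; no multiline here
  }
}
```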
@guyboertje Got it, thanks.
From JIRA https://logstash.jira.com/browse/LOGSTASH-2124
Seems like a similar issue was fixed in the multiline codec by @colinsurprenant.
I am trying to ingest many XML files. I am using the file input with the multiline codec to group all lines of each XML file into one log entry, and then filter it using the xml filter. The filter fails to parse the XML because the very last line of the file (the closing XML tag) is missing from the message field. Below are some gists to show the issue:
config: https://gist.github.com/clay584/10754518
sample input file: https://gist.github.com/clay584/10754389
sample output that is missing the last line of the file: https://gist.github.com/clay584/10754659
As you can see, the last line, which contains the closing tag, is missing. It does not show up in any previous or subsequent log entries; it's just gone.