Closed: ceiche59 closed this 3 years ago
The default codec is "json". Files starting with a "<" character are typically XML; for any other type of format, the codec probably has to be set to "plain". codec is the only mandatory parameter for all input plugins, so it has to be set. It's probably best to split your pipeline so that XML is ingested separately from JSON; you can use the path_filters option for that ...
If one pipeline uses the json codec, you can set path_filters to ['**/*.json'] to process only files with the json extension, regardless of the path they are in. Another pipeline can then use, for instance, ['**/*.xml','**/*.pdf'] to process XML and PDF files.
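To illustrate how such glob patterns split a file list, here is a minimal Ruby sketch. It assumes the matching behaves like Ruby's File.fnmatch? with the FNM_PATHNAME flag; the plugin's actual matching flags may differ. The blob names and the select_by_filters helper are hypothetical and not part of the plugin.

```ruby
# Hypothetical blob names, for illustration only.
BLOB_NAMES = [
  'logs/2021/app.json',
  'exports/orders/order-1.xml',
  'scans/2021/06/invoice.pdf',
  'scans/2021/06/page.tiff'
].freeze

# Returns the names that match at least one of the glob patterns.
# Assumption: glob semantics of File.fnmatch? with FNM_PATHNAME, where
# "**/" spans directories and "*" stays within one path segment.
def select_by_filters(names, patterns)
  names.select do |name|
    patterns.any? { |pattern| File.fnmatch?(pattern, name, File::FNM_PATHNAME) }
  end
end

# What a pipeline with the json codec would pick up.
JSON_FILES = select_by_filters(BLOB_NAMES, ['**/*.json'])

# What a second pipeline with the plain codec would pick up.
OTHER_FILES = select_by_filters(BLOB_NAMES, ['**/*.xml', '**/*.pdf'])
```

With these patterns the TIFF file is picked up by neither pipeline, so any file type you still want to see needs its own pattern.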
Ruby has a begin/rescue/end construct so that the error can be caught, but I'm not sure it's a good idea to parse XML as JSON, find out that it fails, catch the error, and then ignore it. It's probably better to filter the files out as soon as you read the file list.
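The two approaches can be compared in a small standalone Ruby sketch. This is not the plugin's or the json codec's code; parse_or_keep_raw and json_blob? are hypothetical helpers, and the sketch uses Ruby's standard json library rather than Logstash's own parser.

```ruby
require 'json'

# Approach 1: try to parse everything as JSON and rescue the failure.
# Non-JSON payloads are kept as raw text together with the error class.
def parse_or_keep_raw(payload)
  begin
    { 'parsed' => JSON.parse(payload) }
  rescue JSON::ParserError => e
    { 'message' => payload, 'error' => e.class.name }
  end
end

# Approach 2: decide from the file name before any parsing happens,
# so XML or binary content never reaches the JSON parser.
def json_blob?(name)
  File.extname(name).downcase == '.json'
end
```

The first approach does the parsing work for every file just to throw the result away, which is why filtering on the name first is usually the cheaper choice.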
Thanks for the quick response. I thought (from the README) that the only allowed codecs were "line" and "json". Setting it to "plain" solves my issue.
I'm using the Azure blob storage input plugin to process files other than JSON and line-based log files. Actually, most of the files are XML (plus some binary files such as TIFF or PDF). I'm using the XML filter to parse my data, and I identify the other file types and process only their filename and type. While my filter and output pipeline works well, the input plugin logs an error for every input file, like this:
[2021-06-17T13:33:31,890][ERROR][logstash.codecs.json ][main][19f37e5f946e8210f25a29fbea722302abeea634fd55cb2369731ed301ea1741] JSON parse error, original data now in message field {:error=>#<LogStash::Json::ParserError: Unexpected character ('<' (code 60)): expected a valid value (number, String, array, object, 'true', 'false' or 'null') at [Source: (String)" ....
Can we add functionality to allow any file format (or simply a way to suppress the error)?
Thanks
Claus