Open nickbroon opened 6 years ago
Compiling with the following small diff to disable the InstanceManager it still crashes, just not in the aggregator hastable IM anymore. This time in deep in IpfixReceiverFile() buffer management code. Anyone able to offer any clues?
diff --git a/src/core/InstanceManager.h b/src/core/InstanceManager.h
index b0788ef14f9b..75b6df6a19c9 100644
--- a/src/core/InstanceManager.h
+++ b/src/core/InstanceManager.h
@@ -30,6 +30,8 @@
#include <list>
#include <algorithm>
+#define IM_DISABLE 1
+
using namespace std;
/**
./vermont -l debug -f ./vermont.xml
11:24:47.103[3] INFO /home/brownn/work/vermont/src/modules/ipfix/aggregator/FlowHashtable.cpp:709: Tried to set mask of length 4 IP address
11:24:47.103[3] DEBUG /home/brownn/work/vermont/src/modules/ipfix/aggregator/FlowHashtable.cpp:459: nhash=128727
11:24:47.103[3] DEBUG /home/brownn/work/vermont/src/modules/ipfix/aggregator/FlowHashtable.cpp:536: creating new bucket
11:24:47.103[3] INFO /home/brownn/work/vermont/src/core/InstanceManager.h:158: removing used instance 0x7fffe8013550
11:24:47.103[3] ERROR /home/brownn/work/vermont/src/modules/ipfix/IpfixParser.cpp:593: IpfixParser: Invalid set length 0, must be >= 4
*** Error in `/home/brownn/work/vermont/vermont': free(): invalid pointer: 0x00007fffe8013bf0 ***
Thread 4 "IpfixReceiver" received signal SIGABRT, Aborted.
[Switching to Thread 0x7ffff0bca700 (LWP 336)]
__GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:51
51 ../sysdeps/unix/sysv/linux/raise.c: No such file or directory.
(gdb) bt
#0 __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:51
#1 0x00007ffff5d09f5d in __GI_abort () at abort.c:90
#2 0x00007ffff5d5228d in __libc_message (action=action@entry=do_abort,
fmt=fmt@entry=0x7ffff5e79528 "*** Error in `%s': %s: 0x%s ***\n") at ../sysdeps/posix/libc_fatal.c:181
#3 0x00007ffff5d5964a in malloc_printerr (action=<optimised out>, str=0x7ffff5e75eae "free(): invalid pointer",
ptr=<optimised out>, ar_ptr=<optimised out>) at malloc.c:5426
#4 0x00007ffff5d5b73e in _int_free (av=0x7ffff60abc20 <main_arena>, p=<optimised out>, have_lock=0) at malloc.c:4175
#5 0x00007ffff5d6044e in __GI___libc_free (mem=<optimised out>) at malloc.c:3145
#6 0x00005555556e2f3b in boost::checked_array_delete<unsigned char> (x=0x7fffe8013bf0 "")
at /usr/include/boost/core/checked_delete.hpp:41
#7 0x00005555556e2d4a in boost::checked_array_deleter<unsigned char>::operator() (this=0x7fffe8023c18,
x=0x7fffe8013bf0 "") at /usr/include/boost/core/checked_delete.hpp:63
#8 0x00005555556e3309 in boost::detail::sp_counted_impl_pd<unsigned char*, boost::checked_array_deleter<unsigned char> >::dispose (this=0x7fffe8023c00) at /usr/include/boost/smart_ptr/detail/sp_counted_impl.hpp:153
#9 0x00005555556c1c4e in boost::detail::sp_counted_base::release (this=0x7fffe8023c00)
at /usr/include/boost/smart_ptr/detail/sp_counted_base_gcc_x86.hpp:146
#10 0x00005555556c1ccb in boost::detail::shared_count::~shared_count (this=0x7ffff0bc99d8, __in_chrg=<optimised out>)
at /usr/include/boost/smart_ptr/detail/shared_count.hpp:473
#11 0x00005555556e1f24 in boost::shared_array<unsigned char>::~shared_array (this=0x7ffff0bc99d0,
__in_chrg=<optimised out>) at /usr/include/boost/smart_ptr/shared_array.hpp:44
#12 0x000055555576b7d0 in boost::shared_array<unsigned char>::reset<unsigned char> (this=0x7ffff0bc9a40,
p=0x7fffe8023c30 "") at /usr/include/boost/smart_ptr/shared_array.hpp:178
#13 0x000055555576a288 in IpfixReceiverFile::run (this=0x555555b4bef0)
at /home/brownn/work/vermont/src/modules/ipfix/IpfixReceiverFile.cpp:208
#14 0x00005555556ea9dd in IpfixReceiver::threadWrapper (ipfixReceiver_=0x555555b4bef0)
at /home/brownn/work/vermont/src/modules/ipfix/IpfixReceiver.cpp:170
#15 0x00007ffff7bbd7fc in start_thread (arg=0x7ffff0bca700) at pthread_create.c:465
#16 0x00007ffff5de5b5f in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95
@nickbroon Can you provide a pcap file for a MWE?
The 'ipfix.dump0000000000.gz' attachment to my first comment should provide a MWE.
Oh, I overlooked that :) Thx!
With the above config (and #define IM_DISABLE 1
) after processing the first 2 fields in the first dataset encounter (in the second frame, which also contains a template) on the third field applicationId
it appears the FlowHashtable::copyData()
is overwriting the original IpfixSetHeader (I set a watch on set->id
and set->length
from IpfixParser::processDataSet()
), and almost certainly more of the original ipfix frame. This is clearly wrong, though I've yet to correctly determine exactly what FlowHashtable::copyData()
is doing incorrectly for applicationId
and what needs fixing. Any advise greatly received.
#0 0x00007ffff5d837a6 in ?? ()
#1 0x0000555555786707 in FlowHashtable::copyData (this=0x555555b553f0, dstFI=0x555555b56770, dst=0x7fffe4013b90 "\n\n\001\002\n\n\001\001", srcFI=0x7fffe4013918, src=0x7fffe4013c74 "", modifier=Rule::Field::AGGREGATE) at /home/user/vermont/src/modules/ipfix/aggregator/FlowHashtable.cpp:585
#2 0x0000555555787289 in FlowHashtable::aggregateDataRecord (this=0x555555b553f0, record=0x7fffe4013550) at /home/user/vermont/src/modules/ipfix/aggregator/FlowHashtable.cpp:700
#3 0x0000555555719a9b in IpfixAggregator::onDataRecord (this=0x555555b4c350, record=0x7fffe4013550) at /home/user/vermont/src/modules/ipfix/aggregator/IpfixAggregator.cpp:76
#4 0x00005555556fad8c in IpfixRecordDestination::receive (this=0x555555b4c4c8, ipfixRecord=0x7fffe4013550) at /home/user/vermont/src/modules/ipfix/IpfixRecordDestination.cpp:40
#5 0x0000555555681b27 in ConnectionSplitter<IpfixRecord*>::process (this=0x555555b4c1e0, packet=0x7fffe4013550) at /home/user/vermont/src/core/ConnectionSplitter.h:93
#6 0x000055555567d45d in ConnectionSplitter<IpfixRecord*>::receive (this=0x555555b4c1e0, packet=0x7fffe4013550) at /home/user/vermont/src/core/ConnectionSplitter.h:66
#7 0x0000555555682ad5 in Source<IpfixRecord*>::send (this=0x555555b564a0, t=0x7fffe4013550) at /home/user/vermont/src/core/Source.h:98
#8 0x00005555556f3d16 in IpfixCollector::send (this=0x555555b56440, ipfixRecord=0x7fffe4013550) at /home/user/vermont/src/modules/ipfix/IpfixCollector.cpp:102
#9 0x000055555577193c in IpfixParser::push (this=0x555555b56580, ipfixRecord=0x7fffe4013550) at /home/user/vermont/src/modules/ipfix/IpfixParser.cpp:771
#10 0x000055555576dc4f in IpfixParser::processDataSet (this=0x555555b56580, sourceId=..., message=..., set=0x7fffe4013c70, endOfMessage=0x7fffe4013cd4 "") at /home/user/vermont/src/modules/ipfix/IpfixParser.cpp:351
#11 0x000055555577006d in IpfixParser::processIpfixPacket (this=0x555555b56580, message=..., length=116, sourceId=...) at /home/user/vermont/src/modules/ipfix/IpfixParser.cpp:608
#12 0x0000555555770cf3 in IpfixParser::processPacket (this=0x555555b56580, message=..., length=116, sourceId=...) at /home/user/vermont/src/modules/ipfix/IpfixParser.cpp:676
#13 0x000055555576b199 in IpfixReceiverFile::run (this=0x555555b4bce0) at /home/user/vermont/src/modules/ipfix/IpfixReceiverFile.cpp:287
#14 0x00005555556ea9dd in IpfixReceiver::threadWrapper (ipfixReceiver_=0x555555b4bce0) at /home/user/vermont/src/modules/ipfix/IpfixReceiver.cpp:170
#15 0x00007ffff7bbd7fc in ?? ()
#16 0x0000000000000000 in ?? ()
Thread 3 "IpfixReceiver" hit Breakpoint 6, FlowHashtable::copyData (this=0x555555b553f0, dstFI=0x555555b56770, dst=0x7fffe4013b90 "\n\n\001\002\n\n\001\001\067:0 (", srcFI=0x7fffe4013918, src=0x7fffe4013c74 "\n\n\001\001\n\n\001\002\001", modifier=Rule::Field::AGGREGATE) at /home/user/vermont/src/modules/ipfix/aggregator/FlowHashtable.cpp:556
556 {
(gdb) p *dstFI
$11 = (TemplateInfo::FieldInfo *) 0x555555b56770
(gdb) p *dstFI
$12 = {type = {id = 95, enterprise = 0, length = 65535}, offset = 8, privDataOffset = 0, isVariableLength = true, basicListData = {semantic = 120 'x', fieldIe = 0x0}}
(gdb) p *srcFI
$13 = {type = {id = 95, enterprise = 0, length = 5}, offset = 69, privDataOffset = 0, isVariableLength = false, basicListData = {semantic = 0 '\000', fieldIe = 0x0}}
(gdb)
So the question I guess is about copying applicationId
which is id 95
. It appears the source length 5 and marked as not isVariableLength while the destination has length of 65535 and marked as isVariableLength.
Any suggestion as to what is going on here are welcome.
Strangely the fieldLength
being used by the Hashtable is only 8. This probably the length of the 2 flowKey
fields, destinationIPv4Address and sourceIPv4Address
, but no account of length of the nonFlowKey
field applicationId
has been taken.
#1 0x0000555555787289 in FlowHashtable::aggregateDataRecord (this=0x555555b553f0, record=0x7fffe4013550) at /home/user/vermont/src/modules/ipfix/aggregator/FlowHashtable.cpp:700
700 copyData(hfi, htdata.get(), tfi, data, fieldModifier[i]);
(gdb) p *record
$17 = {<IpfixRecord> = {_vptr.IpfixRecord = 0x555555a88d28 <vtable for IpfixDataRecord+16>, sourceID = {px = 0x7fffe40011a0, pn = {pi_ = 0x7fffe40011d0}}, threeByteIndicator = 255 '\377', variableLenData = 0x0, variableLenDataTotalBytes = 100, variableLenDataCurrBytes = 0}, <ManagedInstance<IpfixDataRecord>> = {_vptr.ManagedInstance = 0x555555a88d58 <vtable for IpfixDataRecord+64>, myInstanceManager = 0x555555a97c40 <IpfixParser::dataRecordIM>, referenceCount = 1, deletedByManager = false}, templateInfo = {px = 0x7fffe4013610, pn = {pi_ = 0x555555b563d0}}, dataLength = 96, message = {px = 0x7fffe4013c60 "", pn = {pi_ = 0x7fffe4013ac0}}, data = 0x7fffe4013c74 "\n\n\001\001\n\n\001\002\001"}
(gdb) p fieldLength
$18 = 8
It appears that vermont treats applicationId as an octetArray, and always treats octetArrays as variable length. (ie as having a length of 65535)
In BaseHashtable::createDataTemplate()
it appears fieldLength will not increased, as it's neither a basic field nor IPFIX_TYPEID_basicList
as this code snippets show. (There should probably be an extra else here with a warning/error here)
if (!fi->isVariableLength) {
fieldLength += fi->type.length;
}
// Variable length fields: Extract real length information
else if (fi->type == InformationElement::IeInfo(IPFIX_TYPEID_basicList, 0)) {
fi->basicListData.semantic = rf->semantic;
fi->basicListData.fieldIe = new InformationElement::IeInfo(rf->fieldIe);
// Length is one pointer, as we are storing data in dynamically allocated vector (i.e. pointer to vector)
fieldLength += sizeof(vector<void*>*);
}
So the question is can it be extended to handle variable length octetArrays?
It's worth noting that octetArrays fields need not always be variable length, and can be fixed length encoded using a size smaller than 65535 if they want. In the case of the sample data used here the record contains applicationID encoded with a fixed length of 5.
Even if internally vermont treats all octetArrays as variable length, should it be possible to copy into it fixed length encoded otcetArray data?
@nickbroon Thanks for taking the time to analyze this behavior!
I think we need to fix several things, in order to add support for applicationId
and avoid these bug in the future:
BaseHashtable::createDataTemplate()
for variable length IEs other than basicList
FlowHashtable::aggregateField()
FlowHashtable::copyData()
FlowHashtable::aggregateField()
but this variable length IE fails before reaching that check.@ogasser Thanks for the pointers. I'll use these to look at extending the support. First priority will be to induce a hard error when it attempts to parse any 'octetArray' or 'string' type field to avoid the crash. Then I'll look to have parse and aggregate octetArray (of which applicationId is one) and string type fields (they very similar, so should be treatable in the same way).
When you say "Use real length in BaseHashtable::createDataTemplate() for variable length IEs other than basicList" I'm not sure I fully understand. What is the real length for the Rule::Field in the case of a octetArray field like applicationId? In a aggregator rule there is no given length, so these are simply marked as 65535 in the Rule::Field. So I'm unsure how to allocate space for this field in hashtable. I suppose it could stored as pointer to dynamic memory, (like for the basicList) and then when actual flows are received for to be inserted into the hashtable, the length can be extracted from them and the memory allocated. (bearing in mind that the field in received flow might be fix length or variable length encode). Or am I missing understanding how rules deal with fields and their encoding and length?
You are completely right. Since we do not know how long variable length IEs will be, we store them dynamically. Therefore the length of that field is the length of one pointer, similar as for basicList.
As an alternative, instead of Vermont always treating fields of type IPFIX_TYPE_string or IPFIX_TYPE_octetArray as variable length encoded it could treat them fix length encoded, with either a hard coded length defined in the code (say 256), or allowing the length to be configured per field in the vermont config used to generate the aggregator rules template, and the field data truncated or zero extended to fill the encoded space.
Examples might be:
<nonFlowKey>
<ieName>interfaceName</ieName>
<length>10</length>
</nonFlowKey>
<nonFlowKey>
<ieName>applicationID</ieName>
<length>5</length>
</nonFlowKey>
<nonFlowKey>
<ieName>ipHeaderPacketSection</ieName>
<length>56</length>
</nonFlowKey>
I'll explore both options ,treating them as variable length encoded like basicList or configurable fixed length encode, and determine which is both best and easiest to implement. For a lot of typically short IPFIX_TYPE_string and IPFIX_TYPE_octetArray fields like applicationID and interfaceName collectors often expect to see fixed length encoded data.
While examining the code related to flow aggregation I'm not sure I understand how flowKey/nonFlowKey interacts with the aggregation. I would assume that any field configured as nonFlowKey should be aggregated, and those configured as flowKey not aggregated.
The FlowHashtable::aggregateFlow
function that performs the actual aggregation does not appear to take the key status of field into consideration, basing the choice to aggregate on the return value of isToBeAggregated()
which appears to use a fixed table of `type.id' to determine this.
flowKey/nonFlowKey appears to only be used in AggregatorBaseCfg::readNonFlowKeyRule
and AggregatorBaseCfg::readFlowKeyRule
to set ruleField->modifier = Rule::Field::AGGREGATE
or ruleField->modifier = Rule::Field::KEEP
and then ruleField->modifier
is used in FlowHashtable::copyData
while building a flow for consideration of inserting/aggregating into the hashtable, but AGGREGATE and KEEP are not treated any different.
Am I misunderstanding something? I simply don't see how flowKey/nonFlowKey configuration is effecting how flows are aggregated together when flow is found in the hash table.
You are correct, the flowKey
and nonFlowKey
config directives seem to have no effect as I see it in the code.
That's what I suspected. I'll do some testing to confirm this, but if it's the case it would seem to be a pretty serious bug in Vermont!
Yes, this is definitely not what users expect when configuring the Vermont.
Would this be something you'd have time to look into fixing?
Unfortunately not in the coming weeks :( If you find the time, however, I would make sure to discuss and review with you the proposed fixes.
Using the following very basic config containing an aggregator, and the attached basic ipfix dump of flows that contain the "applicationId (95)" field, vermont will crash, with the below backtrace in the aggregator hashtable. Anyone familiar with aggregator/hashtable code able to easily diagnose the cause of the crash? If the flow capture does not contain the applicationId (95) the aggregator does not crash, so perhaps it related to 'octetArray' field support, or that in this case the field has a length of 5?
vermont.xml.gz ipfix.dump0000000000.gz vermont.log.gz