Open fgalan opened 7 years ago
It seems there are three cases in which our code can declare a regex invalid: https://github.com/telefonicaid/fiware-orion/search?l=C%2B%2B&q=%22Invalid+regex+for+entity+id+pattern%22&type=Code&utf8=%E2%9C%93. They should be analyzed to check if everything is ok with them.
Note that there are .test that could cover some of them (see cases/2257_crash_with_invalid_subscription_when_matching/crash_with_invalid_subscription_when_matching.test). However, maybe there are covering "basic" cases and not the pre-conditions that lead to the bug.
In fact, that pre-conditions could be even hardly reproductible using .test so first step is to think about possible pre-conditions that may cause the logic to fail/crash just looking the code.
Actually, regcomp() is called in 10 different places:
./src/lib/cache/subCache.cpp: if (regcomp(&entityIdPattern, _entityId.c_str(), REG_EXTENDED) != 0)
./src/lib/cache/subCache.cpp: if (regcomp(&entityTypePattern, _entityType.c_str(), REG_EXTENDED) != 0)
./src/lib/rest/StringFilter.cpp: if (regcomp(&patternValue, stringValue.c_str(), REG_EXTENDED) != 0)
./src/lib/rest/StringFilter.cpp: if (regcomp(&patternValue, stringValue.c_str(), REG_EXTENDED) != 0)
./src/lib/jsonParseV2/parseEntityObject.cpp: if (regcomp(&re, iter->value.GetString(), REG_EXTENDED) != 0)
./src/lib/jsonParseV2/parseEntityObject.cpp: if (regcomp(&re, iter->value.GetString(), REG_EXTENDED) != 0)
./src/lib/jsonParseV2/parseSubscription.cpp: if (regcomp(&re, idPattern.c_str(), REG_EXTENDED) != 0)
./src/lib/jsonParseV2/parseSubscription.cpp: if (regcomp(&re, typePattern.c_str(), REG_EXTENDED) != 0)
./src/lib/ngsi/EntityId.cpp: if (regcomp(&re, id.c_str(), REG_EXTENDED) != 0)
./src/lib/mongoBackend/MongoGlobal.cpp: if (regcomp(®ex, en2->id.c_str(), REG_EXTENDED) != 0)
I just don't think it is worth the effort to check these calls for crashes. If we have seen the text "invalid regular expression for idPattern" in the log file, doesn't that mean exactly that the if (regcomp() != 0) worked just fine?
Looking at one of the usages of regcomp (MongoGlobal.cpp):
if (regcomp(®ex, en2->id.c_str(), REG_EXTENDED) != 0)
{
std::string details = std::string("error compiling regex for id: '") + en2->id + "'";
alarmMgr.badInput(clientIp, details);
}
else
{
idMatch = (regexec(®ex, en1->id.c_str(), 0, NULL, 0) == 0);
}
regfree(®ex);
If regcomp
fails, we call regfree
... ?
I do not know, but that might be dangerous, depending on how regcomp failed.
And not calling regfree
might give a leak for invalid regexp ...
Need to look into this, in all 10 places, to see what is done about regfree
.
Need to look into this, in all 10 places, to see what is done about
regfree
.
As far as I understand by our skype discussion, of the 10 places, only 1 (the one which code if cited above) has a missplaced regrefree
.
We took a look at the source code of regcomp
and it seems that when regcomp fails, it frees up any allocated memory, so, we've decided to not call regfree
in case regcomp fails. This just might be the reason of this crash ...
After the fix on PR https://github.com/telefonicaid/fiware-orion/pull/2813 this issue goes back to Quarantine milestone to evaluate how it goes.
Similar problem seen again with 1.7.0.
Restart at orion.lab.fiware.org:
Feb 20 15:32:19 restart caused by: process is not running
Version data:
{
"orion" : {
"version" : "1.7.0",
"uptime" : "1 d, 19 h, 8 m, 34 s",
"git_hash" : "e544780eb64a4a2557c1f51dde070b8d82b86c49",
"compile_time" : "Wed Feb 8 13:30:24 CET 2017",
"compiled_by" : "fermin",
"compiled_in" : "centollo"
}
}
Log trace showing transactions summary counter restart (the one hour offset at log traces is normal):
time=2017-02-20T14:31:26.976Z | lvl=SUMMARY | corr=N/A | trans=N/A | from=N/A | srv=N/A | subsrv=N/A | comp=Orion | op=logSummary.cpp[126]:logSummary | msg=Transactions: 490693 (new: 2)
time=2017-02-20T14:31:26.977Z | lvl=SUMMARY | corr=N/A | trans=N/A | from=N/A | srv=N/A | subsrv=N/A | comp=Orion | op=logSummary.cpp[127]:logSummary | msg=DB status: ok, raised: (total: 3, new: 0), released: (total: 3, new: 0)
time=2017-02-20T14:31:26.977Z | lvl=SUMMARY | corr=N/A | trans=N/A | from=N/A | srv=N/A | subsrv=N/A | comp=Orion | op=logSummary.cpp[128]:logSummary | msg=Notification failure active alarms: 9, raised: (total: 9, new: 0), released: (total: 0, new: 0)
time=2017-02-20T14:31:26.977Z | lvl=SUMMARY | corr=N/A | trans=N/A | from=N/A | srv=N/A | subsrv=N/A | comp=Orion | op=logSummary.cpp[129]:logSummary | msg=Bad input active alarms: 1, raised: (total: 32, new: 0), released: (total: 31, new: 0)
time=2017-02-20T14:31:49.855Z | lvl=ERROR | corr=4cfef506-f779-11e6-8ef8-52540003a38e | trans=1486566374-077-00000490694 | from=179.186.241.105 | srv=<default> | subsrv=<default> | comp=Orion | op=safeMongo.cpp[207]:getIntOrLongFieldAsLong | msg=Runtime Error (int/long field 'lastFailure' is missing in BSONObj <{ _id: ObjectId('58aae19eac958b4452780e14'), expiration: 1487604686, reference: "https://ngsiproxy.lab.fiware.org/callbacks/12:30:51%20PM-1:12:30:53%20PM-1", custom: false, throttling: -1, servicePath: "/#", status: "active", entities: [ { id: ".*", isPattern: "true", type: "Node", isTypePattern: false }, { id: ".*", isPattern: "true", type: "AMMS", isTypePattern: false }, { id: ".*", isPattern: "true", type: "Regulator", isTypePattern: false } ], attrs: [], metadata: [], blacklist: false, conditions: [ "Latitud", "Longitud", "presence", "batteryCharge", "illuminance", "ActivePower", "ReactivePower", "electricPotential", "electricalCurrent" ], lastNotification: 1487593892, count: 1, expression: { q: "", mq: "", geometry: "", coords: "", georel: "" }, format: "JSON", lastSuccess: 1487593892 }> from caller setLastFailure:480)
time=2017-02-20T14:32:19.228Z | lvl=WARN | corr=N/A | trans=N/A | from=N/A | srv=N/A | subsrv=N/A | comp=Orion | op=AlarmManager.cpp[405]:badInput | msg=Raising alarm BadInput : invalid regular expression for idPattern
time=2017-02-20T14:32:24.262Z | lvl=WARN | corr=N/A | trans=N/A | from=N/A | srv=N/A | subsrv=N/A | comp=Orion | op=AlarmManager.cpp[393]:badInput | msg=Repeated BadInput : invalid regular expression for idPattern
time=2017-02-20T14:33:19.223Z | lvl=SUMMARY | corr=N/A | trans=N/A | from=N/A | srv=N/A | subsrv=N/A | comp=Orion | op=logSummary.cpp[126]:logSummary | msg=Transactions: 1 (new: 1)
time=2017-02-20T14:33:19.223Z | lvl=SUMMARY | corr=N/A | trans=N/A | from=N/A | srv=N/A | subsrv=N/A | comp=Orion | op=logSummary.cpp[127]:logSummary | msg=DB status: ok, raised: (total: 0, new: 0), released: (total: 0, new: 0)
time=2017-02-20T14:33:19.236Z | lvl=SUMMARY | corr=N/A | trans=N/A | from=N/A | srv=N/A | subsrv=N/A | comp=Orion | op=logSummary.cpp[128]:logSummary | msg=Notification failure active alarms: 0, raised: (total: 0, new: 0), released: (total: 0, new: 0)
time=2017-02-20T14:33:19.237Z | lvl=SUMMARY | corr=N/A | trans=N/A | from=N/A | srv=N/A | subsrv=N/A | comp=Orion | op=logSummary.cpp[129]:logSummary | msg=Bad input active alarms: 1, raised: (total: 1, new: 1), released: (total: 0, new: 0)
time=2017-02-20T14:34:19.237Z | lvl=SUMMARY | corr=N/A | trans=N/A | from=N/A | srv=N/A | subsrv=N/A | comp=Orion | op=logSummary.cpp[126]:logSummary | msg=Transactions: 3 (new: 2)
Thus, moving out Quarentine milestone.
Another case:
Mar 1 12:59:09 restart caused by: process is not running
Log trace:
time=2017-03-01T11:56:48.619Z | lvl=SUMMARY | corr=N/A | trans=N/A | from=N/A | srv=N/A | subsrv=N/A | comp=Orion | op=logSummary.cpp[126]:logSummary | msg=Transactions: 8737 (new: 3)
time=2017-03-01T11:56:48.619Z | lvl=SUMMARY | corr=N/A | trans=N/A | from=N/A | srv=N/A | subsrv=N/A | comp=Orion | op=logSummary.cpp[127]:logSummary | msg=DB status: ok, raised: (total: 0, new: 0), released: (total: 0, new: 0)
time=2017-03-01T11:56:48.619Z | lvl=SUMMARY | corr=N/A | trans=N/A | from=N/A | srv=N/A | subsrv=N/A | comp=Orion | op=logSummary.cpp[128]:logSummary | msg=Notification failure active alarms: 3, raised: (total: 3, new: 0), released: (total: 0, new: 0)
time=2017-03-01T11:56:48.619Z | lvl=SUMMARY | corr=N/A | trans=N/A | from=N/A | srv=N/A | subsrv=N/A | comp=Orion | op=logSummary.cpp[129]:logSummary | msg=Bad input active alarms: 1, raised: (total: 15, new: 0), released: (total: 14, new: 0)
time=2017-03-01T11:57:42.259Z | lvl=ERROR | corr=42d20344-fe76-11e6-9d85-52540003a38e | trans=1488274763-290-00000008740 | from=153.156.75.233, | srv=<default> | subsrv=<default> | comp=Orion | op=safeMongo.cpp[207]:getIntOrLongFieldAsLong | msg=Runtime Error (int/long field 'lastFailure' is missing in BSONObj <{ _id: ObjectId('58b69b08ab84172d7d948951'), expiration: 1488373048, reference: "https://ngsiproxy.lab.fiware.org/callbacks/9:56:47%20AM-1:9:57:26%20AM-1", custom: false, throttling: -1, servicePath: "/#", status: "active", entities: [ { id: ".*", isPattern: "true", type: "Node", isTypePattern: false }, { id: ".*", isPattern: "true", type: "AMMS", isTypePattern: false }, { id: ".*", isPattern: "true", type: "Regulator", isTypePattern: false } ], attrs: [], metadata: [], blacklist: false, conditions: [ "Latitud", "Longitud", "presence", "batteryCharge", "illuminance", "ActivePower", "ReactivePower", "electricPotential", "electricalCurrent" ], lastNotification: 1488362254, count: 1, expression: { q: "", mq: "", geometry: "", coords: "", georel: "" }, format: "JSON", lastSuccess: 1488362254 }> from caller setLastFailure:480)
time=2017-03-01T11:59:09.801Z | lvl=WARN | corr=N/A | trans=N/A | from=N/A | srv=N/A | subsrv=N/A | comp=Orion | op=AlarmManager.cpp[405]:badInput | msg=Raising alarm BadInput : invalid regular expression for idPattern
time=2017-03-01T11:59:14.837Z | lvl=WARN | corr=N/A | trans=N/A | from=N/A | srv=N/A | subsrv=N/A | comp=Orion | op=AlarmManager.cpp[393]:badInput | msg=Repeated BadInput : invalid regular expression for idPattern
time=2017-03-01T12:00:09.796Z | lvl=SUMMARY | corr=N/A | trans=N/A | from=N/A | srv=N/A | subsrv=N/A | comp=Orion | op=logSummary.cpp[126]:logSummary | msg=Transactions: 2 (new: 2)
Version:
{
"orion" : {
"version" : "1.7.0",
"uptime" : "4 d, 20 h, 4 m, 56 s",
"git_hash" : "e544780eb64a4a2557c1f51dde070b8d82b86c49",
"compile_time" : "Wed Feb 8 13:30:24 CET 2017",
"compiled_by" : "fermin",
"compiled_in" : "centollo"
}
}
Another case:
Feb 28 10:39:23 restart caused by: process is not running
Log trace:
time=2017-02-28T09:37:54.997Z | lvl=SUMMARY | corr=N/A | trans=N/A | from=N/A | srv=N/A | subsrv=N/A | comp=Orion | op=logSummary.cpp[126]:logSummary | msg=Transactions: 102037 (new: 3)
time=2017-02-28T09:37:54.998Z | lvl=SUMMARY | corr=N/A | trans=N/A | from=N/A | srv=N/A | subsrv=N/A | comp=Orion | op=logSummary.cpp[127]:logSummary | msg=DB status: ok, raised: (total: 0, new: 0), released: (total: 0, new: 0)
time=2017-02-28T09:37:54.998Z | lvl=SUMMARY | corr=N/A | trans=N/A | from=N/A | srv=N/A | subsrv=N/A | comp=Orion | op=logSummary.cpp[128]:logSummary | msg=Notification failure active alarms: 13, raised: (total: 16, new: 0), released: (total: 3, new: 0)
time=2017-02-28T09:37:54.998Z | lvl=SUMMARY | corr=N/A | trans=N/A | from=N/A | srv=N/A | subsrv=N/A | comp=Orion | op=logSummary.cpp[129]:logSummary | msg=Bad input active alarms: 1, raised: (total: 14, new: 0), released: (total: 13, new: 0)
time=2017-02-28T09:38:07.669Z | lvl=ERROR | corr=98b13038-fd99-11e6-8c95-52540003a38e | trans=1487601139-191-00000102038 | from=153.156.75.233, | srv=<default> | subsrv=<default> | comp=Orion | op=safeMongo.cpp[207]:getIntOrLongFieldAsLong | msg=Runtime Error (int/long field 'lastFailure' is missing in BSONObj <{ _id: ObjectId('58b528c6cc448f6fb5ab8466'), expiration: 1488278262, reference: "https://ngsiproxy.lab.fiware.org/callbacks/7:37:04%20AM-1:7:37:07%20AM-1", custom: false, throttling: -1, servicePath: "/#", status: "active", entities: [ { id: ".*", isPattern: "true", type: "Node", isTypePattern: false }, { id: ".*", isPattern: "true", type: "AMMS", isTypePattern: false }, { id: ".*", isPattern: "true", type: "Regulator", isTypePattern: false } ], attrs: [], metadata: [], blacklist: false, conditions: [ "Latitud", "Longitud", "presence", "batteryCharge", "illuminance", "ActivePower", "ReactivePower", "electricPotential", "electricalCurrent" ], lastNotification: 1488267468, count: 1, expression: { q: "", mq: "", geometry: "", coords: "", georel: "" }, format: "JSON", lastSuccess: 1488267469 }> from caller setLastFailure:480)
time=2017-02-28T09:39:23.324Z | lvl=WARN | corr=N/A | trans=N/A | from=N/A | srv=N/A | subsrv=N/A | comp=Orion | op=AlarmManager.cpp[405]:badInput | msg=Raising alarm BadInput : invalid regular expression for idPattern
time=2017-02-28T09:39:28.359Z | lvl=WARN | corr=N/A | trans=N/A | from=N/A | srv=N/A | subsrv=N/A | comp=Orion | op=AlarmManager.cpp[393]:badInput | msg=Repeated BadInput : invalid regular expression for idPattern
time=2017-02-28T09:40:23.319Z | lvl=SUMMARY | corr=N/A | trans=N/A | from=N/A | srv=N/A | subsrv=N/A | comp=Orion | op=logSummary.cpp[126]:logSummary | msg=Transactions: 4 (new: 4)
Same version (1.7.0).
From orion.lab recent report:
Looking into contextBroker.log, we can see the moment in which the restart takes place (transactions number in the summary is "reseted" as a consecuence):
Maybe the cause is related with this:
"invalid regular expression for idPattern"
Version info:
A similar problem happened time ago:
that time it was with 1.5.0