Open fgalan opened 2 years ago
I understand that all the calls to that createCollection() method use dataExpiration == 0
(or errors would be happening) I would be a good idea to check it.
Cygnus is creating a capped collection just when:
I'm afraid that If just one of these options are provide then nothing is done
data_expiration | no | 0 | Collections will be removed if older than the value specified in seconds. The reference of time is the one stored in the recvTime property. Set to 0 if not wanting this policy. |
---|---|---|---|
collections_size | no | 0 | The oldest data (according to insertion time) will be removed if the size of the data collection gets bigger than the value specified in bytes. Notice that the size-based truncation policy takes precedence over the time-based one. Set to 0 if not wanting this policy. Minimum value (different than 0) is 4096 bytes. |
max_documents | no | 0 | The oldest data (according to insertion time) will be removed if the number of documents in the data collections goes beyond the specified value. Set to 0 if not wanting this policy. |
Maybe these documentation should be updated to reflect that both options should be provided with a value >0 to achieve that.
According with mongo driver java:
sizeInBytes(long sizeInBytes) Gets the maximum size of in bytes of a capped collection
maxDocuments(long maxDocuments) Sets the maximum number of documents allowed in a capped collection.
sizeInBytes does get?
I understand that all the calls to that createCollection() method use
dataExpiration == 0
(or errors would be happening) I would be a good idea to check it.
But using this sink config
cygnus-ngsi.sinks.sth-sink.data_expiration = 10000 cygnus-ngsi.sinks.sth-sink.collections_size = 536870912 cygnus-ngsi.sinks.sth-sink.max_documents = 1000
No errors are reported when createCollection (capped + creaetIndex):
time=2022-05-03T14:21:30.986Z | lvl=DEBUG | corr=84f26ad1-8203-4e01-8506-dbed42cd5376; cbnotif=1 | trans=85b8945a-3c5f-49d6-8039-2323e239610d | srv=N/A | subsrv=N/A | comp=cygnus-ngsi | op=createCollection | msg=com.telefonica.iot.cygnus.backends.mongo.MongoBackendImpl[158] : Creating Mongo collection=sth_/_device:dispAVG2_device at database=sth_smartcity with collections_size=100000 and max_documents=1000 options time=2022-05-03T14:21:31.056Z | lvl=DEBUG | corr=84f26ad1-8203-4e01-8506-dbed42cd5376; cbnotif=1 | trans=85b8945a-3c5f-49d6-8039-2323e239610d | srv=N/A | subsrv=N/A | comp=cygnus-ngsi | op=insertContextDataAggregatedForResolution | msg=com.telefonica.iot.cygnus.backends.mongo.MongoBackendImpl[307] : Updating data, database=sthsmartcity, collection=sth/_device:dispAVG2_device.aggr, queries=[{"_id": {"attrName": "TimeInstant", "origin": {"$date": 1651587660000}, "resolution": "second", "range": "minute"}, "points.offset": 30}, {"_id": {"attrName": "TimeInstant", "origin": {"$date": 1651586400000}, "resolution": "minute", "range": "hour"}, "points.offset": 21}, {"_id": {"attrName": "TimeInstant", "origin": {"$date": 1651536000000}, "resolution": "hour", "range": "day"}, "points.offset": 14}, {"_id": {"attrName": "TimeInstant", "origin": {"$date": 1651363200000}, "resolution": "day", "range": "month"}, "points.offset": 3}, {"_id": {"attrName": "TimeInstant", "origin": {"$date": 1640995200000}, "resolution": "month", "range": "year"}, "points.offset": 5}], updates=[{"$set": {"attrType": "DateTime"}, "$inc": {"points.30.samples": 1, "points.30.occur.2022-05-03T14:21:29.532Z": 1}}, {"$set": {"attrType": "DateTime"}, "$inc": {"points.21.samples": 1, "points.21.occur.2022-05-03T14:21:29.532Z": 1}}, {"$set": {"attrType": "DateTime"}, "$inc": {"points.14.samples": 1, "points.14.occur.2022-05-03T14:21:29.532Z": 1}}, {"$set": {"attrType": "DateTime"}, "$inc": {"points.2.samples": 1, "points.2.occur.2022-05-03T14:21:29.532Z": 1}}, {"$set": {"attrType": "DateTime"}, "$inc": {"points.4.samples": 1, "points.4.occur.2022-05-03T14:21:29.532Z": 1}}]
According with https://www.mongodb.com/docs/manual/tutorial/expire-data/
To create a TTL index, use the db.collection.createIndex() method with the expireAfterSeconds option on a field whose value is either a date or an array that contains date values.
But currently cygnus is using:
expireAfterSeconds != expireAfter Is this a bug? No: java driver uses expireAfter: https://mongodb.github.io/mongo-java-driver/3.6/javadoc/com/mongodb/client/model/IndexOptions.html#expireAfter-java.lang.Long-java.util.concurrent.TimeUnit-
No errors are reported when createCollection (capped + creaetIndex):
We have checked in a CB-Cygnus-MongoDB scenario that the expiration index is created without problems. An isolate test using only MongoDB and the mongo shell has been done also to confirm that.
Maybe there is a docu-bug in MongoDB official documentation. I have created the issue https://jira.mongodb.org/browse/DOCS-15313 trying to clarify it with MongoDB team.
I suggest to keep this issue on hold while we get feedback from MongoDB team.
With regards to the block:
if (collectionsSize != 0 && maxDocuments != 0) {
CreateCollectionOptions options = new CreateCollectionOptions()
.capped(true)
.sizeInBytes(collectionsSize)
.maxDocuments(maxDocuments);
LOGGER.debug("Creating Mongo collection=" + collectionName + " at database=" + dbName + " with "
+ "collections_size=" + collectionsSize + " and max_documents=" + maxDocuments + " options");
db.createCollection(collectionName, options);
}
It's not correct, as both collectionSize and maxDocuments can be used idependently, according to MongoDB documentation at https://www.mongodb.com/docs/manual/reference/method/db.createCollection/#mongodb-method-db.createCollection:
So I'd suggest to use independent conditions and include a link to MongoDB documentation from Cygnus documentation in the case users want to know how to use them (e.g. which one takes preference over the other, etc.).
No errors are reported when createCollection (capped + creaetIndex):
We have checked in a CB-Cygnus-MongoDB scenario that the expiration index is created without problems. An isolate test using only MongoDB and the mongo shell has been done also to confirm that.
Maybe there is a docu-bug in MongoDB official documentation. I have created the issue https://jira.mongodb.org/browse/DOCS-15313 trying to clarify it with MongoDB team.
I suggest to keep this issue on hold while we get feedback from MongoDB team.
From MongoDB JIRA ticket:
My understanding is that we should update the language on the capped collection page to not make a blanket statement that TTL indexes are incompatible, but that the TTL index won't perform deletes.
Thus, it seems that TTL indexes can be created in capped collection, but they don't achieve the desire effect (in other words, they are useless).
So, the question is:
In my opinion it's better option 1. Having an index that doesn't work as expected can be confusing for users without expertise in MongoDB.
https://github.com/telefonicaid/fiware-cygnus/blob/master/cygnus-common/src/main/java/com/telefonica/iot/cygnus/backends/mongo/MongoBackendImpl.java
The dataExpiration parameters (and associtted logic in the function) should be removed, as, according to MongoDB documentation on capped collections (https://www.mongodb.com/docs/manual/core/capped-collections/):