telefonicaid / fiware-cygnus

A connector in charge of persisting context data sources into other third-party databases and storage systems, creating a historical view of the context
https://fiware-cygnus.rtfd.io/
GNU Affero General Public License v3.0
65 stars 105 forks source link

Remove dataExpiration from createCollection (capped collection one) (mongo) #2159

Open fgalan opened 2 years ago

fgalan commented 2 years ago

https://github.com/telefonicaid/fiware-cygnus/blob/master/cygnus-common/src/main/java/com/telefonica/iot/cygnus/backends/mongo/MongoBackendImpl.java

    public void createCollection(String dbName, String collectionName, long collectionsSize, long maxDocuments,
            long dataExpiration) throws MongoException

The dataExpiration parameters (and associtted logic in the function) should be removed, as, according to MongoDB documentation on capped collections (https://www.mongodb.com/docs/manual/core/capped-collections/):

TTL indexes are not compatible with capped collections.

fgalan commented 2 years ago

I understand that all the calls to that createCollection() method use dataExpiration == 0 (or errors would be happening) I would be a good idea to check it.

AlvaroVega commented 2 years ago

Cygnus is creating a capped collection just when:

https://github.com/telefonicaid/fiware-cygnus/blob/d7a962ac5c08ea6445882fde30652c4721a67e7c/cygnus-common/src/main/java/com/telefonica/iot/cygnus/backends/mongo/MongoBackendImpl.java#L153-L158

I'm afraid that If just one of these options are provide then nothing is done

https://github.com/telefonicaid/fiware-cygnus/blob/master/doc/cygnus-ngsi/flume_extensions_catalogue/ngsi_mongo_sink.md#section2.1

data_expiration no 0 Collections will be removed if older than the value specified in seconds. The reference of time is the one stored in the recvTime property. Set to 0 if not wanting this policy.
collections_size no 0 The oldest data (according to insertion time) will be removed if the size of the data collection gets bigger than the value specified in bytes. Notice that the size-based truncation policy takes precedence over the time-based one. Set to 0 if not wanting this policy. Minimum value (different than 0) is 4096 bytes.
max_documents no 0 The oldest data (according to insertion time) will be removed if the number of documents in the data collections goes beyond the specified value. Set to 0 if not wanting this policy.

Maybe these documentation should be updated to reflect that both options should be provided with a value >0 to achieve that.

According with mongo driver java:

https://mongodb.github.io/mongo-java-driver/3.6/javadoc/com/mongodb/client/model/CreateCollectionOptions.html#sizeInBytes-long-

sizeInBytes(long sizeInBytes) Gets the maximum size of in bytes of a capped collection

maxDocuments(long maxDocuments) Sets the maximum number of documents allowed in a capped collection.

sizeInBytes does get?

AlvaroVega commented 2 years ago

I understand that all the calls to that createCollection() method use dataExpiration == 0 (or errors would be happening) I would be a good idea to check it.

But using this sink config

cygnus-ngsi.sinks.sth-sink.data_expiration = 10000 cygnus-ngsi.sinks.sth-sink.collections_size = 536870912 cygnus-ngsi.sinks.sth-sink.max_documents = 1000

No errors are reported when createCollection (capped + creaetIndex):

time=2022-05-03T14:21:30.986Z | lvl=DEBUG | corr=84f26ad1-8203-4e01-8506-dbed42cd5376; cbnotif=1 | trans=85b8945a-3c5f-49d6-8039-2323e239610d | srv=N/A | subsrv=N/A | comp=cygnus-ngsi | op=createCollection | msg=com.telefonica.iot.cygnus.backends.mongo.MongoBackendImpl[158] : Creating Mongo collection=sth_/_device:dispAVG2_device at database=sth_smartcity with collections_size=100000 and max_documents=1000 options time=2022-05-03T14:21:31.056Z | lvl=DEBUG | corr=84f26ad1-8203-4e01-8506-dbed42cd5376; cbnotif=1 | trans=85b8945a-3c5f-49d6-8039-2323e239610d | srv=N/A | subsrv=N/A | comp=cygnus-ngsi | op=insertContextDataAggregatedForResolution | msg=com.telefonica.iot.cygnus.backends.mongo.MongoBackendImpl[307] : Updating data, database=sthsmartcity, collection=sth/_device:dispAVG2_device.aggr, queries=[{"_id": {"attrName": "TimeInstant", "origin": {"$date": 1651587660000}, "resolution": "second", "range": "minute"}, "points.offset": 30}, {"_id": {"attrName": "TimeInstant", "origin": {"$date": 1651586400000}, "resolution": "minute", "range": "hour"}, "points.offset": 21}, {"_id": {"attrName": "TimeInstant", "origin": {"$date": 1651536000000}, "resolution": "hour", "range": "day"}, "points.offset": 14}, {"_id": {"attrName": "TimeInstant", "origin": {"$date": 1651363200000}, "resolution": "day", "range": "month"}, "points.offset": 3}, {"_id": {"attrName": "TimeInstant", "origin": {"$date": 1640995200000}, "resolution": "month", "range": "year"}, "points.offset": 5}], updates=[{"$set": {"attrType": "DateTime"}, "$inc": {"points.30.samples": 1, "points.30.occur.2022-05-03T14:21:29.532Z": 1}}, {"$set": {"attrType": "DateTime"}, "$inc": {"points.21.samples": 1, "points.21.occur.2022-05-03T14:21:29.532Z": 1}}, {"$set": {"attrType": "DateTime"}, "$inc": {"points.14.samples": 1, "points.14.occur.2022-05-03T14:21:29.532Z": 1}}, {"$set": {"attrType": "DateTime"}, "$inc": {"points.2.samples": 1, "points.2.occur.2022-05-03T14:21:29.532Z": 1}}, {"$set": {"attrType": "DateTime"}, "$inc": {"points.4.samples": 1, "points.4.occur.2022-05-03T14:21:29.532Z": 1}}]

AlvaroVega commented 2 years ago

According with https://www.mongodb.com/docs/manual/tutorial/expire-data/

To create a TTL index, use the db.collection.createIndex() method with the expireAfterSeconds option on a field whose value is either a date or an array that contains date values.

But currently cygnus is using:

https://github.com/telefonicaid/fiware-cygnus/blob/d7a962ac5c08ea6445882fde30652c4721a67e7c/cygnus-common/src/main/java/com/telefonica/iot/cygnus/backends/mongo/MongoBackendImpl.java#L129

expireAfterSeconds != expireAfter Is this a bug? No: java driver uses expireAfter: https://mongodb.github.io/mongo-java-driver/3.6/javadoc/com/mongodb/client/model/IndexOptions.html#expireAfter-java.lang.Long-java.util.concurrent.TimeUnit-

fgalan commented 2 years ago

No errors are reported when createCollection (capped + creaetIndex):

We have checked in a CB-Cygnus-MongoDB scenario that the expiration index is created without problems. An isolate test using only MongoDB and the mongo shell has been done also to confirm that.

Maybe there is a docu-bug in MongoDB official documentation. I have created the issue https://jira.mongodb.org/browse/DOCS-15313 trying to clarify it with MongoDB team.

I suggest to keep this issue on hold while we get feedback from MongoDB team.

fgalan commented 2 years ago

With regards to the block:

            if (collectionsSize != 0 && maxDocuments != 0) {
                CreateCollectionOptions options = new CreateCollectionOptions()
                        .capped(true)
                        .sizeInBytes(collectionsSize)
                        .maxDocuments(maxDocuments);
                LOGGER.debug("Creating Mongo collection=" + collectionName + " at database=" + dbName + " with "
                        + "collections_size=" + collectionsSize + " and max_documents=" + maxDocuments + " options");
                db.createCollection(collectionName, options);
            }

It's not correct, as both collectionSize and maxDocuments can be used idependently, according to MongoDB documentation at https://www.mongodb.com/docs/manual/reference/method/db.createCollection/#mongodb-method-db.createCollection:

imagen

So I'd suggest to use independent conditions and include a link to MongoDB documentation from Cygnus documentation in the case users want to know how to use them (e.g. which one takes preference over the other, etc.).

AlvaroVega commented 2 years ago

related https://github.com/telefonicaid/fiware-cygnus/pull/2163

fgalan commented 2 years ago

No errors are reported when createCollection (capped + creaetIndex):

We have checked in a CB-Cygnus-MongoDB scenario that the expiration index is created without problems. An isolate test using only MongoDB and the mongo shell has been done also to confirm that.

Maybe there is a docu-bug in MongoDB official documentation. I have created the issue https://jira.mongodb.org/browse/DOCS-15313 trying to clarify it with MongoDB team.

I suggest to keep this issue on hold while we get feedback from MongoDB team.

From MongoDB JIRA ticket:

My understanding is that we should update the language on the capped collection page to not make a blanket statement that TTL indexes are incompatible, but that the TTL index won't perform deletes.

Thus, it seems that TTL indexes can be created in capped collection, but they don't achieve the desire effect (in other words, they are useless).

So, the question is:

  1. Should Cygnus avoid creating this useless TTL index in this case?
  2. Should Cygnus stay as it is now creating the TTL index assuming that the user knows what happen in this situation?

In my opinion it's better option 1. Having an index that doesn't work as expected can be confusing for users without expertise in MongoDB.