strands-project / mongodb_store

MongoDB tools for storing and analysing runs of ROS systems.
BSD 3-Clause "New" or "Revised" License
49 stars 72 forks source link

Don't use extra servers in queries #183

Closed marc-hanheide closed 6 years ago

marc-hanheide commented 7 years ago

This refers to the dangerous functionality in message_store_node.py#L226-L233:

        for extra_client in self.extra_clients:
            if len(entries) == 0:
                extra_collection = extra_client[req.database][req.collection]
                entries =  dc_util.query_message(extra_collection, obj_query, sort_query_tuples, req.single, req.limit)
                if len(entries) > 0:
                    rospy.loginfo("found result in extra datacentre")
            else:
                break

When a query to the primary server returns nothin then it automatically calls the secondary "extra" servers, i.e., those that are only set up for replication. This is highly intransparent and can cause a lot of issues. This behaviour should at best be optional and to be explicitly enabled. I suggest to disable it by default and add another flag to enable it if really needed.

Discussed also woth @hawesie and @Jailander. To quote @hawesie:

Blimey, I'm not sure I realised it was still doing that. I remember thinking it was pretty dangerous at some point, but didn't think anyone was still using this. I can't imagine anyone uses this "feature" so please cut it out of there.

marc-hanheide commented 7 years ago

Also this is a bit strange, to be hones: https://github.com/strands-project/mongodb_store/blob/c844f4334679f5dcddf6be132713f6c3c45315ae/mongodb_store/scripts/message_store_node.py#L131-L141

It seems a delete is always propagated also to replicated sets, while for a write it needs replicate_on_write to be True. This also related to #181. As also, things are always backed-up. This is again very intransparent. This whole dealing with extra_clients in my view needs changing. The extra_clients make a lot of sense for the replicator_node, but for the message_store_node I think their behaviour is not clear at all.

One suggestion would be to make this all optional, to introduce a new set of flags to complement replicate_on_write, namely, query_all_backends (I also don't understand why it's called extra "clients") to indicate if data should be queried, and delete_from_all_backends to indicate if a query should be replicated.

Or, I think, honestly, we might want to scrap the whole of the extra_client functionality in message_store_node, as it just is a poor reimplementation of the core mongodb functionality of replication sets in the first place... Maybe somebody can explain to me its rationale. The extra_clients do make sense in the context of the replicator_node, but I think it only does so there (and still the name "client" is confusing the least).

marc-hanheide commented 6 years ago

@hawesie is this still a thing?

hawesie commented 6 years ago

It is, but I'm happy to make it a non-default option and mark as deprecated and use that to flush out people who use it...

furushchev commented 6 years ago

@marc-hanheide I recently sent a pull request to disable asking query to extra servers by default and now it was merged to main branch. Does it solve your issue? Please could you briefly check if possible and close this issue if it's ok for you.

marc-hanheide commented 6 years ago

looks good! Thanks @furushchev