wwu-mmll / photonai

PHOTONAI is a high level python API for designing and optimizing machine learning pipelines.
https://photon-ai.com/
GNU General Public License v3.0
74 stars 14 forks source link

Permutation test: (1) document to large to save to MongoDB (2) _calculate_results: mongodb path set to trap-umbriel #10

Closed MSchmitt-git closed 2 weeks ago

MSchmitt-git commented 4 years ago

While running a permutation test with Photon, the following issues occurred.

Issue 1: While running the test (implemented as suggested in the docu https://www.photon-ai.com/documentation/permutation_test), the script is unable to write the results of the very first permutation (y=ytrue) into the MongoDB since the file size is too large. Since the PermutationTest relies on the MonoDB entries the process finishes with code 1.

As a workaround, reducing the number of cv folds or the number of hyperparameter configurations (hence reducing the data to be stored in the MongoDB) solves the problem and the test will continue to run without any issues. The most obvious solution to me would be to forgo saving certain data into the MongoDB (e.g. feature importances, predictions). To my understanding this had been implemented in the past but was discontinued (commit d5ecd1b, 24.09.2019).

Issue 2: While calculating the permutation results, the server cannot be found. It seems as in the def _calculate_results function (line 181), the server is set to mongodb_path="mongodb://trap-umbriel:27017/photon_results" and not to the server which has been set by the user.

photonai version: 1.1.0 (develop tree) OS: MAC OS 10.15.4 n samples: 1650 features (predictors): 55

Error log: Issue 1: File "/Users/michael/opt/anaconda3/envs/photon/lib/python3.7/site-packages/photonai/processing/permutation_test.py", line 90, in fit self.pipe.results.save() File "/Users/michael/opt/anaconda3/envs/photon/lib/python3.7/site-packages/pymodm/base/models.py", line 476, in save self.to_son(), upsert=True) File "/Users/michael/opt/anaconda3/envs/photon/lib/python3.7/site-packages/pymongo/collection.py", line 930, in replace_one collation=collation, session=session), File "/Users/michael/opt/anaconda3/envs/photon/lib/python3.7/site-packages/pymongo/collection.py", line 856, in _update_retryable _update, session) File "/Users/michael/opt/anaconda3/envs/photon/lib/python3.7/site-packages/pymongo/mongo_client.py", line 1491, in _retryable_write return self._retry_with_session(retryable, func, s, None) File "/Users/michael/opt/anaconda3/envs/photon/lib/python3.7/site-packages/pymongo/mongo_client.py", line 1384, in _retry_with_session return func(session, sock_info, retryable) File "/Users/michael/opt/anaconda3/envs/photon/lib/python3.7/site-packages/pymongo/collection.py", line 852, in _update retryable_write=retryable_write) File "/Users/michael/opt/anaconda3/envs/photon/lib/python3.7/site-packages/pymongo/collection.py", line 822, in _update retryable_write=retryable_write).copy() File "/Users/michael/opt/anaconda3/envs/photon/lib/python3.7/site-packages/pymongo/pool.py", line 618, in command self._raise_connection_failure(error) File "/Users/michael/opt/anaconda3/envs/photon/lib/python3.7/site-packages/pymongo/pool.py", line 613, in command user_fields=user_fields) File "/Users/michael/opt/anaconda3/envs/photon/lib/python3.7/site-packages/pymongo/network.py", line 143, in command name, size, max_bson_size + message._COMMAND_OVERHEAD) File "/Users/michael/opt/anaconda3/envs/photon/lib/python3.7/site-packages/pymongo/message.py", line 1077, in _raise_document_too_large raise DocumentTooLarge("%r command document too large" % (operation,)) pymongo.errors.DocumentTooLarge: 'update' command document too large

Issue 2: Traceback (most recent call last): perm_tester.fit(X, y) File "/Users/michael/opt/anaconda3/envs/photon/lib/python3.7/site-packages/photonai/processing/permutation_test.py", line 143, in fit perm_result = self._calculate_results(self.permutation_id) File "/Users/michael/opt/anaconda3/envs/photon/lib/python3.7/site-packages/photonai/processing/permutation_test.py", line 185, in _calculate_results mother_permutation = PermutationTest.find_reference(mongodb_path, permutation_id) File "/Users/michael/opt/anaconda3/envs/photon/lib/python3.7/site-packages/photonai/processing/permutation_test.py", line 291, in find_reference mother_permutation = _find_mummy(permutation_id) File "/Users/michael/opt/anaconda3/envs/photon/lib/python3.7/site-packages/photonai/processing/permutation_test.py", line 284, in _find_mummy 'computation_completed': True}).order_by([('computation_start_time', DESCENDING)]).first() File "/Users/michael/opt/anaconda3/envs/photon/lib/python3.7/site-packages/pymodm/queryset.py", line 127, in first return next(iter(self.limit(-1))) File "/Users/michael/opt/anaconda3/envs/photon/lib/python3.7/site-packages/pymodm/queryset.py", line 543, in return (to_instance(doc) for doc in self._get_raw_cursor()) File "/Users/michael/opt/anaconda3/envs/photon/lib/python3.7/site-packages/pymongo/cursor.py", line 1156, in next if len(self.data) or self._refresh(): File "/Users/michael/opt/anaconda3/envs/photon/lib/python3.7/site-packages/pymongo/cursor.py", line 1050, in _refresh self.session = self.collection.database.client._ensure_session() File "/Users/michael/opt/anaconda3/envs/photon/lib/python3.7/site-packages/pymongo/mongo_client.py", line 1810, in _ensure_session return self.__start_session(True, causal_consistency=False) File "/Users/michael/opt/anaconda3/envs/photon/lib/python3.7/site-packages/pymongo/mongo_client.py", line 1763, in start_session server_session = self._get_server_session() File "/Users/michael/opt/anaconda3/envs/photon/lib/python3.7/site-packages/pymongo/mongo_client.py", line 1796, in _get_server_session return self._topology.get_server_session() File "/Users/michael/opt/anaconda3/envs/photon/lib/python3.7/site-packages/pymongo/topology.py", line 485, in get_server_session None) File "/Users/michael/opt/anaconda3/envs/photon/lib/python3.7/site-packages/pymongo/topology.py", line 209, in _select_servers_loop self._error_message(selector)) pymongo.errors.ServerSelectionTimeoutError: trap-umbriel:27017: [Errno 8] nodename nor servname provided, or not known

RLeenings commented 4 years ago

That is indeed a problem. Thank you for mentioning. Generally, we get rid of all unnecessary information to avoid this error in the result structure. However, a possible reason for this could be that the learning algorithm you use offers importance scores, and that those cause the amount of data. Maybe you can get in touch so that we can find the issue.