pavlis opened 1 month ago
Here is a case in point I just encountered, and it also shows something we need to document. We have a problem in that our `DBClient` class has an intimate connection to dask/pyspark. I'm trying to debug a problem in a notebook using Spyder, so I converted the notebook to a Python script. However, the script won't run in local mode because dask throws this infamous error when I instantiate `DBClient`:
```
024-06-06 09:03:24,359 - distributed.nanny - ERROR - Failed to start process
Traceback (most recent call last):
  File "/home/pavlis/anaconda3/envs/mspass_py310/lib/python3.10/site-packages/distributed/nanny.py", line 448, in instantiate
    result = await self.process.start()
  File "/home/pavlis/anaconda3/envs/mspass_py310/lib/python3.10/site-packages/distributed/nanny.py", line 748, in start
    await self.process.start()
  File "/home/pavlis/anaconda3/envs/mspass_py310/lib/python3.10/site-packages/distributed/process.py", line 55, in _call_and_set_future
    res = func(*args, **kwargs)
  File "/home/pavlis/anaconda3/envs/mspass_py310/lib/python3.10/site-packages/distributed/process.py", line 215, in _start
    process.start()
  File "/home/pavlis/anaconda3/envs/mspass_py310/lib/python3.10/multiprocessing/process.py", line 121, in start
    self._popen = self._Popen(self)
  File "/home/pavlis/anaconda3/envs/mspass_py310/lib/python3.10/multiprocessing/context.py", line 288, in _Popen
    return Popen(process_obj)
  File "/home/pavlis/anaconda3/envs/mspass_py310/lib/python3.10/multiprocessing/popen_spawn_posix.py", line 32, in __init__
    super().__init__(process_obj)
-- many more error lines --
```
I've seen this many times and know it is a dask collision problem. Note that I'm not using our container but what I think is a pure local environment. I know a workaround for this particular problem: I will just modify my script to not instantiate `DBClient` and use the raw MongoDB interface instead.

This points to yet another documentation gap that you, @wangyinz, need to fix. The parallel processing section of the manual needs a section explaining what `DBClient` is and why it is structured the way it is. I know approximately: it is a necessary evil that allows a `Database` object to be serialized, which is why this topic belongs in the parallel processing section. That section should give a more detailed description of the concepts and explain why one should always use `get_database` and `get_database_client` rather than instantiating `Database` and `MongoClient` (or something like that) directly.
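The serialization point above can be illustrated with a minimal sketch of the general pattern. It is hypothetical and is not MsPASS's actual `DBClient`: the class name `SerializableClientSketch` is invented, and a `threading.Lock` stands in for a `MongoClient` because, like a live database connection (assumed here to be unpicklable), it cannot be pickled. The wrapper pickles only plain connection parameters and reopens the connection lazily after unpickling, which is what would let an object holding it be shipped to dask or spark workers.

```python
import pickle
import threading


class SerializableClientSketch:
    """Hypothetical wrapper illustrating the pattern; not MsPASS's DBClient.

    A threading.Lock stands in for a live MongoClient: neither can be
    pickled, so the live handle is dropped before serialization and
    rebuilt on first use after unpickling.
    """

    def __init__(self, host="localhost:27017"):
        self.host = host    # plain, picklable connection parameters
        self._conn = None   # live handle, created on demand

    @property
    def conn(self):
        # Lazy reconnect: a worker that receives an unpickled copy opens
        # its own handle the first time it actually needs one.
        if self._conn is None:
            self._conn = threading.Lock()  # stand-in for MongoClient(self.host)
        return self._conn

    def __getstate__(self):
        # Ship only the parameters; pickling the live handle would fail.
        state = self.__dict__.copy()
        state["_conn"] = None
        return state

    def __setstate__(self, state):
        self.__dict__.update(state)


if __name__ == "__main__":
    client = SerializableClientSketch("mongo.example:27017")
    _ = client.conn                             # open the handle locally
    clone = pickle.loads(pickle.dumps(client))  # works despite the live handle
    print(clone.host, clone._conn)              # mongo.example:27017 None
```

Pickling `client` directly without the `__getstate__` override would raise a `TypeError`, because the lock cannot be serialized.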
hmmm... I think you might be using the `Client` class instead of `DBClient`. `DBClient` should have nothing to do with the scheduler, but `Client` does: it detects the available scheduler and tries to create a local cluster if none is detected. I just added another else branch to that logic so that it also works when neither dask nor spark is installed. However, in your case I think dask is already installed, so it is trying to connect to the scheduler.
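The detection logic described above can be sketched roughly as follows. This is a hypothetical outline, not the actual `Client` code: the function name `choose_scheduler`, the preference order (dask before spark), and the `"serial"` fallback for the new else branch are all assumptions. It uses only `importlib.util.find_spec`, which returns `None` when a top-level package cannot be found.

```python
import importlib.util


def choose_scheduler(is_installed=None):
    """Hypothetical sketch of scheduler detection; not the real Client code.

    Returns "dask", "spark", or "serial". The is_installed hook lets a
    caller (or a test) pretend which packages are present.
    """
    if is_installed is None:
        # find_spec returns None if the top-level package is not importable
        def is_installed(name):
            return importlib.util.find_spec(name) is not None

    if is_installed("dask") and is_installed("distributed"):
        # The real Client would try to connect to a running scheduler here
        # and start a local cluster if none is detected.
        return "dask"
    if is_installed("pyspark"):
        return "spark"
    # Assumed behavior of the new else branch: neither dask nor spark is
    # installed, so no scheduler is used at all.
    return "serial"


if __name__ == "__main__":
    print(choose_scheduler(lambda name: False))              # serial
    print(choose_scheduler(lambda name: name == "pyspark"))  # spark
    print(choose_scheduler())  # depends on what this environment has installed
```

Injecting `is_installed` keeps the branch logic testable without uninstalling packages.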
I just added the conda documentation. The debugging topic probably belongs in a developer guide; I need to think about how to write it. It is actually pretty complicated because it involves debugging the C++ code. The C++ code in the conda package does not have debug symbols, so a local build is needed anyway in this case.
Now that we have a fully functional conda package for both Intel and arm64 architectures, there is a hole in our documentation. @wangyinz, you are the one to fill this hole, as I am not sure exactly how best to handle it.
This quickly gets into inconsistencies between pip and conda installs and how they interact with a local Python environment. There are multiple ways, I think, for any of us to screw this up, because pip and conda are two package managers that are not always compatible. Ways I know of that cause confusion:

- `pip install --user ./` run from the top of the source tree after compiling the C++ code. That puts a version of mspass in the `~/.local` directory (at least on Linux; I'm not so sure about macOS). If one does that and also has the conda package installed in some environment, which one takes precedence, and how do you know which one does?

Bottom line: @wangyinz needs to start a new documentation page on this general topic. I promise to extend it if you can start the process; I'm too confused myself to write anything that isn't potential misinformation.
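One way to answer "which copy takes precedence" is to ask the interpreter itself. The sketch below uses only the standard `sys`, `site`, and `importlib.util` modules. The helper names `locate_module` and `report` are hypothetical, and `mspasspy` is assumed to be the import name of the MsPASS package. In a standard CPython setup the user site-packages directory (where `pip install --user` writes) is placed ahead of the environment's own site-packages on `sys.path`, so a user-level copy would shadow the conda one unless the user site is disabled with `python -s` or `PYTHONNOUSERSITE=1`.

```python
import importlib.util
import site
import sys


def locate_module(name):
    """Return the file a top-level module would be imported from, or None."""
    spec = importlib.util.find_spec(name)
    return spec.origin if spec is not None else None


def report(name):
    # Which interpreter (and therefore which conda env) is running
    print("interpreter  :", sys.executable)
    # Whether ~/.local-style user site-packages are consulted at all
    print("user site on :", site.ENABLE_USER_SITE)
    print("user site dir:", site.getusersitepackages())
    # The copy that a plain "import name" would actually pick up
    print(f"{name} loads from:", locate_module(name))


if __name__ == "__main__":
    report("mspasspy")  # import name assumed; prints None if not installed
```

Running this once inside the conda environment and once with `python -s` shows whether the `~/.local` copy is the one being imported.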