voxel51 / fiftyone

Refine high-quality datasets and visual AI models
https://fiftyone.ai
Apache License 2.0
8.86k stars 560 forks source link

[BUG] DatabaseService failed to bind to port #3188

Open NicoJorgensen1 opened 1 year ago

NicoJorgensen1 commented 1 year ago

System information

Commands to reproduce

As thoroughly as possible, please provide the Python and/or shell commands used to encounter the issue. Application steps can be described in the next section.

python -m pip install fiftyone
python -c "import fiftyone as fo"

Describe the problem

I have an ssh connection to a server computer in my basement. I have ssh'ed into the computer through "Attach to Host" in vscode. On the server computer I install fiftyone - both locally and when running a docker container. The installation goes without any issues. In python I can say import fiftyone as fo and then I quickly get an error with something like "subprocess failed with error 14":

`{"t":{"$date":"2023-06-12T12:22:56.265Z"},"s":"I",  "c":"CONTROL",  "id":20697,   "ctx":"-","msg":"Renamed existing log file","attr":{"oldLogPath":"/fiftyone/db/log/mongo.log","newLogPath":"/fiftyone/db/log/mongo.log.2023-06-12T12-22-56"}}
Subprocess ['/usr/local/lib/python3.9/dist-packages/fiftyone/db/bin/mongod', '--dbpath', '/fiftyone/db', '--logpath', '/fiftyone/db/log/mongo.log', '--port', '0', '--nounixsocket'] exited with error 14:`

Then the process times out, waits for a while, and then several new error messages with a traceback are displayed, ending with fiftyone.core.service.ServiceListenTimeout: fiftyone.core.service.DatabaseService failed to bind to port:

`Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "/usr/local/lib/python3.9/dist-packages/fiftyone/__init__.py", line 25, in <module>
    from fiftyone.__public__ import *
  File "/usr/local/lib/python3.9/dist-packages/fiftyone/__public__.py", line 15, in <module>
    _foo.establish_db_conn(config)
  File "/usr/local/lib/python3.9/dist-packages/fiftyone/core/odm/database.py", line 200, in establish_db_conn
    port = _db_service.port
  File "/usr/local/lib/python3.9/dist-packages/fiftyone/core/service.py", line 276, in port
    return self._wait_for_child_port()
  File "/usr/local/lib/python3.9/dist-packages/fiftyone/core/service.py", line 170, in _wait_for_child_port
    return find_port()
  File "/usr/local/lib/python3.9/dist-packages/retrying.py", line 56, in wrapped_f
    return Retrying(*dargs, **dkw).call(f, *args, **kw)
  File "/usr/local/lib/python3.9/dist-packages/retrying.py", line 266, in call
    raise attempt.get()
  File "/usr/local/lib/python3.9/dist-packages/retrying.py", line 301, in get
    six.reraise(self.value[0], self.value[1], self.value[2])
  File "/usr/lib/python3/dist-packages/six.py", line 703, in reraise
    raise value
  File "/usr/local/lib/python3.9/dist-packages/retrying.py", line 251, in call
    attempt = Attempt(fn(*args, **kwargs), attempt_number, False)
  File "/usr/local/lib/python3.9/dist-packages/fiftyone/core/service.py", line 168, in find_port
    raise ServiceListenTimeout(etau.get_class_name(self), port)
fiftyone.core.service.ServiceListenTimeout: fiftyone.core.service.DatabaseService failed to bind to port``

Other info/logs

Don't know if any logs are saved anywhere, but here is a copy of the output in the terminal after trying to 'import fiftyone as fo'. Here is an example of the entire error message printed out when trying to import fiftyone:

`goatherd@aiserver2:~/fiftyone$ python -c "import fiftyone as fo"
{"t":{"$date":"2023-06-12T12:22:56.265Z"},"s":"I",  "c":"CONTROL",  "id":20697,   "ctx":"-","msg":"Renamed existing log file","attr":{"oldLogPath":"/fiftyone/db/log/mongo.log","newLogPath":"/fiftyone/db/log/mongo.log.2023-06-12T12-22-56"}}
Subprocess ['/usr/local/lib/python3.9/dist-packages/fiftyone/db/bin/mongod', '--dbpath', '/fiftyone/db', '--logpath', '/fiftyone/db/log/mongo.log', '--port', '0', '--nounixsocket'] exited with error 14:
Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "/usr/local/lib/python3.9/dist-packages/fiftyone/__init__.py", line 25, in <module>
    from fiftyone.__public__ import *
  File "/usr/local/lib/python3.9/dist-packages/fiftyone/__public__.py", line 15, in <module>
    _foo.establish_db_conn(config)
  File "/usr/local/lib/python3.9/dist-packages/fiftyone/core/odm/database.py", line 200, in establish_db_conn
    port = _db_service.port
  File "/usr/local/lib/python3.9/dist-packages/fiftyone/core/service.py", line 276, in port
    return self._wait_for_child_port()
  File "/usr/local/lib/python3.9/dist-packages/fiftyone/core/service.py", line 170, in _wait_for_child_port
    return find_port()
  File "/usr/local/lib/python3.9/dist-packages/retrying.py", line 56, in wrapped_f
    return Retrying(*dargs, **dkw).call(f, *args, **kw)
  File "/usr/local/lib/python3.9/dist-packages/retrying.py", line 266, in call
    raise attempt.get()
  File "/usr/local/lib/python3.9/dist-packages/retrying.py", line 301, in get
    six.reraise(self.value[0], self.value[1], self.value[2])
  File "/usr/lib/python3/dist-packages/six.py", line 703, in reraise
    raise value
  File "/usr/local/lib/python3.9/dist-packages/retrying.py", line 251, in call
    attempt = Attempt(fn(*args, **kwargs), attempt_number, False)
  File "/usr/local/lib/python3.9/dist-packages/fiftyone/core/service.py", line 168, in find_port
    raise ServiceListenTimeout(etau.get_class_name(self), port)
fiftyone.core.service.ServiceListenTimeout: fiftyone.core.service.DatabaseService failed to bind to port`

The weird thing is that after a restart of my computer/server, I am good to go and able to re-import fiftyone using a regular script. If I run the command in "regular" mode, I get no errors, as shown below: image

If I use debug mode in vscode, I get the error. image In this screenshot it says that there is a ModuleNotFoundERror: No module named fiftyone, which is kind of weird as it is the same environment as I use when not running in debug mode. Therefore, I feel like something is wrong with the debugging mode of vscode and fiftyone, I just don't know what...

What areas of FiftyOne does this bug affect?

Willingness to contribute

The FiftyOne Community encourages bug fix contributions. Would you or another member of your organization be willing to contribute a fix for this bug to the FiftyOne codebase?

kaixi-wang commented 1 year ago

have you set up port forwarding for the db server? https://code.visualstudio.com/docs/remote/ssh-tutorial#:~:text=your%20virtual%20machine.-,Port%20forwarding,-To%20be%20able

NicoJorgensen1 commented 1 year ago

Hi @kaixi-wang. With regards to port forwarding the answer is yes; and as I am able to perform python -c import fiftyone as fo without errors when running regularly, but not when running in debug mode, I assume something is wrong with the debug module or with how the MongoDB initialization is created - but I really have no clue.

However, today I have been testing the environment a bit further. Perhaps I was too quick to mention that the issue was happening only on my remote machine. Sorry about that as that might have mislead you guys. I was simply only developing on my remote machine as I want my fiftyone app to be accessible 24/7, and my remote machine is turned on 24/7, so I simply thought developing through a ssh connection to that machine was the easiest way of making that happen. Now, I don't (necessarily) think that the issue is related to using my basement-server computer. I think it is more related to how the fiftyone process is initialized/closed, when running vscode in debug mode.

A minimal reproducible example

  1. If I choose to run the script normaly using python3 run_51.py then I can do that in an endless loop without any issues, as seen on the below screenshot: image
  2. If I choose to run the script in debug mode, then the first time I run the script, the import will succed. The second time, I will get the error. image

The entire printed log, which you can see in my terminal are these lines:

`nico@nico-Legion:~/Documents/Capra_Projects/voxel51_fiftyone$  cd /home/nico/Documents/Capra_Projects/voxel51_fiftyone ; /usr/bin/env /bin/python3 /home/nico/.vscode/extensions/ms-python.python-2023.10.0/pythonFiles/lib/python/debugpy/adapter/../../debugpy/launcher 50525 -- /home/nico/Documents/Capra_Projects/voxel51_fiftyone/run_51.py 
{"t":{"$date":"2023-06-13T05:31:21.401Z"},"s":"I",  "c":"CONTROL",  "id":20697,   "ctx":"-","msg":"Renamed existing log file","attr":{"oldLogPath":"/home/nico/.fiftyone/var/lib/mongo/log/mongo.log","newLogPath":"/home/nico/.fiftyone/var/lib/mongo/log/mongo.log.2023-06-13T05-31-21"}}
Subprocess ['/home/nico/.local/lib/python3.8/site-packages/fiftyone/db/bin/mongod', '--dbpath', '/home/nico/.fiftyone/var/lib/mongo', '--logpath', '/home/nico/.fiftyone/var/lib/mongo/log/mongo.log', '--port', '0', '--nounixsocket'] exited with error 100:
Backend GTK3Agg is interactive backend. Turning interactive mode on.`

The print outs in the terminal as shown above points to a --logpath at ~.fiftyone/var/lib/mongo/log/mongo.log. When viewing this logfile, this is the content:

`{"t":{"$date":"2023-06-13T07:58:20.878+02:00"},"s":"I",  "c":"NETWORK",  "id":4915701, "ctx":"-","msg":"Initialized wire specification","attr":{"spec":{"incomingExternalClient":{"minWireVersion":0,"maxWireVersion":13},"incomingInternalClient":{"minWireVersion":0,"maxWireVersion":13},"outgoing":{"minWireVersion":0,"maxWireVersion":13},"isInternalClient":true}}}
{"t":{"$date":"2023-06-13T07:58:20.878+02:00"},"s":"I",  "c":"CONTROL",  "id":23285,   "ctx":"main","msg":"Automatically disabling TLS 1.0, to force-enable TLS 1.0 specify --sslDisabledProtocols 'none'"}
{"t":{"$date":"2023-06-13T07:58:20.878+02:00"},"s":"W",  "c":"ASIO",     "id":22601,   "ctx":"main","msg":"No TransportLayer configured during NetworkInterface startup"}
{"t":{"$date":"2023-06-13T07:58:20.878+02:00"},"s":"I",  "c":"NETWORK",  "id":4648601, "ctx":"main","msg":"Implicit TCP FastOpen unavailable. If TCP FastOpen is required, set tcpFastOpenServer, tcpFastOpenClient, and tcpFastOpenQueueSize."}
{"t":{"$date":"2023-06-13T07:58:20.879+02:00"},"s":"W",  "c":"ASIO",     "id":22601,   "ctx":"main","msg":"No TransportLayer configured during NetworkInterface startup"}
{"t":{"$date":"2023-06-13T07:58:20.879+02:00"},"s":"I",  "c":"REPL",     "id":5123008, "ctx":"main","msg":"Successfully registered PrimaryOnlyService","attr":{"service":"TenantMigrationDonorService","ns":"config.tenantMigrationDonors"}}
{"t":{"$date":"2023-06-13T07:58:20.879+02:00"},"s":"I",  "c":"REPL",     "id":5123008, "ctx":"main","msg":"Successfully registered PrimaryOnlyService","attr":{"service":"TenantMigrationRecipientService","ns":"config.tenantMigrationRecipients"}}
{"t":{"$date":"2023-06-13T07:58:20.879+02:00"},"s":"I",  "c":"CONTROL",  "id":5945603, "ctx":"main","msg":"Multi threading initialized"}
{"t":{"$date":"2023-06-13T07:58:20.880+02:00"},"s":"I",  "c":"CONTROL",  "id":4615611, "ctx":"initandlisten","msg":"MongoDB starting","attr":{"pid":5020,"port":0,"dbPath":"/home/nico/.fiftyone/var/lib/mongo","architecture":"64-bit","host":"nico-Legion"}}
{"t":{"$date":"2023-06-13T07:58:20.880+02:00"},"s":"I",  "c":"CONTROL",  "id":23403,   "ctx":"initandlisten","msg":"Build Info","attr":{"buildInfo":{"version":"5.0.4","gitVersion":"62a84ede3cc9a334e8bc82160714df71e7d3a29e","openSSLVersion":"OpenSSL 1.1.1f  31 Mar 2020","modules":[],"allocator":"tcmalloc","environment":{"distmod":"ubuntu1804","distarch":"x86_64","target_arch":"x86_64"}}}}
{"t":{"$date":"2023-06-13T07:58:20.880+02:00"},"s":"I",  "c":"CONTROL",  "id":51765,   "ctx":"initandlisten","msg":"Operating System","attr":{"os":{"name":"Ubuntu","version":"20.04"}}}
{"t":{"$date":"2023-06-13T07:58:20.880+02:00"},"s":"I",  "c":"CONTROL",  "id":21951,   "ctx":"initandlisten","msg":"Options set by command line","attr":{"options":{"net":{"port":0,"unixDomainSocket":{"enabled":false}},"storage":{"dbPath":"/home/nico/.fiftyone/var/lib/mongo"},"systemLog":{"destination":"file","path":"/home/nico/.fiftyone/var/lib/mongo/log/mongo.log"}}}}
{"t":{"$date":"2023-06-13T07:58:20.888+02:00"},"s":"E",  "c":"CONTROL",  "id":20557,   "ctx":"initandlisten","msg":"DBException in initAndListen, terminating","attr":{"error":"DBPathInUse: Unable to lock the lock file: /home/nico/.fiftyone/var/lib/mongo/mongod.lock (Resource temporarily unavailable). Another mongod instance is already running on the /home/nico/.fiftyone/var/lib/mongo directory"}}
{"t":{"$date":"2023-06-13T07:58:20.888+02:00"},"s":"I",  "c":"REPL",     "id":4784900, "ctx":"initandlisten","msg":"Stepping down the ReplicationCoordinator for shutdown","attr":{"waitTimeMillis":15000}}
{"t":{"$date":"2023-06-13T07:58:20.889+02:00"},"s":"I",  "c":"COMMAND",  "id":4784901, "ctx":"initandlisten","msg":"Shutting down the MirrorMaestro"}
{"t":{"$date":"2023-06-13T07:58:20.889+02:00"},"s":"I",  "c":"SHARDING", "id":4784902, "ctx":"initandlisten","msg":"Shutting down the WaitForMajorityService"}
{"t":{"$date":"2023-06-13T07:58:20.889+02:00"},"s":"I",  "c":"NETWORK",  "id":20562,   "ctx":"initandlisten","msg":"Shutdown: going to close listening sockets"}
{"t":{"$date":"2023-06-13T07:58:20.889+02:00"},"s":"I",  "c":"NETWORK",  "id":4784905, "ctx":"initandlisten","msg":"Shutting down the global connection pool"}
{"t":{"$date":"2023-06-13T07:58:20.889+02:00"},"s":"I",  "c":"CONTROL",  "id":4784906, "ctx":"initandlisten","msg":"Shutting down the FlowControlTicketholder"}
{"t":{"$date":"2023-06-13T07:58:20.889+02:00"},"s":"I",  "c":"-",        "id":20520,   "ctx":"initandlisten","msg":"Stopping further Flow Control ticket acquisitions."}
{"t":{"$date":"2023-06-13T07:58:20.889+02:00"},"s":"I",  "c":"NETWORK",  "id":4784918, "ctx":"initandlisten","msg":"Shutting down the ReplicaSetMonitor"}
{"t":{"$date":"2023-06-13T07:58:20.889+02:00"},"s":"I",  "c":"SHARDING", "id":4784921, "ctx":"initandlisten","msg":"Shutting down the MigrationUtilExecutor"}
{"t":{"$date":"2023-06-13T07:58:20.889+02:00"},"s":"I",  "c":"ASIO",     "id":22582,   "ctx":"MigrationUtil-TaskExecutor","msg":"Killing all outstanding egress activity."}
{"t":{"$date":"2023-06-13T07:58:20.889+02:00"},"s":"I",  "c":"COMMAND",  "id":4784923, "ctx":"initandlisten","msg":"Shutting down the ServiceEntryPoint"}
{"t":{"$date":"2023-06-13T07:58:20.889+02:00"},"s":"I",  "c":"CONTROL",  "id":4784925, "ctx":"initandlisten","msg":"Shutting down free monitoring"}
{"t":{"$date":"2023-06-13T07:58:20.889+02:00"},"s":"I",  "c":"CONTROL",  "id":4784927, "ctx":"initandlisten","msg":"Shutting down the HealthLog"}
{"t":{"$date":"2023-06-13T07:58:20.889+02:00"},"s":"I",  "c":"CONTROL",  "id":4784928, "ctx":"initandlisten","msg":"Shutting down the TTL monitor"}
{"t":{"$date":"2023-06-13T07:58:20.889+02:00"},"s":"I",  "c":"CONTROL",  "id":4784929, "ctx":"initandlisten","msg":"Acquiring the global lock for shutdown"}
{"t":{"$date":"2023-06-13T07:58:20.889+02:00"},"s":"I",  "c":"-",        "id":4784931, "ctx":"initandlisten","msg":"Dropping the scope cache for shutdown"}
{"t":{"$date":"2023-06-13T07:58:20.889+02:00"},"s":"I",  "c":"FTDC",     "id":4784926, "ctx":"initandlisten","msg":"Shutting down full-time data capture"}
{"t":{"$date":"2023-06-13T07:58:20.889+02:00"},"s":"I",  "c":"CONTROL",  "id":20565,   "ctx":"initandlisten","msg":"Now exiting"}
{"t":{"$date":"2023-06-13T07:58:20.889+02:00"},"s":"I",  "c":"CONTROL",  "id":23138,   "ctx":"initandlisten","msg":"Shutting down","attr":{"exitCode":100}}
`

Now, after getting this error, I can't use the first option anymore and run python3 run_51.py. If I do that, I get the same error there again. That error is simply more spilled out and longer. Perhaps you can use that what it says to do something... These logs are here:

`nico@nico-Legion:~/Documents/Capra_Projects/voxel51_fiftyone$ python3 run_51.py 
{"t":{"$date":"2023-06-13T05:37:36.153Z"},"s":"I",  "c":"CONTROL",  "id":20697,   "ctx":"-","msg":"Renamed existing log file","attr":{"oldLogPath":"/home/nico/.fiftyone/var/lib/mongo/log/mongo.log","newLogPath":"/home/nico/.fiftyone/var/lib/mongo/log/mongo.log.2023-06-13T05-37-36"}}
Subprocess ['/home/nico/.local/lib/python3.8/site-packages/fiftyone/db/bin/mongod', '--dbpath', '/home/nico/.fiftyone/var/lib/mongo', '--logpath', '/home/nico/.fiftyone/var/lib/mongo/log/mongo.log', '--port', '0', '--nounixsocket'] exited with error 100:
Traceback (most recent call last):
  File "run_51.py", line 1, in <module>
    import fiftyone as fo 
  File "/home/nico/.local/lib/python3.8/site-packages/fiftyone/__init__.py", line 25, in <module>
    from fiftyone.__public__ import *
  File "/home/nico/.local/lib/python3.8/site-packages/fiftyone/__public__.py", line 15, in <module>
    _foo.establish_db_conn(config)
  File "/home/nico/.local/lib/python3.8/site-packages/fiftyone/core/odm/database.py", line 200, in establish_db_conn
    port = _db_service.port
  File "/home/nico/.local/lib/python3.8/site-packages/fiftyone/core/service.py", line 276, in port
    return self._wait_for_child_port()
  File "/home/nico/.local/lib/python3.8/site-packages/fiftyone/core/service.py", line 170, in _wait_for_child_port
    return find_port()
  File "/home/nico/.local/lib/python3.8/site-packages/retrying.py", line 56, in wrapped_f
    return Retrying(*dargs, **dkw).call(f, *args, **kw)
  File "/home/nico/.local/lib/python3.8/site-packages/retrying.py", line 266, in call
    raise attempt.get()
  File "/home/nico/.local/lib/python3.8/site-packages/retrying.py", line 301, in get
    six.reraise(self.value[0], self.value[1], self.value[2])
  File "/home/nico/.local/lib/python3.8/site-packages/six.py", line 719, in reraise
    raise value
  File "/home/nico/.local/lib/python3.8/site-packages/retrying.py", line 251, in call
    attempt = Attempt(fn(*args, **kwargs), attempt_number, False)
  File "/home/nico/.local/lib/python3.8/site-packages/fiftyone/core/service.py", line 168, in find_port
    raise ServiceListenTimeout(etau.get_class_name(self), port)
fiftyone.core.service.ServiceListenTimeout: fiftyone.core.service.DatabaseService failed to bind to port
nico@nico-Legion:~/Documents/Capra_Projects/voxel51_fiftyone$ `

A "fun" thing about this error is that if I now reboot my machine, I get to do the same all over again. Then I can run the $ python3 run_51.py in the endless loop or I can run the script in debug mode one time successfully again, before getting the error.

As this happens on my local laptop (running ubuntu 20.04) and not on my remote machine, this is what leads me to think that the issue is related to how the MongoDB instance is created/closed, when running in debug mode in vscode. Also because a reboot of my computer resolves the issue. Again, sorry for the misleading with regards to my remote machine...