iDigBio / idb-backend

iDigBio server and backend code for data ingestion, media processing, record indexing, and data API.
GNU General Public License v3.0

problematic behavior in indexer if Biodiversity parserver not responding #18

Open danstoner opened 7 years ago

danstoner commented 7 years ago

The idb indexer does not appear to set a timeout when connecting to the Ruby Biodiversity parserver.

For example, when running:

idb index check

If the parserver is not responding, the indexer process gets stuck waiting "forever" for a response.
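A minimal sketch of how a connect and read timeout could be applied, assuming the parserver is reached over a plain TCP socket at localhost:4334 (the address mentioned later in this thread). The function name `connect_with_timeout` and the 5-second default are hypothetical, not taken from idb-backend. It is written for Python 3; under Python 2.7, `socket.error` would be caught instead of `OSError`.

```python
import socket


def connect_with_timeout(host="localhost", port=4334, timeout=5.0):
    """Return a connected socket, or None if the server cannot be reached.

    Hypothetical helper: the name and defaults are illustrative only.
    """
    try:
        # create_connection applies the timeout to the connect attempt and
        # leaves it set on the returned socket, so later send/recv calls
        # raise socket.timeout instead of blocking indefinitely.
        return socket.create_connection((host, port), timeout=timeout)
    except OSError:
        # Covers refused connections, unreachable hosts, and timeouts.
        return None
```

With a timeout set on the socket, a parserver that accepts the connection but never answers would surface as a `socket.timeout` exception from `recv` rather than a hung indexer process.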

danstoner commented 6 years ago

I believe the behavior has changed since the original report, possibly due to commit b48cff25527b0ef6321e21d4631eea796eb2db22.

I cannot find where or how the parserver is supposed to be running, though I think it is Ruby code somewhere. On a machine where nothing is listening at localhost:4334, the code does not display an error and instead returns only empty results.

~/idb-backend/idb/helpers# ipython
Python 2.7.12 (default, Nov 20 2017, 18:23:56) 
Type "copyright", "credits" or "license" for more information.

IPython 5.1.0 -- An enhanced Interactive Python.
?         -> Introduction and overview of IPython's features.
%quickref -> Quick reference.
help      -> Python's own help system.
object?   -> Details about 'object', use 'object??' for extra details.

In [1]: from biodiversity_socket_connector import Biodiversity

In [2]: b = Biodiversity()

In [3]: b.get_genus_species("Puma concolor")
No handlers could be found for logger "idb.cfg"
Out[3]: {}

The logging handler issue appears to be masking the underlying error, which is a failure to connect to the socket server.

In [7]: b.get_genus_species("Puma concolor")
---------------------------------------------------------------------------
NameError                                 Traceback (most recent call last)
<ipython-input-7-c8b39bae692c> in <module>()
----> 1 b.get_genus_species("Puma concolor")

/home/dstoner/git/idb-backend/idb/helpers/biodiversity_socket_connector.py in get_genus_species(self, namestr)
     90 
     91     def get_genus_species(self, namestr):
---> 92         self._sendOne(namestr)
     93         return self._parseResp(self._recvOne())
     94 

/home/dstoner/git/idb-backend/idb/helpers/biodiversity_socket_connector.py in _sendOne(self, namestr)
     40 
     41     def _sendOne(self, namestr):
---> 42         if self.sock:
     43             try:
     44                 self.sock.send(namestr.encode("utf-8") + "\n")

/home/dstoner/git/idb-backend/idb/helpers/biodiversity_socket_connector.py in sock(self)
     27     def sock(self):
     28         if self._sock is None:
---> 29             self._connect()
     30         return self._sock
     31 

/home/dstoner/git/idb-backend/idb/helpers/biodiversity_socket_connector.py in _connect(self)
     21             self._sock.connect((self.host, self.port))
     22         except socket.error as e:
---> 23             logger.error("Biodiversity socket server unavailable: %s", e)
     24             self._sock = False
     25 

NameError: global name 'logger' is not defined

The missing socket connection no longer causes an indefinite block on index processes, but is the current behavior acceptable? Is the data from the socket actually needed?
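The NameError in the traceback above comes from calling `logger.error` in `_connect` without a `logger` defined in that module. A hedged sketch of one way to address it follows. The class `ParserClient` is a hypothetical stand-in for the real connector, not the code in biodiversity_socket_connector.py, and using `logging.getLogger(__name__)` is an assumption about how the project would want the logger configured. It is written for Python 3; under Python 2.7, `socket.error` would be caught instead of `OSError`.

```python
import logging
import socket

# Defining the logger at module level avoids the NameError seen in _connect.
logger = logging.getLogger(__name__)


class ParserClient(object):
    """Hypothetical stand-in for the Biodiversity socket connector."""

    def __init__(self, host="localhost", port=4334, timeout=5.0):
        self.host = host
        self.port = port
        self.timeout = timeout
        self._sock = None  # None means "not tried yet"

    def _connect(self):
        try:
            self._sock = socket.create_connection(
                (self.host, self.port), timeout=self.timeout)
        except OSError as e:
            logger.error("Biodiversity socket server unavailable: %s", e)
            # False marks "tried and failed", so the sock property
            # does not attempt to reconnect on every access.
            self._sock = False

    @property
    def sock(self):
        if self._sock is None:
            self._connect()
        return self._sock
```

Once the error is logged rather than raised as a NameError, a missing parserver becomes visible in the logs, although callers would still receive empty results unless they check `sock` themselves.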

danstoner commented 6 years ago

On the indexing / workflow server, the parserver service is not running and is not enabled, so indexing has been hitting this error for a while.

I cannot tell whether this has affected indexing. The indexer process seems to run to completion just fine, but is it doing all of the work it is supposed to be doing?

danstoner commented 6 years ago

# systemctl list-units | grep parserver

# systemctl status parserver
● parserver.service - Biodiversity parserver
   Loaded: loaded (/etc/systemd/system/parserver.service; disabled; vendor preset: enabled)
   Active: inactive (dead)

# systemctl start parserver

# systemctl status parserver
● parserver.service - Biodiversity parserver
   Loaded: loaded (/etc/systemd/system/parserver.service; disabled; vendor preset: enabled)
   Active: active (running) since Fri 2018-04-20 10:51:31 EDT; 1s ago
 Main PID: 15853 (parserver)
   CGroup: /system.slice/parserver.service
           └─15853 /usr/bin/ruby2.3 /usr/local/bin/parserver

Apr 20 10:51:31 c18node4 systemd[1]: Started Biodiversity parserver.

# systemctl list-unit-files | grep parserver
parserver.service                                    disabled

# systemctl stop parserver

# cat /etc/systemd/system/parserver.service 

[Unit]
Description=Biodiversity parserver

[Service]
User=biodiversity
WorkingDirectory=~
ExecStart=/usr/local/bin/parserver
Restart=always
RestartSec=2min

[Install]
WantedBy=multi-user.target

danstoner commented 6 years ago

Biodiversity() is used in conversions.py.

b = Biodiversity()

def genusSpeciesFiller(t, r):
    gs = b.get_genus_species(r["scientificname"])
    return gs

def gs_sn_crossfill(t, r):
    if filled("genus", r) and not filled("scientificname", r):
        r["scientificname"] = scientificNameFiller(t, r)
        r["flag_scientificname_added"] = True
    elif filled("scientificname", r) and not filled("genus", r):
        gs = genusSpeciesFiller(t, r)
        for k, indk in [("genus", "genus"), ("species", "specificepithet")]:
            if filled(k, gs) and not filled(indk, r):
                r[indk] = gs[k]
                r["flag_" + indk + "_added"] = True

There is currently no logging (debug or otherwise) in conversions.py for these functions. gs is most likely coming back as an empty dict when the Biodiversity parserver is not running.
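One possible way to make the empty-result case visible is sketched below. The function `genus_species_filler` and its `parser` argument are hypothetical and differ from the real `genusSpeciesFiller(t, r)` signature, which uses a module-level Biodiversity instance. Passing the parser in simply keeps the example self-contained.

```python
import logging

logger = logging.getLogger(__name__)


def genus_species_filler(record, parser):
    """Look up genus/species for a record and log when nothing comes back.

    `parser` is any object with a get_genus_species(name) method
    (an assumption made for this sketch).
    """
    name = record["scientificname"]
    gs = parser.get_genus_species(name)
    if not gs:
        # An empty dict may mean an unparseable name or an unreachable
        # parserver; logging at debug level at least leaves a trace.
        logger.debug("No genus/species parsed for %r", name)
    return gs
```

Debug level is used here because an empty result can also be legitimate for names the parser cannot handle. Distinguishing that case from an unreachable server would need the connector itself to report its state.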

danstoner commented 6 years ago

Renamed the "b" variable to "bioserv" in c2d5991d5afdff9987d537acc41654b78b6a355c

$ find . -name "*.py" | xargs grep -C 1 bioserv
./idb/helpers/conversions.py-
./idb/helpers/conversions.py:bioserv = Biodiversity()
./idb/helpers/conversions.py-
--
./idb/helpers/conversions.py-def genusSpeciesFiller(t, r):
./idb/helpers/conversions.py:    gs = bioserv.get_genus_species(r["scientificname"])
./idb/helpers/conversions.py-    return gs