While investigating the performance problems in the GEOMETransformer, I found that we were querying against the `kingdom` column, which isn't indexed. After examining the data, I determined that this query wasn't necessary and removed the code.
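The sketch below shows why filtering on an unindexed column is costly. It is a minimal example that uses Python's built-in `sqlite3` module as a stand-in for the real database. The `isb_3=>` prompt suggests PostgreSQL, where `EXPLAIN` plays the same role. The toy `taxonomyname` table and the index names are assumptions for illustration.

SQLite's `EXPLAIN QUERY PLAN` reports a full table scan for `WHERE kingdom = ?` until an index on `kingdom` exists. The change described here removes the query rather than adding that index. The index is created only to show the contrast.

```python
import sqlite3


def query_plan(conn: sqlite3.Connection, sql: str, params: tuple = ()) -> str:
    """Return SQLite's EXPLAIN QUERY PLAN detail text, one step per line."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    # Each plan row has four columns; the last one (index 3) describes the step.
    return "\n".join(row[3] for row in rows)


conn = sqlite3.connect(":memory:")
# Toy stand-in for the real taxonomyname table (assumed schema):
# name is indexed, kingdom is not.
conn.execute("CREATE TABLE taxonomyname (_id INTEGER PRIMARY KEY, name TEXT, kingdom TEXT)")
conn.execute("CREATE INDEX taxonomyname_name_idx ON taxonomyname (name)")

lookup = "SELECT _id FROM taxonomyname WHERE kingdom = ?"
before = query_plan(conn, lookup, ("Animalia",))  # full table SCAN

conn.execute("CREATE INDEX taxonomyname_kingdom_idx ON taxonomyname (kingdom)")
after = query_plan(conn, lookup, ("Animalia",))  # SEARCH via the new index

print(before)
print(after)
```

On a table with hundreds of thousands of rows, the scan touches every row on every lookup. The indexed search only touches the matching entries.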
Also added some timer decorators for future use. Sample timer output from one transform:
```
INFO:root:Function 'get_thing_with_id' executed in 0.0075s at 1718731437.179872
INFO:root:Function 'thing_resolved_content' executed in 0.0000s at 1718731437.18012
INFO:root:Function 'geome_transformer_for_identifier' executed in 0.0000s at 1718731437.180233
INFO:root:Function 'has_context_categories' executed in 10.1678s at 1718731447.348138
INFO:root:Function 'has_material_categories' executed in 0.0000s at 1718731447.348245
INFO:root:Function 'has_specimen_categories' executed in 0.0000s at 1718731447.3482869
INFO:root:Function 'parse_permit_freetext' executed in 0.0002s at 1718731447.3486311
INFO:root:Function 'transform' executed in 10.1684s at 1718731447.348671
```
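The log lines above come from the timer decorators. Their implementation isn't part of this description, so what follows is a hypothetical sketch that reproduces the same message format with the standard `logging` module. The decorator name `timed` and the `slow_add` example function are assumptions.

The `INFO:root:` prefix is what `logging.basicConfig()` produces for the root logger by default. The trailing timestamp is taken after the call returns, which is consistent with `transform` being logged last above.

```python
import functools
import logging
import time
from typing import Any, Callable, TypeVar, cast

F = TypeVar("F", bound=Callable[..., Any])


def timed(func: F) -> F:
    """Hypothetical timer decorator: log how long each call to func takes."""

    @functools.wraps(func)
    def wrapper(*args: Any, **kwargs: Any) -> Any:
        start = time.perf_counter()
        try:
            return func(*args, **kwargs)
        finally:
            elapsed = time.perf_counter() - start
            # Wall-clock time at completion, so nested calls are logged
            # before the function that called them.
            logging.info(
                "Function '%s' executed in %.4fs at %s",
                func.__name__, elapsed, time.time(),
            )

    return cast(F, wrapper)


# Default format yields lines like "INFO:root:Function 'slow_add' executed in ..."
logging.basicConfig(level=logging.INFO)


@timed
def slow_add(a: int, b: int) -> int:
    time.sleep(0.01)  # stand-in for real work
    return a + b


slow_add(1, 2)
```

The `try`/`finally` means a call that raises is still timed and logged. Drop it if only successful calls should be reported.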
Narrowed it down to `kingdom_for_taxonomy_name`:

```
INFO:root:Function 'kingdom_for_taxonomy_name' executed in 10.0508s at 1718731797.169311
```
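Finding the slow spot starts with ranking the timer lines by duration. Below is a minimal sketch of that ranking step. It assumes the message format shown in the log output above, and the names `slowest_calls` and `TIMER_LINE` are hypothetical.

```python
import re
from typing import Iterable

# Matches timer messages such as:
# INFO:root:Function 'transform' executed in 10.1684s at 1718731447.348671
TIMER_LINE = re.compile(r"Function '(?P<name>[^']+)' executed in (?P<secs>\d+(?:\.\d+)?)s")


def slowest_calls(lines: Iterable[str], top: int = 3) -> list[tuple[str, float]]:
    """Return up to `top` (function, seconds) pairs, longest duration first."""
    timings = []
    for line in lines:
        match = TIMER_LINE.search(line)
        if match:
            timings.append((match["name"], float(match["secs"])))
    return sorted(timings, key=lambda item: item[1], reverse=True)[:top]


log = """INFO:root:Function 'get_thing_with_id' executed in 0.0075s at 1718731437.179872
INFO:root:Function 'has_context_categories' executed in 10.1678s at 1718731447.348138
INFO:root:Function 'transform' executed in 10.1684s at 1718731447.348671""".splitlines()

print(slowest_calls(log, top=2))
# [('transform', 10.1684), ('has_context_categories', 10.1678)]
```

Each timing includes the time spent in nested calls, so the outermost function (`transform`) always ranks first. The culprit is the deepest function that still accounts for most of the time.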
Looking at the data, every kingdom is represented by a row where name = kingdom, so we only need to query against kingdom:
```
isb_3=> select * from taxonomyname where name=kingdom;
  _id   |      name      |    kingdom
--------+----------------+----------------
   3997 | Protozoa       | Protozoa
   3998 | Protozoa       | Protozoa
  13769 | incertae sedis | incertae sedis
  13770 | incertae sedis | incertae sedis
  25219 | Chromista      | Chromista
  25220 | Chromista      | Chromista
  25221 | Plantae        | Plantae
  25222 | Plantae        | Plantae
  30427 | Archaea        | Archaea
  30428 | Archaea        | Archaea
  51957 | Animalia       | Animalia
  42585 | Viruses        | Viruses
  42586 | Viruses        | Viruses
  45869 | Fungi          | Fungi
  45870 | Fungi          | Fungi
  47419 | Bacteria       | Bacteria
  47420 | Bacteria       | Bacteria
  51958 | Animalia       | Animalia
  76147 | Plantae        | Plantae
  76148 | Plantae        | Plantae
 127950 | Animalia       | Animalia
 119110 | Animalia       | Animalia
 120130 | Animalia       | Animalia
 111582 | Animalia       | Animalia
 113408 | Animalia       | Animalia
 113968 | Animalia       | Animalia
 116164 | Animalia       | Animalia
 117622 | Animalia       | Animalia
 128100 | Animalia       | Animalia
 129362 | Animalia       | Animalia
 122794 | Animalia       | Animalia
 124512 | Animalia       | Animalia
 125150 | Animalia       | Animalia
(33 rows)
```
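The query above lists the rows where name equals kingdom. To confirm that *every* kingdom appears there, you also have to compare against the distinct kingdom values in the table. Below is a minimal sketch of that check. It assumes the `(name, kingdom)` pairs have already been fetched from `taxonomyname` (the fetch is not shown). The function name and the sample rows are made up for illustration.

```python
from typing import Iterable, Optional


def kingdoms_without_self_row(rows: Iterable[tuple[str, Optional[str]]]) -> set[str]:
    """Return kingdom values that never appear on a row whose name equals that kingdom.

    An empty result means every kingdom can be found by matching name = kingdom.
    """
    kingdoms: set[str] = set()
    self_named: set[str] = set()
    for name, kingdom in rows:
        if kingdom is None:
            continue  # rows with no kingdom value don't affect the check
        kingdoms.add(kingdom)
        if name == kingdom:
            self_named.add(kingdom)
    return kingdoms - self_named


# Made-up sample rows, not taken from the real table.
sample = [("Animalia", "Animalia"), ("Canis lupus", "Animalia"), ("Fungi", "Fungi")]
print(kingdoms_without_self_row(sample))  # set()
print(kingdoms_without_self_row(sample + [("Escherichia coli", "Bacteria")]))  # {'Bacteria'}
```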