Closed ocefpaf closed 6 months ago
@MathewBiddle this one is getting too big to review. I'll address the remaining points in another PR.
HF-Radar is hardcoded 😢 . At one point you could parse the information from http://hfrnet.ucsd.edu/sitediag/stationList.php, but that doesn't seem to be the case anymore. Let's leave it hardcoded for now and update it once we have a source.
Is this ready for review?
HF-Radar is hardcoded 😢 . At one point you could parse the information from http://hfrnet.ucsd.edu/sitediag/stationList.php, but that doesn't seem to be the case anymore. Let's leave it hardcoded for now and update it once we have a source.
OK. I'll add a note to check hfrnet again in the future.
Is this ready for review?
Yep. I have some extra changes that would be nice in a fresh PR to avoid clashing with the ones here.
PS: The next changes parallelize things. It now takes ~7 s versus 20+ s before. The more metrics we add, the more important the speedup will be (we are still missing the national platforms, and those hit different data sources).
In [2]: %time update_metrics()
CPU times: user 88.1 ms, sys: 86.3 ms, total: 174 ms
Wall time: 6.59 s
Out[2]:
     date_UTC  Federal Partners  Regional Associations  HF Radar Stations  NGDAC Glider Days  ...  QARTOD Manuals  IOOS Core Variables  Metadata Records  IOOS  COMT Projects
0  2018-02-01                17                     11                150              52027  ...              13                   34              8600     1           <NA>
1  2022-04-22                17                     11                165              53672  ...              13                   34              7213     1              5
2  2022-07-08                17                     11                165              55448  ...              13                   34              6217     1              5
3  2022-10-05                17                     11                165              59088  ...              13                   34             24499     1              5
4  2023-01-05                17                     11                165              62042  ...              13                   34             11840     1              5
5  2024-02-14                17                     11               <NA>              76075  ...              13                   34             35249     1              5
[6 rows x 16 columns]
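For anyone following along, here is a minimal sketch of the kind of parallelization described above. It is not the code from the follow-up PR. It assumes each metric is an independent, network-bound function, so a thread pool is enough. `fetch_all` and the two stand-in metric functions are hypothetical names used only for illustration.

```python
from concurrent.futures import ThreadPoolExecutor
import time


def federal_partners():
    # Stand-in for a metric function that waits on a remote source.
    time.sleep(0.1)
    return 17


def regional_associations():
    # Second stand-in metric; the real ones hit different data sources.
    time.sleep(0.1)
    return 11


def fetch_all(functions, max_workers=8):
    """Run every metric function in its own thread and collect results by name."""
    with ThreadPoolExecutor(max_workers=max_workers) as executor:
        # Submit everything first so the waits overlap instead of adding up.
        futures = {func.__name__: executor.submit(func) for func in functions}
        return {name: future.result() for name, future in futures.items()}


results = fetch_all([federal_partners, regional_associations])
print(results)
```

Because the work is mostly waiting on HTTP responses, threads should overlap the waits, and the total wall time tends toward that of the slowest metric instead of the sum of all of them.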
Just ran this and received this error:
import ioos_metrics.ioos_metrics
df2 = ioos_metrics.ioos_metrics.update_metrics()
Traceback (most recent call last):
File "C:\Users\Mathew.Biddle\programs\Miniforge\envs\ioos-metrics\Lib\site-packages\IPython\core\interactiveshell.py", line 3505, in run_code
exec(code_obj, self.user_global_ns, self.user_ns)
File "<ipython-input-5-da9d358039b7>", line 1, in <module>
df2 = ioos_metrics.ioos_metrics.update_metrics()
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\Mathew.Biddle\Documents\GitProjects\ioos_metrics\ioos_metrics\ioos_metrics.py", line 429, in update_metrics
message = _compare_metrics(column=column, num=num)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\Mathew.Biddle\Documents\GitProjects\ioos_metrics\ioos_metrics\ioos_metrics.py", line 65, in _compare_metrics
elif num < old:
^^^^^^^^^
TypeError: '>' not supported between instances of 'int' and 'NoneType'
It looks like, since HAB Pilot Projects doesn't exist previously, this trips the if/elif comparison.
I added some print statements to help debug:
df2 = ioos_metrics.ioos_metrics.update_metrics()
column: ATN Deployments
old: 4444
num: 5298
column: COMT Projects
old: 5
num: 5
column: Federal Partners
old: 17
num: 17
column: HAB Pilot Projects
old: 9
num: None
Traceback (most recent call last):
File "C:\Users\Mathew.Biddle\programs\Miniforge\envs\ioos-metrics\Lib\site-packages\IPython\core\interactiveshell.py", line 3505, in run_code
exec(code_obj, self.user_global_ns, self.user_ns)
File "<ipython-input-3-da9d358039b7>", line 1, in <module>
df2 = ioos_metrics.ioos_metrics.update_metrics()
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\Mathew.Biddle\Documents\GitProjects\ioos_metrics\ioos_metrics\ioos_metrics.py", line 432, in update_metrics
message = _compare_metrics(column=column, num=num)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\Mathew.Biddle\Documents\GitProjects\ioos_metrics\ioos_metrics\ioos_metrics.py", line 68, in _compare_metrics
elif num < old:
^^^^^^^^^
TypeError: '>' not supported between instances of 'int' and 'NoneType'
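The traceback above comes from an ordering comparison between an int and None. A minimal sketch of a None-safe comparison follows, assuming plain Python ints and None (a pandas `<NA>` would need its own check). `compare_values` is a hypothetical helper, not the real `_compare_metrics`, and the message wording is illustrative.

```python
def compare_values(column, old, num):
    """Describe how a metric changed, tolerating a missing old or new value."""
    # Guard first: ordering comparisons between int and None raise TypeError.
    if old is None or num is None:
        return f"{column}: cannot compare old={old} with new={num}."
    if num > old:
        return f"{column} increased {old} -> {num}."
    if num < old:
        return f"{column} decreased {old} -> {num}."
    return f"{column} equal {old} = {num}."


print(compare_values("HAB Pilot Projects", 9, None))
print(compare_values("ATN Deployments", 4444, 5298))
```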
Ahh, it looks like it's a problem with hab_pilot_projects()
ioos_metrics.ioos_metrics.hab_pilot_projects()
Traceback (most recent call last):
File "C:\Users\Mathew.Biddle\programs\Miniforge\envs\ioos-metrics\Lib\site-packages\IPython\core\interactiveshell.py", line 3505, in run_code
exec(code_obj, self.user_global_ns, self.user_ns)
File "<ipython-input-5-a4a8446718c8>", line 1, in <module>
ioos_metrics.ioos_metrics.hab_pilot_projects()
File "C:\Users\Mathew.Biddle\Documents\GitProjects\ioos_metrics\ioos_metrics\ioos_metrics.py", line 379, in hab_pilot_projects
from pdfminer.high_level import extract_text
File "C:\Users\Mathew.Biddle\programs\PyCharm Community Edition 2020.2.2\plugins\python-ce\helpers\pydev\_pydev_bundle\pydev_import_hook.py", line 21, in do_import
module = self._system_import(name, *args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ModuleNotFoundError: No module named 'pdfminer.high_level'
Okay, I updated pdfminer and now the HABs function works. ioos_metrics.ioos_metrics.update_metrics() was broken too, as I need to install ckanapi.
Looks like my env was all out of date. Updating my env then I'll try again.
conda env update --file environment.yml --prune
I guess I could fail more gracefully when a dependency is missing. Let me see if I can fix those.
@MathewBiddle the latest commit should make update_metrics run even when a dependency is missing. Note that, because we want it to run all the way to the end, the metric will be None but the error will be in the logs, like:
INFO:root:[2023-01-05] : COMT Projects equal 5 = 5.
DEBUG:urllib3.connectionpool:Starting new HTTPS connection (1): ioos.noaa.gov:443
DEBUG:urllib3.connectionpool:https://ioos.noaa.gov:443 "GET /community/national HTTP/1.1" 301 0
DEBUG:urllib3.connectionpool:https://ioos.noaa.gov:443 "GET /community/national/ HTTP/1.1" 200 None
INFO:root:df_fed_partners[0].to_string()='0 National Oceanic and Atmospheric Administratio...\n1 National Aeronautics and Space Administration ...\n2 Bureau of Ocean Energy Management, Regulation ...\n3 Office of Naval Research (ONR)\n4 U.S. Army Corps of Engineers (USACE)\n5 U.S. Geological Survey (USGS)\n6 Department of Energy (DOE)\n7 Department of Transportation (DOT)\n8 U.S. Arctic Research Commission (USARC)\n9 National Science Foundation (NSF)\n10 Environmental Protection Agency (EPA)\n11 Marine Mammal Commission (MMC)\n12 Oceanographer of the Navy, representing the Jo...\n13 U.S. Coast Guard (USCG)\n14 Department of Agriculture, Cooperative State R...\n15 Department of State (DOS)\n16 Food and Drug Administration (FDA)'
INFO:root:[2023-01-05] : Federal Partners equal 17 = 17.
ERROR:root:No module named 'pdfminer'
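The behavior described above (keep going, record None, log the error) can be sketched roughly as follows. This is an illustration under assumptions, not the actual commit. `run_metric` and both example functions are hypothetical, and the import name is deliberately one that does not exist.

```python
import logging

logger = logging.getLogger(__name__)


def run_metric(func):
    """Call a metric function; on any failure, log the error and return None."""
    try:
        return func()
    except Exception as err:  # broad on purpose so one metric cannot stop the run
        logger.error("%s", err)
        return None


def needs_missing_dependency():
    # Hypothetical metric whose optional import is not installed.
    import a_module_that_is_not_installed  # noqa: F401

    return 9


def works_fine():
    # Hypothetical metric that succeeds.
    return 5


metrics = {
    func.__name__: run_metric(func)
    for func in (needs_missing_dependency, works_fine)
}
print(metrics)
```

The trade-off is that a None in the results no longer says why it is None, so the log becomes the place to look.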
I added some print statements to help debug
Matt, I should mention that update_metrics never fails: it keeps going and logs everything in the metric.log file. You can either inspect the logs to figure out why some metric is None, or run the specific function by itself. Here is what happens if I run hab_pilot_projects outside of update_metrics without pdfminer.six:
from ioos_metrics.ioos_metrics import hab_pilot_projects
hab_pilot_projects()
---------------------------------------------------------------------------
ModuleNotFoundError Traceback (most recent call last)
Cell In[2], line 1
----> 1 hab_pilot_projects()
File ~/Dropbox/pymodules/01-forks/IOOS/ioos_metrics/ioos_metrics/ioos_metrics.py:378, in hab_pilot_projects()
368 def hab_pilot_projects():
369 """
370 These are the National Harmful Algal Bloom Observing Network Pilot Project awards.
371 Currently these were calculated from the
(...)
376
377 """
--> 378 from pdfminer.high_level import extract_text
380 url = "https://cdn.ioos.noaa.gov/media/2022/10/NHABON-Funding-Awards-FY22.pdf"
382 data = requests.get(url)
ModuleNotFoundError: No module named 'pdfminer'
After updating the env things are looking good.
It looks like pdfminer writes a lot of stuff to the log file (166082 lines' worth). We can clean that up in a further PR.
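One possible way to quiet that output, sketched under the assumption that pdfminer.six logs through module-named loggers under the "pdfminer" namespace and that the metrics log is configured at DEBUG level. The `basicConfig` call here is a stand-in for whatever logging setup the project really uses.

```python
import logging

# Root logger at DEBUG, standing in for a verbose metrics log (assumed setup).
logging.basicConfig(level=logging.DEBUG)

# Raise the threshold only for the "pdfminer" logger hierarchy.
logging.getLogger("pdfminer").setLevel(logging.WARNING)

# Child loggers without their own level inherit the parent's threshold.
child = logging.getLogger("pdfminer.psparser")
print(child.getEffectiveLevel() == logging.WARNING)
```

Warnings and errors from pdfminer would still reach the log; only its DEBUG and INFO chatter is dropped.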
missing
TODO (moved to #56):
update_metrics