ioos / ioos_metrics

Working on creating metrics for the IOOS by the numbers
https://ioos.github.io/ioos_metrics/
MIT License

More metrics #54

Closed ocefpaf closed 6 months ago

ocefpaf commented 7 months ago

TODO (moved to #56):

ocefpaf commented 7 months ago

@MathewBiddle this one is getting too big to review. I'll address the remaining points in another PR.

MathewBiddle commented 7 months ago

HF-Radar is hardcoded 😢 . At one point you could parse the information from http://hfrnet.ucsd.edu/sitediag/stationList.php, but that doesn't seem to be the case anymore. Let's leave it hardcoded for now and update it once we have a source.

MathewBiddle commented 7 months ago

Is this ready for review?

ocefpaf commented 7 months ago

> HF-Radar is hardcoded 😢 . At one point you could parse the information from http://hfrnet.ucsd.edu/sitediag/stationList.php, but that doesn't seem to be the case anymore. Let's leave it hardcoded for now and update it once we have a source.

OK. I'll add a note to check hfrnet again in the future.

> Is this ready for review?

Yep. I have some extra changes that would be nice in a fresh PR to avoid clashing with the ones here.

ocefpaf commented 7 months ago

PS: The next changes parallelize things. It now takes ~7 s, versus 20+ s before. The more metrics we add, the more the speedup will matter (we are still missing the national platforms, which hit different data sources).

In [2]: %time update_metrics()
CPU times: user 88.1 ms, sys: 86.3 ms, total: 174 ms
Wall time: 6.59 s
Out[2]: 
     date_UTC Federal Partners Regional Associations  HF Radar Stations NGDAC Glider Days  ...  QARTOD Manuals IOOS Core Variables Metadata Records IOOS COMT Projects
0  2018-02-01               17                    11                150             52027  ...              13                  34             8600    1          <NA>
1  2022-04-22               17                    11                165             53672  ...              13                  34             7213    1             5
2  2022-07-08               17                    11                165             55448  ...              13                  34             6217    1             5
3  2022-10-05               17                    11                165             59088  ...              13                  34            24499    1             5
4  2023-01-05               17                    11                165             62042  ...              13                  34            11840    1             5
5  2024-02-14               17                    11               <NA>             76075  ...              13                  34            35249    1             5

[6 rows x 16 columns]
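
The speedup is plausible because the metric functions spend most of their time waiting on network responses, which is the case where threads help. A minimal sketch of that pattern, assuming each metric is a zero-argument callable; the `fetch_*` functions and `run_in_parallel` are hypothetical stand-ins, not the functions in ioos_metrics:

```python
import time
from concurrent.futures import ThreadPoolExecutor


def fetch_federal_partners():
    # Hypothetical stand-in for a metric that waits on a remote server.
    time.sleep(0.2)
    return 17


def fetch_regional_associations():
    # Hypothetical stand-in for a second, independent metric.
    time.sleep(0.2)
    return 11


def run_in_parallel(functions):
    """Call every zero-argument function in a thread pool, keyed by label."""
    with ThreadPoolExecutor() as executor:
        futures = {label: executor.submit(func) for label, func in functions.items()}
        # result() blocks until that particular call has finished.
        return {label: future.result() for label, future in futures.items()}


functions = {
    "Federal Partners": fetch_federal_partners,
    "Regional Associations": fetch_regional_associations,
}
start = time.perf_counter()
metrics = run_in_parallel(functions)
elapsed = time.perf_counter() - start
print(metrics, f"{elapsed:.2f} s")
```

Because the two calls overlap, the wall time should be close to that of the slowest call and not the sum of both.
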
MathewBiddle commented 6 months ago

Just ran this and received this error:

import ioos_metrics.ioos_metrics
df2 = ioos_metrics.ioos_metrics.update_metrics()
Traceback (most recent call last):
  File "C:\Users\Mathew.Biddle\programs\Miniforge\envs\ioos-metrics\Lib\site-packages\IPython\core\interactiveshell.py", line 3505, in run_code
    exec(code_obj, self.user_global_ns, self.user_ns)
  File "<ipython-input-5-da9d358039b7>", line 1, in <module>
    df2 = ioos_metrics.ioos_metrics.update_metrics()
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\Mathew.Biddle\Documents\GitProjects\ioos_metrics\ioos_metrics\ioos_metrics.py", line 429, in update_metrics
    message = _compare_metrics(column=column, num=num)
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\Mathew.Biddle\Documents\GitProjects\ioos_metrics\ioos_metrics\ioos_metrics.py", line 65, in _compare_metrics
    elif num < old:
         ^^^^^^^^^
TypeError: '>' not supported between instances of 'int' and 'NoneType'
MathewBiddle commented 6 months ago

It looks like, since HAB Pilot Projects didn't exist previously, this gets caught in the if statement.

I added some print statements to help debug:

df2 = ioos_metrics.ioos_metrics.update_metrics()
column: ATN Deployments
old: 4444
num: 5298
column: COMT Projects
old: 5
num: 5
column: Federal Partners
old: 17
num: 17
column: HAB Pilot Projects
old: 9
num: None
Traceback (most recent call last):
  File "C:\Users\Mathew.Biddle\programs\Miniforge\envs\ioos-metrics\Lib\site-packages\IPython\core\interactiveshell.py", line 3505, in run_code
    exec(code_obj, self.user_global_ns, self.user_ns)
  File "<ipython-input-3-da9d358039b7>", line 1, in <module>
    df2 = ioos_metrics.ioos_metrics.update_metrics()
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\Mathew.Biddle\Documents\GitProjects\ioos_metrics\ioos_metrics\ioos_metrics.py", line 432, in update_metrics
    message = _compare_metrics(column=column, num=num)
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\Mathew.Biddle\Documents\GitProjects\ioos_metrics\ioos_metrics\ioos_metrics.py", line 68, in _compare_metrics
    elif num < old:
         ^^^^^^^^^
TypeError: '>' not supported between instances of 'int' and 'NoneType'
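
The debug output shows `num` arriving as None, so the comparison can be guarded before it runs. A sketch under assumptions: `compare_metrics` and its message wording are hypothetical, and it takes the old value as an argument, which differs from the `_compare_metrics(column=column, num=num)` call in the traceback.

```python
def compare_metrics(column, old, num):
    """Describe how a metric changed, tolerating missing values."""
    # A failed fetch yields None, which cannot be ordered against an int.
    if num is None:
        return f"{column}: no new value, keeping {old}."
    if old is None:
        return f"{column}: no previous value, new value is {num}."
    if num > old:
        return f"{column} increased {old} -> {num}."
    if num < old:
        return f"{column} decreased {old} -> {num}."
    return f"{column} equal {old} = {num}."


print(compare_metrics("HAB Pilot Projects", old=9, num=None))
print(compare_metrics("ATN Deployments", old=4444, num=5298))
```
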
MathewBiddle commented 6 months ago

Ahh, it looks like it's a problem with hab_pilot_projects()

ioos_metrics.ioos_metrics.hab_pilot_projects()
Traceback (most recent call last):
  File "C:\Users\Mathew.Biddle\programs\Miniforge\envs\ioos-metrics\Lib\site-packages\IPython\core\interactiveshell.py", line 3505, in run_code
    exec(code_obj, self.user_global_ns, self.user_ns)
  File "<ipython-input-5-a4a8446718c8>", line 1, in <module>
    ioos_metrics.ioos_metrics.hab_pilot_projects()
  File "C:\Users\Mathew.Biddle\Documents\GitProjects\ioos_metrics\ioos_metrics\ioos_metrics.py", line 379, in hab_pilot_projects
    from pdfminer.high_level import extract_text
  File "C:\Users\Mathew.Biddle\programs\PyCharm Community Edition 2020.2.2\plugins\python-ce\helpers\pydev\_pydev_bundle\pydev_import_hook.py", line 21, in do_import
    module = self._system_import(name, *args, **kwargs)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ModuleNotFoundError: No module named 'pdfminer.high_level'
MathewBiddle commented 6 months ago

Okay, I updated pdfminer and now the HABs function works.

MathewBiddle commented 6 months ago

ioos_metrics.ioos_metrics.update_metrics() was broken too, as I needed to install ckanapi.

MathewBiddle commented 6 months ago

Looks like my env was all out of date. I'll update my env and then try again.

conda env update --file environment.yml --prune
ocefpaf commented 6 months ago

I guess I could fail more gracefully when a dependency is missing. Let me see if I can fix those.

ocefpaf commented 6 months ago

@MathewBiddle the latest commit should make update_metrics run even when a dependency is missing. Note that, because we want it to run all the way to the end, the metric will be None, but the error will be in the logs, like:

INFO:root:[2023-01-05] : COMT Projects equal 5 = 5.
DEBUG:urllib3.connectionpool:Starting new HTTPS connection (1): ioos.noaa.gov:443
DEBUG:urllib3.connectionpool:https://ioos.noaa.gov:443 "GET /community/national HTTP/1.1" 301 0
DEBUG:urllib3.connectionpool:https://ioos.noaa.gov:443 "GET /community/national/ HTTP/1.1" 200 None
INFO:root:df_fed_partners[0].to_string()='0     National Oceanic and Atmospheric Administratio...\n1     National Aeronautics and Space Administration ...\n2     Bureau of Ocean Energy Management, Regulation ...\n3                        Office of Naval Research (ONR)\n4                  U.S. Army Corps of Engineers (USACE)\n5                         U.S. Geological Survey (USGS)\n6                            Department of Energy (DOE)\n7                    Department of Transportation (DOT)\n8               U.S. Arctic Research Commission (USARC)\n9                     National Science Foundation (NSF)\n10                Environmental Protection Agency (EPA)\n11                       Marine Mammal Commission (MMC)\n12    Oceanographer of the Navy, representing the Jo...\n13                              U.S. Coast Guard (USCG)\n14    Department of Agriculture, Cooperative State R...\n15                            Department of State (DOS)\n16                   Food and Drug Administration (FDA)'
INFO:root:[2023-01-05] : Federal Partners equal 17 = 17.
ERROR:root:No module named 'pdfminer'
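
The behaviour described in this comment, where a failing metric becomes None and the error goes to the log, can be sketched with a small wrapper. This is an assumption about the shape of the fix, not the code in the commit; `safe_call`, `needs_missing_module`, and `comt_projects` are hypothetical names.

```python
import logging

logging.basicConfig(level=logging.INFO)


def safe_call(func):
    """Run a metric function; on any error, log it and return None."""
    try:
        return func()
    except Exception as err:  # includes ModuleNotFoundError from lazy imports
        logging.error(err)
        return None


def needs_missing_module():
    # Hypothetical metric whose lazy import fails, like a missing pdfminer.
    import module_that_is_not_installed_xyz  # noqa: F401

    return 9


def comt_projects():
    # Hypothetical metric that succeeds.
    return 5


results = {
    "HAB Pilot Projects": safe_call(needs_missing_module),
    "COMT Projects": safe_call(comt_projects),
}
print(results)
```

Catching the broad `Exception` is a deliberate trade-off in this sketch: the run always reaches the end, at the cost of having to read the log to learn why a value is None.
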
ocefpaf commented 6 months ago

> I added some print statements to help debug

Matt, I should mention that update_metrics never fails; it keeps going and logs everything in the metric.log file. You can either inspect the logs to figure out why some metric is None, or run the specific function by itself. Here is what happens if I run hab_pilot_projects outside of update_metrics without pdfminer.six:

from ioos_metrics.ioos_metrics import hab_pilot_projects

hab_pilot_projects()
---------------------------------------------------------------------------
ModuleNotFoundError                       Traceback (most recent call last)
Cell In[2], line 1
----> 1 hab_pilot_projects()

File ~/Dropbox/pymodules/01-forks/IOOS/ioos_metrics/ioos_metrics/ioos_metrics.py:378, in hab_pilot_projects()
    368 def hab_pilot_projects():
    369     """
    370     These are the National Harmful Algal Bloom Observing Network Pilot Project awards.
    371     Currently these were calculated from the
   (...)
    376 
    377     """
--> 378     from pdfminer.high_level import extract_text
    380     url = "https://cdn.ioos.noaa.gov/media/2022/10/NHABON-Funding-Awards-FY22.pdf"
    382     data = requests.get(url)

ModuleNotFoundError: No module named 'pdfminer'
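
Inspecting the log for the reason behind a None metric amounts to filtering for ERROR records. A minimal sketch, assuming the `LEVEL:logger:message` layout seen in the log excerpt earlier in this thread; `error_lines` is a hypothetical helper, and the sample text stands in for the contents of metric.log:

```python
def error_lines(log_text):
    """Return the log records whose level is ERROR."""
    return [line for line in log_text.splitlines() if line.startswith("ERROR:")]


sample = "\n".join(
    [
        "INFO:root:[2023-01-05] : COMT Projects equal 5 = 5.",
        "INFO:root:[2023-01-05] : Federal Partners equal 17 = 17.",
        "ERROR:root:No module named 'pdfminer'",
    ]
)
print(error_lines(sample))
```
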
MathewBiddle commented 6 months ago

After updating the env things are looking good.

It looks like pdfminer writes a lot to the log file (166082 lines' worth). We can clean that up in a future PR.
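
One possible way to trim that output, sketched under two assumptions: that the lines come from pdfminer's own loggers (pdfminer.six names its loggers after its modules, so they sit under the `pdfminer` namespace), and that the project configures the root logger at DEBUG level.

```python
import logging

# Assumed to mirror a root logger configured at DEBUG level.
logging.basicConfig(level=logging.DEBUG)

# Raise the threshold for the whole "pdfminer" namespace; child loggers
# such as "pdfminer.psparser" inherit it unless they set their own level.
logging.getLogger("pdfminer").setLevel(logging.WARNING)

child = logging.getLogger("pdfminer.psparser")
print(child.isEnabledFor(logging.DEBUG))
print(child.isEnabledFor(logging.WARNING))
```

Warnings and errors from pdfminer would still reach the log; only its DEBUG and INFO records are dropped.
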

MathewBiddle commented 6 months ago

missing