ioos / ioos_metrics

Working on creating metrics for the IOOS by the numbers
https://ioos.github.io/ioos_metrics/
MIT License

Make mbon_stats response easier to work with #79

Open MathewBiddle opened 5 months ago

MathewBiddle commented 5 months ago

During the code sprint, @ocefpaf, @laurabrenskelle, and I added an mbon_stats function that grabs statistics about dataset usage from two APIs. The response is a big data frame with some nested dictionaries. I think we're collecting all the data we need, but now it's a matter of being able to parse and use the response.

I'm curious if you could take a look at the function and see what we can do to make the data easier to work with. As it stands, we have to do row-wise iteration to split out the GBIF and OBIS download information, and that seems overly burdensome.

A general question I have: how much should the function do versus how much data wrangling should happen in the use-case notebook?

Any advice is appreciated!

ocefpaf commented 5 months ago

> As it stands, we have to do row-wise iteration to split out the GBIF and OBIS download information, and that seems overly burdensome.

Do you mean after the table is created, or to create the table? If the former, I believe the GBIF and OBIS data are in different columns, so no looping over rows is necessary. If I'm mistaken, maybe we need to create two tables instead of one in the function.
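If the nested values turn out to be dictionaries stored in a column, one possible way to avoid row-wise iteration is `pandas.json_normalize`. The following is only a minimal sketch: the column names (`title`, `gbif`, `obis`), the keys inside the dictionaries, and the helper `expand_nested` are all hypothetical, and the real mbon_stats response may be shaped differently.

```python
import pandas as pd

# Hypothetical stand-in for the mbon_stats response: one row per dataset,
# with per-source statistics stored as nested dictionaries (made-up values).
df = pd.DataFrame(
    {
        "title": ["dataset A", "dataset B"],
        "gbif": [{"downloads": 10, "records": 1000}, {"downloads": 3, "records": 250}],
        "obis": [{"downloads": 7, "records": 900}, {"downloads": 1, "records": 250}],
    }
)


def expand_nested(frame, column):
    """Expand a column of dictionaries into flat columns prefixed with the column name."""
    expanded = pd.json_normalize(frame[column].tolist()).add_prefix(f"{column}_")
    # json_normalize returns a fresh RangeIndex; realign it with the input frame.
    expanded.index = frame.index
    return expanded


# Replace the two nested columns with their flattened equivalents.
flat = pd.concat(
    [df.drop(columns=["gbif", "obis"]), expand_nested(df, "gbif"), expand_nested(df, "obis")],
    axis=1,
)
```

With a layout like this, each source ends up in its own set of flat columns (`gbif_downloads`, `obis_downloads`, and so on), and ordinary column operations replace the loop.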

> A general question I have: how much should the function do versus how much data wrangling should happen in the use-case notebook?

That is a good question! It is hard to answer without knowing what people will be doing with that table. I like to keep the functions doing the bare minimum and leave the data wrangling to the end user, because that part will change more often. To implement that, we need to identify the minimal table that would empower users to get what they want more easily. (That may lead to the creation of multiple functions that are easier to maintain, like fetch_obis, fetch_gbif, merge_tables, summary_table, etc.)
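To make the idea of several small functions concrete, here is a rough sketch of how such a split could look. Only the function names come from the comment above; the signatures, the `dataset_id` key, the column names, and the sample values are assumptions made for illustration, and the two fetchers are stubs that return made-up rows instead of calling the real APIs.

```python
import pandas as pd


def fetch_obis():
    """Stub: a real version would query the OBIS API. Values are made up."""
    return pd.DataFrame({"dataset_id": ["a", "b"], "obis_downloads": [7, 1]})


def fetch_gbif():
    """Stub: a real version would query the GBIF API. Values are made up."""
    return pd.DataFrame({"dataset_id": ["a", "c"], "gbif_downloads": [10, 3]})


def merge_tables(obis, gbif, key="dataset_id"):
    """Outer-join the two tables so datasets present in only one source are kept."""
    return obis.merge(gbif, on=key, how="outer")


def summary_table(merged):
    """Total downloads per source; missing values are skipped by sum()."""
    return merged[["obis_downloads", "gbif_downloads"]].sum().rename("total_downloads")


merged = merge_tables(fetch_obis(), fetch_gbif())
summary = summary_table(merged)
```

Keeping the fetchers separate from the merge and summary steps means each piece can be tested with small hand-made tables, and notebook users can stop at whichever table suits their analysis.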

MathewBiddle commented 2 months ago

We also have institution identifiers for some of the Regional Associations (RAs). It might be nice to do something additional with these, or to be able to query by them as well.

MathewBiddle commented 2 months ago

NERACOOS and SECOORA have OceanExpert IDs but no OBIS institute pages. Hopefully @sformel-usgs can help sort that out.

sformel-usgs commented 2 months ago

Answer from Pieter:

For the institution landing page to work, dataset contacts need to be matched to the respective OceanExpert institutions (unless the exact same contact has been matched before). Unfortunately, the tool to do that is not functioning at the moment and needs some work. I'm afraid all I can do right now is manually link contacts to institutions if someone provides me with a list of datasets.

MathewBiddle commented 2 months ago

@sformel-usgs thanks for digging into this! Is the tool something open source that we can help fix? I don't know what I'm looking for when I browse to that link and log in with OceanExpert.

sformel-usgs commented 2 months ago

I'm not sure either. I use that tool to update the US node info, but I'm not sure what the back end looks like. Judging from my own permissions, I'm guessing we would need additional permissions to manage specific organizations and datasets. I can bring this up when I'm in Belgium, since I'll be in the building with the OBIS and OceanExpert people.