openaq / openaq-data-format

A description of the data format provided by the OpenAQ platform.
MIT License

Discussion on data reqs for 'research' and 'other' sourceTypes #14

Open RocketD0g opened 8 years ago

RocketD0g commented 8 years ago

Moving over a discussion from @masalmon on Slack to here:

"Besides these R package questions, I am wondering how much metadata there will be to accompany research data? Moreover, when monitoring AQ in rural locations, research studies also measure temperature, RH, and wind direction and speed, which are really important, and these could not be published in OpenAQ at the same time because OpenAQ does not have T and RH. Another variable that would be interesting to have is the device used. I am writing this from a data manager's point of view: I have no idea whether the data of my research project will be made public (it's not for me to decide), but I can say that we have AQ and meteorological variables in rural locations. The AQ data without weather data would be poorer, and it'd be very hard to get weather data from elsewhere for these locations since they're rural. And in the metadata and in articles we'll give references for the devices. With official sources the information about devices is not always there, but for research it should be available anyway. Just my two cents."

RocketD0g commented 8 years ago

I think one issue we will have with metadata is that it will be hard for researchers to agree on a) what metadata to include and b) how to format it. In conversations to date, it has seemed hard for people in different places to converge on how to define terms, etc.

It leaves me, at the moment, thinking it's best to minimize the metadata but be sure people know how to connect with the originating source. I agree with you that RH, T, and wind speed (and I'd put P and/or elevation in there too, and researchers have told us storing raw V would be useful too) are important for research and 'other' (e.g. low-cost sensor) types. But I wonder what the broad value (to not just researchers, but to journalists, policy folks, app developers, the public) of having that data available programmatically and stored on our systems is, as opposed to an individual researcher contacting an originating source and getting that additional data.

For us, it's always a struggle to find the right balance of providing the most useful data to the most people without making our system too unwieldy and expensive to maintain in the long-term. I don't know what data format will make the most sense for 'research-grade' and 'other' types, but I think one role we can help play is to connect users to the originating source, even when we can't host all of the auxiliary data that exists.

I want to hear thoughts from others! Anybody?

maelle commented 8 years ago

Well, there are metadata standards 😉 I have also thought of another barrier, which is that, contrary to a normal repository, there would not be a DOI and thus fewer impact-factor benefits for researchers...

I am very interested in this discussion and I agree with you on impact, the more accessible the better it is. Although I am listing all downsides today 😀

RocketD0g commented 8 years ago

What do you mean by metadata standards? Which ones specifically? I don't know of one that has been broadly agreed upon for low-cost sensors, and it would be super useful for us to know about any and all that exist.

For low-cost sensors, I have seen people agree on the need for certain metadata categories, but also disagree on how to do things like: (1) defining measurement area type (e.g. urban, rural, and industrial are broadly used, but there is no agreement on how to define them), and (2) describing calibration procedures. Things like sensor type are super, super useful to include and seem easy enough to do, but I have also heard arguments about whether exact model numbers should be included or not, or at least the year of manufacturing, etc.

maelle commented 8 years ago

Oh no, I did not mean for sensors, but in general. 😄 For instance, the EML standard for ecological data. It defines the format rather than the contents, though. Maybe for scientific data there would be a corresponding article?

In the CHAI database metadata, for instance, I want to include the model name and version number, plus the year it was bought, because I'm paranoid about what could be useful later.
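A minimal sketch of what such a device record might look like, assuming three fields (model name, version number, year bought). The class name `DeviceMetadata`, the field names, and the example values are hypothetical illustrations; they are not part of the OpenAQ data format or of the CHAI database.

```python
from dataclasses import asdict, dataclass
from typing import Optional


@dataclass(frozen=True)
class DeviceMetadata:
    """Hypothetical device record; field names are illustrative only."""

    model_name: str
    version_number: Optional[str] = None
    year_purchased: Optional[int] = None

    def __post_init__(self) -> None:
        # Reject records that would be useless later: an empty model name
        # or a purchase year outside a plausible range (assumed bounds).
        if not self.model_name.strip():
            raise ValueError("model_name must not be empty")
        if self.year_purchased is not None and not (1900 <= self.year_purchased <= 2100):
            raise ValueError("year_purchased looks implausible")


# Placeholder values, not a real device.
device = DeviceMetadata(model_name="ExampleSensor", version_number="2.1", year_purchased=2015)

# A plain dict is easy to attach to a measurement or serialize as JSON.
record = asdict(device)
```

Keeping the version and year optional reflects the disagreement noted earlier in the thread about how much device detail should be required.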

The only database of monitoring devices I know of is http://db-airmontech.jrc.ec.europa.eu/ (heard about through Félix Pedrera), and it is not exhaustive. Maybe OpenAQ could do this. 😸

maelle commented 8 years ago

Small summary

I think my questions are related to two concerns:

How to give credit

Science is currently funded by impact factors. Maybe there could be a tool on the API where one could input source names and get the corresponding citations on top of a general citation for the platform?

For non-scientific uses, the issue of not getting credit would be no bigger on OpenAQ than on other repositories. Still, this could be problematic, because I guess a motivation for a researcher to put their data on OpenAQ would be the additional impact... So maybe providing download statistics per source would help?
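The citation lookup suggested above could be sketched roughly as follows. Everything here is an assumption for illustration: the function `citations_for`, the in-memory registry, the source name, and the placeholder citation strings are hypothetical and do not describe an existing OpenAQ API endpoint.

```python
from typing import Dict, Iterable, List

# Placeholder text; the real general citation would be defined by the platform.
PLATFORM_CITATION = "OpenAQ platform (general citation text goes here)"

# Hypothetical registry mapping a source name to its preferred citation.
SOURCE_CITATIONS: Dict[str, str] = {
    "example-research-source": "Example Research Group. Example dataset title.",
}


def citations_for(
    source_names: Iterable[str],
    registry: Dict[str, str] = SOURCE_CITATIONS,
) -> List[str]:
    """Return the platform citation followed by one entry per distinct source.

    Input order is preserved and duplicates are skipped. Sources without a
    registered citation are flagged rather than silently dropped.
    """
    result = [PLATFORM_CITATION]
    seen = set()
    for name in source_names:
        if name in seen:
            continue
        seen.add(name)
        result.append(registry.get(name, f"No citation registered for source '{name}'"))
    return result


# Usage: one known source, one unknown source, one duplicate.
lines = citations_for(["example-research-source", "unknown-source", "example-research-source"])
```

Flagging unknown sources keeps missing credit visible to the person assembling the citations.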

maelle commented 8 years ago

And maybe OpenAQ could provide support and guidance for researchers that have air quality data: