Open sethvincent opened 5 years ago
Here are some suggestions:
`stationHeight` changed to `stationAltitude` (yup, agree in meters - and BTW, I was the one who originally suggested `stationHeight` in #4, so my bad! The reason I suggest the change is that we wouldn't want folks to assume we are talking about station inlet height relative to the ground rather than the elevation of the instrument.)
`maintenance` changed to `maintenanceFrequency` (I like your suggestion of having a few set options, and the description might read something like "Approximate scheduled maintenance frequency." The options "daily, weekly, monthly, yearly, as needed, not maintained, other" should encompass most possibilities. Anyone have thoughts on that?)
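A minimal sketch of how a fixed option list for `maintenanceFrequency` might be checked. The option strings come from the list above; the helper name `isValidMaintenanceFrequency` and the trimming and lowercasing of input are assumptions for illustration, not part of any agreed format.

```javascript
// Options taken from the suggestion above; exact spellings are an assumption.
const MAINTENANCE_FREQUENCIES = [
  'daily', 'weekly', 'monthly', 'yearly',
  'as needed', 'not maintained', 'other'
];

// Hypothetical helper: returns true when the value is one of the set options.
function isValidMaintenanceFrequency(value) {
  if (typeof value !== 'string') return false;
  return MAINTENANCE_FREQUENCIES.includes(value.trim().toLowerCase());
}
```

Anything that does not fit would be entered as `other`, which keeps the list short while still letting us see what people actually report.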
For `calibrationProcedures`: I think we could get lots of variation in length and substance of the descriptions. Is it a problem to have some folks write a page of text in a field? If it is, I will think on how we could structure it to give shorter answers (and would love to hear feedback from others with ideas).
For `siteDescription`: I think the unusual weather type would be hard to classify (and folks could pull in a fair amount of that kind of info from meteo open data sources). But it would be useful to have them classify stuff like rural, urban, residential, roadside, industrial.
Edit: Anyone checking out this issue thread, also feel free to check Issue #8, which just came in with some feedback that has not yet been integrated into this thread. Note that some of the suggestions there are already captured in our existing data format. That said, we may want to consider having several of those existing data parameters in our metadata format too. For instance, you can imagine someone reporting the existence of a station for which we have no pollution measurements. It'd still be valuable to have the coordinates, etc.
I don't have much input on what specific metadata should be included, but I do have a more general comment. I am generally all for 'simple is better', which I think has served us well for the measurement data format, but I would like to point out that here we are, down the line, creating a new standard to store other data. The point being: if we had stored more information in the original measurement format, would we need to do this work now? It may be the case that the measurement and data formats would always be separate, but I just wanted to point out that excluding things now may lead to more work down the line.
Also, if we're going to duplicate data between the measurement and station data, we should think about how syncing will work. Which source is the source of truth? If data is updated in one place, how does it get updated in the other?
@jflasher, this is a good point on perhaps amping up the metadata format.
Input mentioned by Robert Rohde in Issue #8 also made me think we probably should duplicate (and pull in for existing station locations) several of the metadata already in the measurement data format.
The reason being: It'd be nice to have all metadata information about a given location - or the suite of all locations - not split between two systems: the metadata format and the data format.
For instance, imagine this scenario: Someone wants to add station data for monitors in Ghana via the metadata editor. Perhaps all they can add are the pollutant types and the coordinates of the station. It'd be a shame if we couldn't capture that. It'd also seem a shame if someone else who wants to find out what stations exist in Ghana has to know to ping both the regular API and the metadata API to get the full set of information, no? Are there other thoughts on this?
In the following comment, I'll post an updated format.
Here's a proposed metadata format incorporating metadata parameters from our existing data format and the changes others and I suggest above, as well as taking to heart @jflasher's comment that perhaps, for once, we should not do something in the simplest format possible. :)
A description of the working metadata format provided by the OpenAQ Platform.
Field | Type | Required | Description | Comment/Q |
---|---|---|---|---|
stationID | Number | ✓ | Assigned by OpenAQ | https://github.com/openaq/project-universal-stationID/issues |
stationName | String | ✓ | Unique location name of the station | This is pulled from `location` in the existing data format and is the originating source-designated name. If we are to have a `stationID`, it seems like a good idea to also have a `stationName`. |
stationPollutants | String | ✓ | The measured parameter; acceptable values are pm25, pm10, co, bc, so2, no2, o3 | Stations will often measure more than one type of pollutant. This info is already included for stations in the system in our existing measurement data format. Also: shall we include more pollutant types than what we currently ingest in OpenAQ? CO2, CH4, SOx, benzene? |
city | String | ✓ | City (or regional approximation) containing location | This info is already included for stations in the system in our existing measurement data format. Do we want to keep this in the metadata format? I know there was some controversy. @jflasher |
stationAltitude | | | Geospatial altitude of station coordinates in meters | |
country | String | ✓ | Country containing location in two letter ISO format | This info is already included for stations in the system in our existing measurement data format. |
sourceType | String | ✓ | The type of source; acceptable values are: government, research, other | This info is already included for stations in the system in our existing measurement data format. |
coordinates | Object | ✓ | Location of measurement | This info is already included for stations in the system in our existing measurement data format. |
attribution | Array | ✓ | Data attribution in descending order of prominence | `[{"name": "TCEQ", "url":"http://www.tceq.state.tx.us"}, {"name": "City of Houston Health Department"}]` |
mobile | Boolean | ✓ | Indicates whether the measuring station is stationary or mobile | Should we keep this? I think so. |
instrumentNumber | Number | | Number of instruments registered in the OpenAQ system to this station | Comments on this? Basically, we need a way to label multiple instruments, each with metadata of its own and measuring multiple pollutants, at a given station. |
stationStart | Object | | When did the station first begin operating, if known? ISO timestamp | |
stationActive | String | | Is the station still active? Options: True, False, Unknown | |
deactivatedStationDate | Object | | If the station is no longer active, when did it stop operating? ISO timestamp | |
otherStationNotes | String | | Any other relevant notes about this station? | |
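
As a rough illustration of how the required column above could be enforced, here is a small sketch that lists which required station fields are missing from a record. The helper name `missingRequiredFields` and the example values (including the `latitude`/`longitude` keys inside `coordinates`) are assumptions made up for this example, not an agreed schema.

```javascript
// Fields marked as required in the station-level table above.
const REQUIRED_STATION_FIELDS = [
  'stationID', 'stationName', 'stationPollutants', 'city', 'country',
  'sourceType', 'coordinates', 'attribution', 'mobile'
];

// Hypothetical helper: lists required fields that are absent from a record.
function missingRequiredFields(station) {
  return REQUIRED_STATION_FIELDS.filter(
    (field) => station[field] === undefined || station[field] === null
  );
}

// Example record with made-up values, for illustration only.
const exampleStation = {
  stationID: 1,
  stationName: 'Example Station',
  stationPollutants: 'pm25',
  country: 'GH',
  sourceType: 'research',
  coordinates: { latitude: 5.6, longitude: -0.19 },
  mobile: false
};

console.log(missingRequiredFields(exampleStation)); // [ 'city', 'attribution' ]
```

A check like this also makes it easy to see how strict the required list is for sparse submissions, such as a station where only the pollutants and coordinates are known.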
For a value of `n` retrieved from `instrumentNumber`, a corresponding number of `instrument[n]` fields need to be created. In each instrument field, the following instrument metadata are requested:
Field | Type | Required | Description | Comment/Q |
---|---|---|---|---|
instrumentPollutants | String | | The pollutant parameters measured by the instrument; acceptable values are pm25, pm10, co, bc, so2, no2, o3 | Similar to the question in the station-level table above: do we want to make it possible to include other pollutant types? |
instrumentType | String | | We could come up with a list of possibilities, but I'm tempted to see what would come in and develop an 'options' list from that? Seem like a bad idea to anyone? | |
instrumentSerialNumber | String | | Provides unique ID at the station level. | This can act as a unique ID for the instrument. |
instrumentManufacturer | String | | | |
modelName | String | | | |
rawFrequency | Number | | The raw sampling frequency of the instrument (e.g. min, sec, hr, day) | I think this will be the same for different pollutants measured by the same instrument. Feedback from others who may disagree? |
reportingFrequency | Number | | The reporting sampling frequency of the instrument (e.g. min, sec, hr, day) | I think this will be the same for different pollutants measured by the same instrument. Feedback from others who may disagree? |
measurementStyle | String | | Automated, Manual, Unknown | |
calibrationProcedures | String | | This would be specific to the instrument. | Left open-ended (which I think it needs to be), this will likely see a wide variation in text length depending on the particular editor and the particular station. |
inletHeight | Number | | Height of intake inlet, if known, in meters. | |
installationDate | Object | | Installation date for instrument. ISO timestamp | |
instrumentActive | String | | Is the instrument still active? Options: True, False, Unknown | |
deactivatedInstrumentDate | Object | | If the instrument has been deactivated, what date did this occur? | |
otherInstrumentNotes | String | | Any other relevant notes about this instrument? | |
Field | Type | Required | Description | Comment/Q |
---|---|---|---|---|
input[x]Date | Object | | A record of when entries were added and of the history of edits, where x is the x-th edit | I am unclear if we can do this or if this should be listed as a parameter in this format. Basically, we would want some sort of ability to version. |
inputAuthor[x] | String | | Who added this entry? | Could this contain contact info, like an email address? |
notesByAuthor[x] | String | | Any other relevant notes to add about this station or instruments therein? | |
Edit: Forgot to add 'stationAltitude', just added.
Great, this expanded list of fields makes a lot of sense.
A few small suggestions:
mobile
The notion of a mobile station is super interesting and has implications for how we identify stations and assign the unique station ids. No suggestions on this really but something we'll want to keep in mind for figuring out station uniqueness. @olafveerman
instrumentNumber
If we store instruments as an array we can get the length of the array instead of tracking the instrument count in a field.
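A quick sketch of that idea: the count is read from the array rather than stored. The `instruments` key, the helper name `instrumentCount`, and the example values are hypothetical.

```javascript
// Hypothetical station object; values are placeholders for illustration.
const station = {
  stationID: 1,
  instruments: [
    { instrumentType: 'type A', instrumentSerialNumber: 'A-001' },
    { instrumentType: 'type B', instrumentSerialNumber: 'B-002' }
  ]
};

// The count is derived from the array, so it cannot drift out of sync
// with the list of instruments the way a separate counter field could.
function instrumentCount(s) {
  return Array.isArray(s.instruments) ? s.instruments.length : 0;
}

console.log(instrumentCount(station)); // 2
```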
I'd be tempted to drop `instrument` and `station` from the field names, as I think they will be in separate objects and that will help keep them short.
For example, instead of `stationID`, `stationName`, `instrumentType`, and `instrumentSerialNumber`, it could look like:
    {
      id: 'stationid',
      name: 'station name',
      instruments: [
        {
          type: 'instrument type',
          serialNumber: 'serial number'
        }
      ]
    }
For fields like `input[x]Date`, `inputAuthor[x]`, and `notesByAuthor[x]`, if we want to track the changes we would likely make this a separate table that records what these fields would contain, along with the diff between the old and updated values of the fields that were changed.
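
To make the separate change-tracking table a bit more concrete, here is a rough sketch of what one row might hold. The helper name `buildChangeRecord`, the `from`/`to` shape of the diff, and the example values are all assumptions; a real implementation might use a database trigger or a dedicated diff library instead.

```javascript
// Hypothetical helper: builds one change-log entry describing which
// top-level fields differ between the old and updated records.
function buildChangeRecord(oldRecord, newRecord, author, date) {
  const changes = {};
  const keys = new Set([...Object.keys(oldRecord), ...Object.keys(newRecord)]);
  for (const key of keys) {
    // Comparing JSON strings is a simplification; it is sensitive to the
    // key order of nested objects.
    if (JSON.stringify(oldRecord[key]) !== JSON.stringify(newRecord[key])) {
      changes[key] = { from: oldRecord[key], to: newRecord[key] };
    }
  }
  return { author, date, changes };
}

const entry = buildChangeRecord(
  { name: 'Station A', mobile: false },
  { name: 'Station A (north)', mobile: false },
  'example-author',
  '2017-01-01T00:00:00Z'
);
console.log(Object.keys(entry.changes)); // [ 'name' ]
```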
Alternatively, we could start with a couple of simpler fields like `updateDate`, the date of the last update, and `updateAuthor`, the author of the last update.
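
The simpler option could be as small as the sketch below. The helper name `withUpdateInfo` is hypothetical, and storing the date as an ISO 8601 string is an assumption.

```javascript
// Hypothetical helper: returns a copy of the record stamped with the
// author and time of the latest update, leaving the original untouched.
function withUpdateInfo(record, author, now = new Date()) {
  return {
    ...record,
    updateDate: now.toISOString(),
    updateAuthor: author
  };
}

const updated = withUpdateInfo(
  { name: 'Station A' },
  'example-author',
  new Date('2017-01-01T00:00:00Z')
);
console.log(updated.updateDate); // 2017-01-01T00:00:00.000Z
```

This keeps only the latest edit, so it would not give us any history or the ability to version.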
Great idea. I would include the calculated "uncertainty of measurement" for each pollutant. This could replace the calibration procedure. It's also useful to know whether meteorology is measured on site and, if so, what parameters. In some applications the instruments are moved around so this might need a history associated with the instrument array. It's useful to know but the data should be instrument-agnostic.
Based on comments in this repo so far I've put together a quick sketch of what a metadata object might look like.
It's annotated with the links for relevant issues in this repo.
I'll update this issue with revisions as we discuss and decide on the specifics of the various attributes.