ropensci-archive / isdparser

:warning: ARCHIVED :warning: NOAA ISD/ISH file parser
Other
13 stars 4 forks source link

feedback? #6

Closed sckott closed 7 years ago

sckott commented 7 years ago

@masalmon @mrubuayet @Fasle @rjbehnke @lukas-rokka @zxy1219

this is a new package only for parsing NOAA ISD files

its code taken from rnoaa, and rearranged, and now all additional data categories are accounted for.

please have a look and let me know if you see any problems, or have feature requests.

i aim to push this to CRAN very soon (tomorrow)

rjbehnke commented 7 years ago

Scott,

I tested it on a couple big files, and it seems to work fine. The parallel option is very nice (and necessary)! I'm taking my PhD comprehensive exams this week, but will test it out more as soon as I can.

Ruben


From: Scott Chamberlain [notifications@github.com] Sent: Tuesday, November 01, 2016 4:03 PM To: ropenscilabs/isdparser Cc: Behnke, Ruben; Mention Subject: [ropenscilabs/isdparser] feedback? (#6)

@masalmonhttps://github.com/masalmon @mrubuayethttps://github.com/mrubuayet @Faslehttps://github.com/Fasle @rjbehnkehttps://github.com/rjbehnke @lukas-rokkahttps://github.com/lukas-rokka @zxy1219https://github.com/zxy1219

this is a new package only for parsing NOAA ISD files

its code taken from rnoaa, and rearranged, and now all additional data categories are accounted for.

please have a look and let me know if you see any problems, or have feature requests.

i aim to push this to CRAN very soon (tomorrow)

� You are receiving this because you were mentioned. Reply to this email directly, view it on GitHubhttps://github.com/ropenscilabs/isdparser/issues/6, or mute the threadhttps://github.com/notifications/unsubscribe-auth/AVFU22PyEwOhYuEOovAWrt9En2WReuPCks5q57chgaJpZM4Kmou5.

sckott commented 7 years ago

great, thanks for the feedback @rjbehnke - good to know parallel option is useful.

i think we can speed up the code with more profiling, and possibly dropping down to c++

lukas-rokka commented 7 years ago

Nice, @sckott.

Just want to mention a project at UK Met office, that relates to your rnoaa project, http://www.metoffice.gov.uk/hadobs/hadisd. The hadISD database is based on NOAA's ISD/ISH database, but have undergone quality checks. hadISD data is available as netCDF (makes life much easier). The code will be released as open source Python code anytime soon, can be interesting to see how they managed the parsing and quality checks.

maelle commented 7 years ago

Nice package! 2 comments:

sckott commented 7 years ago

@lukas-rokka Good to hear about that - I wasn't thinking of doing quality checks, but I suppose that'd be a good idea, will check it out once its opened up

sckott commented 7 years ago

@masalmon yes, will add more info to readme

at least in my head I see data.table as a lighter weight dependency, it installs faster, but i do prefer to use dplyr for munging, etc. - anyway, the point is we need a package that does just bind_rows/rbindlist (they work slightly differently, but same general task) - I've asked before, but have been told doesn't make sense to rip out bind_rows to sep. package. Maybe I'll askf about rbindlist to a sep. smaller pkg.

sckott commented 7 years ago

sent off to cran, thanks for the feedback all

rjbehnke commented 7 years ago

One more suggestion is to offer an option where only the basic variables are returned. One file I parsed had 365 variables, the vast majority of which are useless. Keeping only the main meteorological variables of temperature, precipitation, dew point, wind speed, wind direction, pressure, and if available, solar radiation, RH, soil temperature, and soil moisture (along with their QC flags) is good enough for 99% of users.


From: Scott Chamberlain [notifications@github.com] Sent: Wednesday, November 02, 2016 6:33 PM To: ropenscilabs/isdparser Cc: Behnke, Ruben; Mention Subject: Re: [ropenscilabs/isdparser] feedback? (#6)

sent off to cran, thanks for the feedback all

� You are receiving this because you were mentioned. Reply to this email directly, view it on GitHubhttps://github.com/ropenscilabs/isdparser/issues/6#issuecomment-258040240, or mute the threadhttps://github.com/notifications/unsubscribe-auth/AVFU2--qIHqNz1FXAXIlzB6G446GhmZSks5q6SvJgaJpZM4Kmou5.

sckott commented 7 years ago

@rjbehnke good idea, can add a parameter that states what fields to return, thus avoiding parsing the others, and saves parsing time

rjbehnke commented 7 years ago

Scott,

I've been parsing data with the new isdparser package and you appear to be already accounting for the scaling factor in the data. Is that true? Also, there are some stations where the latitude, longitude, and elevation are bad. Are you taking care of those, as well? Lat and Lon appear to be bad when they're both equal to zero, but I haven't looked at a large sample. Elevation seems to be bad when its equal to 999.9. It seems like the values in the parsed data for missing values are divided by 10, since the missing data values in the documentation are equal to 99, 999, etc., and the missing values in the parsed values are 999.9, 9999.9, etc.

I've looked at precipitation, and I'm not sure there's an easy way to parse this data out. Since this not one of the mandatory variables found in the documentation, there are often several precipitation variables in the file, and I don't know which one is correct, etc. Anyway, just wanted to suggest that you provide some documentation as to missing values, units, whether or not you're taking care of the scaling factor, etc. It will make using the package much easier.

Ruben


From: Scott Chamberlain [notifications@github.com] Sent: Thursday, November 03, 2016 9:30 AM To: ropenscilabs/isdparser Cc: Behnke, Ruben; Mention Subject: Re: [ropenscilabs/isdparser] feedback? (#6)

@rjbehnkehttps://github.com/rjbehnke good idea, can add a parameter that states what fields to return, thus avoiding parsing the others, and saves parsing time

� You are receiving this because you were mentioned. Reply to this email directly, view it on GitHubhttps://github.com/ropenscilabs/isdparser/issues/6#issuecomment-258176978, or mute the threadhttps://github.com/notifications/unsubscribe-auth/AVFU27CtLgb29XNEL2kfX9LoJmKEdBuLks5q6f35gaJpZM4Kmou5.

sckott commented 7 years ago

@rjbehnke thanks for the feedback.

I've been parsing data with the new isdparser package and you appear to be already accounting for the scaling factor in the data. Is that true?

what is "scaling factor"?

Also, there are some stations where the latitude, longitude, and elevation are bad. Are you taking care of those, as well?

No, can you give some examples please?

It seems like the values in the parsed data for missing values are divided by 10, since the missing data values in the documentation are equal to 99, 999, etc., and the missing values in the parsed values are 999.9, 9999.9, etc.

Yes, see https://github.com/ropenscilabs/isdparser/blob/master/R/trans_vars.R Let me know if you think that's wrong.

I've looked at precipitation, and I'm not sure there's an easy way to parse this data out. Since this not one of the mandatory variables found in the documentation, there are often several precipitation variables in the file, and I don't know which one is correct, etc.

Can you please give an example?

Anyway, just wanted to suggest that you provide some documentation as to missing values, units, whether or not you're taking care of the scaling factor, etc. It will make using the package much easier.

Can add more documentation.

rjbehnke commented 7 years ago

Scaling factor is the factor the data are multiplied by, as stated in the ISD format document found at ftp://ftp.ncdc.noaa.gov/pub/data/noaa. It seems like you are accounting for this based on the code found at the link you show. The missing data codes (999, 9999, etc.) are whole numbers, not decimals, which is why I was wondering if you divided these by 10.

I don't have any example of bad latitude, longitude, and/or elevation in front of me now, but, when this happens, latitude and longitude are both zero, and elevation is 999.9. The code for missing lat/lon in the documentation is 99999.

Some stations seem to have a few precipitation sensors/measurements, like 999999-63867, which has

"AA1_precipitation_liquid",

"AA1_depth",

"AO1_precipitation_liquid",

"CG1_liquid_depth",

"CG2_liquid_depth",

"CG3_liquid_depth", and more.

I'm not sure which one is right and was wondering which one, if any, you're basing precipitation on.


From: Scott Chamberlain [notifications@github.com] Sent: Monday, November 07, 2016 1:18 PM To: ropenscilabs/isdparser Cc: Behnke, Ruben; Mention Subject: Re: [ropenscilabs/isdparser] feedback? (#6)

@rjbehnkehttps://github.com/rjbehnke thanks for the feedback.

I've been parsing data with the new isdparser package and you appear to be already accounting for the scaling factor in the data. Is that true?

what is "scaling factor"?

Also, there are some stations where the latitude, longitude, and elevation are bad. Are you taking care of those, as well?

No, can you give some examples please?

It seems like the values in the parsed data for missing values are divided by 10, since the missing data values in the documentation are equal to 99, 999, etc., and the missing values in the parsed values are 999.9, 9999.9, etc.

Yes, see https://github.com/ropenscilabs/isdparser/blob/master/R/trans_vars.R Let me know if you think that's wrong.

I've looked at precipitation, and I'm not sure there's an easy way to parse this data out. Since this not one of the mandatory variables found in the documentation, there are often several precipitation variables in the file, and I don't know which one is correct, etc.

Can you please give an example?

Anyway, just wanted to suggest that you provide some documentation as to missing values, units, whether or not you're taking care of the scaling factor, etc. It will make using the package much easier.

Can add more documentation.

� You are receiving this because you were mentioned. Reply to this email directly, view it on GitHubhttps://github.com/ropenscilabs/isdparser/issues/6#issuecomment-258950326, or mute the threadhttps://github.com/notifications/unsubscribe-auth/AVFU27N4UMFRnABEQBkBottUnluNU6Ifks5q74eYgaJpZM4Kmou5.

sckott commented 7 years ago

okay, i see what you mean by scaling factor. We are accounting for scaling factor for the mandatory variables only I think. Even though some additional variables have scaling factors, we aren't accounting for those. Not sure if should automatically do scaling factor for all variables (mandatory and additional), or if should have that as an optional step after parsing data maybe, so that the time for initial parsing isn't increased any more, but if people want scaling taking into account, they can call a single fxn to do that

I'm not sure which one is right and was wondering which one, if any, you're basing precipitation on.

Not sure what you mean. Those 3 character acronyms are verbatim from the manual, and are all additional data sections. The variables without the 3 character acronynms are mandatory data fields, and those field names follow the manual as well. Where exactly is the confusion here?

rjbehnke commented 7 years ago

In the code link you sent me, there is a precipitation variable (along with the other variables) in the following line:

w$precipitation <- trans_var(trycol(suppressWarnings(w$precipitation)), 10)

This is the precipitation variable I'm asking about.

Ruben


From: Scott Chamberlain [notifications@github.com] Sent: Monday, November 07, 2016 1:18 PM To: ropenscilabs/isdparser Cc: Behnke, Ruben; Mention Subject: Re: [ropenscilabs/isdparser] feedback? (#6)

@rjbehnkehttps://github.com/rjbehnke thanks for the feedback.

I've been parsing data with the new isdparser package and you appear to be already accounting for the scaling factor in the data. Is that true?

what is "scaling factor"?

Also, there are some stations where the latitude, longitude, and elevation are bad. Are you taking care of those, as well?

No, can you give some examples please?

It seems like the values in the parsed data for missing values are divided by 10, since the missing data values in the documentation are equal to 99, 999, etc., and the missing values in the parsed values are 999.9, 9999.9, etc.

Yes, see https://github.com/ropenscilabs/isdparser/blob/master/R/trans_vars.R Let me know if you think that's wrong.

I've looked at precipitation, and I'm not sure there's an easy way to parse this data out. Since this not one of the mandatory variables found in the documentation, there are often several precipitation variables in the file, and I don't know which one is correct, etc.

Can you please give an example?

Anyway, just wanted to suggest that you provide some documentation as to missing values, units, whether or not you're taking care of the scaling factor, etc. It will make using the package much easier.

Can add more documentation.

� You are receiving this because you were mentioned. Reply to this email directly, view it on GitHubhttps://github.com/ropenscilabs/isdparser/issues/6#issuecomment-258950326, or mute the threadhttps://github.com/notifications/unsubscribe-auth/AVFU27N4UMFRnABEQBkBottUnluNU6Ifks5q74eYgaJpZM4Kmou5.

sckott commented 7 years ago

thanks for clarification, will have a look

sckott commented 7 years ago

closing this for now, please do open new issues for specific things as needed