mpinsky closed this issue 8 years ago
Nice, thanks.
Are these issues that need to be fixed in the website code?
If so, could you create an issue there and link to the lines of the code that need the change?
If you can make these changes to the website code yourself, could you link that commit (commit of corrections to website code) in a comment on this issue (issue of cleaning temperatures in trawl repo)?
If you don't make an issue that links to line numbers or do the commit (either of which would let me see what pieces of code were changed), could you tell me which corrections are the new ones?
Also, as a general approach, rather than specifying the year, month, etc. where an error exists, is there logic that can be applied more generally? I.e., is there something specific about the temperature value itself that is flawed? E.g., any temperature below a certain value should be NA, or any value that is way too cold for its region should be NA (e.g., if data is a data.table of trawl values, data[region=="gmex" & stemp < 5, stemp:=NA]).
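To make the per-region idea concrete, here is a rough sketch that generalizes the one-liner above to a small lookup table of plausible ranges. The bounds table, its lo/hi columns, and every threshold value are placeholders I made up for illustration; real limits would need to be chosen region by region.

```r
library(data.table)

# Toy trawl data; region and stemp follow the names used in the comment above.
trawl <- data.table(
  region = c("gmex", "gmex", "gmex", "neus", "neus"),
  stemp  = c(27.1, 0, 3.2, 1.5, 45)
)

# Hypothetical plausible surface-temperature bounds (deg C) per region.
# These numbers are placeholders, not vetted limits.
bounds <- data.table(
  region = c("gmex", "neus"),
  lo     = c(5, -2),
  hi     = c(35, 30)
)

# Attach the bounds to each row by region (update join).
trawl[bounds, on = "region", `:=`(lo = i.lo, hi = i.hi)]

# Set out-of-range values to NA; rows from regions without bounds are left alone.
trawl[!is.na(stemp) & (stemp < lo | stemp > hi), stemp := NA_real_]

# Drop the helper columns again.
trawl[, c("lo", "hi") := NULL]

print(trawl)
```

The rule then lives in one small table instead of a string of region-specific conditions, and adding a region or a btemp column is a data change rather than a code change.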
If it comes down to having a collection of manually identified errors, we should format them into a 2D structure, save them as a .csv or .txt file, then write code to update the object based on the contents of that file. That way we have a single file that explicitly states the manual corrections we're making (easier to track), and the code becomes less bloated.
Or, at the least, we could have a separate R script that executes some of the cleaning.
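A minimal sketch of what the corrections-file approach could look like. The file layout (haulid, variable, new_value, reason), the haulid key, and all rows shown are invented for illustration; the real file would use whatever uniquely identifies a tow in the trawl data.

```r
library(data.table)

# Toy trawl data; haulid is a hypothetical unique key for each tow.
trawl <- data.table(
  haulid = c("h1", "h2", "h3", "h4"),
  region = c("gmex", "gmex", "neus", "neus"),
  stemp  = c(27.1, 0, 1.5, 45),
  btemp  = c(20.3, 18.9, 99, 6.2)
)

# Stand-in for the corrections file kept in the repo: one row per manual fix.
# The rows and reasons are made up for this example.
corr_file <- tempfile(fileext = ".csv")
writeLines(c(
  "haulid,variable,new_value,reason",
  "h2,stemp,NA,zero recorded for missing",
  "h3,btemp,NA,implausible value",
  "h4,stemp,4.5,decimal point misplaced"
), corr_file)

corrections <- fread(
  corr_file,
  na.strings = "NA",
  colClasses = c(haulid = "character", new_value = "numeric")
)

# Fail early if the file refers to tows or columns that do not exist.
stopifnot(all(corrections$haulid %in% trawl$haulid))
stopifnot(all(corrections$variable %in% names(trawl)))

# Apply the fixes one target column at a time via an update join on haulid.
for (v in unique(corrections$variable)) {
  fix <- corrections[variable == v]
  trawl[fix, on = "haulid", (v) := i.new_value]
}

print(trawl)
```

Keeping a reason column in the file means each manual change carries its own justification, which should make the corrections easier to review than edits buried in the cleaning code.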
The OceanAdapt code doesn't deal with temperature (yet).
I don't believe there is any specific logic that could be used universally.
On Mon, Feb 2, 2015 at 9:02 AM, Ryan Batt notifications@github.com wrote:
— Reply to this email directly or view it on GitHub https://github.com/rBatt/trawl/issues/22#issuecomment-72493892.
@mpinsky I have not yet implemented these fixes, and they could be related to the low temperature values in #30. I see in your code that some of those fixes involve changing 0s in gmex to NAs.
I haven't gotten around to these yet because there isn't always a simple one-to-one correspondence between your code and mine.
I'll need to add this to the master list of data verifications that need to happen (along with the changing taxonomic IDs).
See #36.
I have spent some time looking for outlier surface and bottom temperature values in each region that may be mistakes. There are some that I had not caught in the code for the 2013 Science paper. My latest cleaning code (from my range projection project) is below, in case it is useful.