niosh-mining / obsplus

A Pandas-Centric ObsPy Expansion Pack
GNU Lesser General Public License v3.0
38 stars 8 forks source link

NSLC cleanup #199

Closed sboltz closed 4 years ago

sboltz commented 4 years ago

This PR standardizes and cleans up the way that NSLC information is handled, particularly if it contains numeric values (particularly location codes). This is necessary because pandas can do strange things to dtypes that result in unexpected behavior.

It should have the following behavior:

This behavior should coerce things to follow the seed convention, but it will also allow for special cases where the NSLC are purely numeric.

codecov-commenter commented 4 years ago

Codecov Report

Merging #199 into master will increase coverage by 0.00%. The diff coverage is 100.00%.

Impacted file tree graph

@@           Coverage Diff           @@
##           master     #199   +/-   ##
=======================================
  Coverage   97.36%   97.36%           
=======================================
  Files          37       37           
  Lines        4175     4179    +4     
  Branches      664      665    +1     
=======================================
+ Hits         4065     4069    +4     
  Misses         52       52           
  Partials       58       58           
Flag Coverage Δ
#unittests 97.36% <100.00%> (+<0.01%) :arrow_up:

Flags with carried forward coverage won't be shown. Click here to find out more.

Impacted Files Coverage Δ
obsplus/constants.py 100.00% <ø> (ø)
obsplus/utils/stations.py 99.47% <ø> (-0.01%) :arrow_down:
obsplus/events/validate.py 94.80% <100.00%> (ø)
obsplus/structures/dfextractor.py 87.35% <100.00%> (+0.60%) :arrow_up:
obsplus/utils/pd.py 97.59% <100.00%> (+0.02%) :arrow_up:

Continue to review full report at Codecov.

Legend - Click here to learn more Δ = absolute <relative> (impact), ø = not affected, ? = missing data Powered by Codecov. Last update 2e3d1f2...357d638. Read the comment docs.

d-chambers commented 4 years ago

This looks well thought out. I especially like how you are using a custom dtype for NSLC codes to replace a multitude of ad-hacks.

It looks like it still needs a changelog entry. Apart from that, is there anything left to do?

d-chambers commented 4 years ago

I do have one question about station codes. These are typically 3 characters, probably for historical reasons. I think the seed spec just requires they are between 1 and 5 chars though. It might be nice if station 1 became station "001" rather than "01" but I suppose it isn't that important. What do you think?

sboltz commented 4 years ago

I do have one question about station codes. These are typically 3 characters, probably for historical reasons. I think the seed spec just requires they are between 1 and 5 chars though. It might be nice if station 1 became station "001" rather than "01" but I suppose it isn't that important. What do you think?

That is an interesting point. For simplicity, I would say that if the user wants to name their station "001", they should use the string convention that we have outlined. But that is largely out of laziness and because it's difficult to figure out what the appropriate thing to do there really is. For instance, for us we would almost certainly want station 1 to become station "1". The reason for having it be "01" is because all of the NSLC components are now using the same data type, and in particular I wanted to make sure it handles location codes properly.

d-chambers commented 4 years ago

Looks good