Add SeisData into ASDF - Githubissues

kura-okubo commented 4 years ago

Hello Josh,

Thank you for implementing write_hdf5() for output in ASDF format. I wonder if we could have an option to add SeisData into an existing ASDF file, like using fid = h5open(filename, "cw").

Cheers, Kurama

jpjones76 commented 4 years ago

Your timing is perfect. I am actually working on coding this in another window. :)

jpjones76 commented 4 years ago

I'm testing my changes now. Here's what I'm adding:

write_asdf with no special keywords will write to existing files by creating new traces in Waveforms/net.sta
write_asdf with keyword ovr=true will overwrite (parts of) existing traces within Waveforms/net.sta,
write_asdf with keyword add=true will add new traces as needed to Waveforms/net.sta.
- I'm adding another keyword to let users control the size of each new trace, which is going to work with DateTime types, so if you want to initialize blank one month traces and then overwrite part of them with your trace data, you'll use something like trace_len=Month(1).

These changes will go live either tonight or tomorrow.

I'd like to Skype with your group to explain these changes more and cover your use cases. If this isn't what your group needs, we can iterate on the write behavior until it matches what you intend to do. (The way that you're using ASDF seems very archetypal for working with large files, so I want to ensure that your work flow is intuitive and practical)

kura-okubo commented 4 years ago

Hi Josh,

I think the update above is more than enough for my use, and general use as well. I will use that function for not only download data but also intermediate files during post processing.

Cheers, Kurama

jpjones76 commented 4 years ago

The improvements to write_hdf5 are now live on master. Please check the documentation for details and usage. With the way that your group writes data, you probably want to use KW add=true most of the time.

kura-okubo commented 4 years ago

I tried the new keywords of write_hdf5 and it works! Thank you for the update.

using SeisIO, Plots

s = "2004-01-15T12:00:00"
t = "2004-01-15T14:00:00"
hdfname = "test.h5"

S1 =  get_data("FDSN", "BP.LCCB..BP1", s=s, t=t, v=1, src="NCEDC", unscale=true, rr=true)
S2 =  get_data("FDSN", "BP.RMNB..BP1", s=s, t=t, v=1, src="NCEDC", unscale=true, rr=true)

write_hdf5(hdfname, S1, add=true)
write_hdf5(hdfname, S2, add=true)

S3= read_hdf5(hdfname, "2004-01-15T12:30:00", "2004-01-15T12:40:00", id="BP.LCCB..BP1")

tvec = S3[1].t[1,2]*1e-6 .+ collect(0:S3[1].t[2,1]-1) ./ S3[1].fs
tvec = map(x -> x[1:end-4], timestamp.(tvec))

plot(tvec, S3[1].x, xrotation=45)

tclements commented 4 years ago

Nice work on the ASDF read/write, Josh! I'm trying to read an ASDF file written with SeisIO with pyasdf

S = get_data("FDSN", "TA.M22K..BHZ",s="2019-01-01",t=1)
SeisData with 1 channels (1 shown)
    ID: TA.M22K..BHZ                       
  NAME: Willow, AK, USA                    
   LOC: 61.7531 N, -150.12 E, 57.0 m       
    FS: 40.0                               
  GAIN: 5.03883e8                          
  RESP: a0 8.32666e17, f0 0.2, 6z, 11p     
 UNITS: m/s                                
   SRC: http://service.iris.edu/fdsnws/da… 
  MISC: 4 entries                          
 NOTES: 0 entries                          
     T: 2019-01-01T00:00:00.000 (0 gaps)   
     X: +8.540e+02                         
        +6.230e+02                         
            ...                            
        +5.330e+02                         
        (nx = 41)                          
     C: 0 open, 0 total
out = "seisiotest.h5"
write_hdf5(out,S)

Here is how I would go about creating the same file in python

import obspy
import pyasdf
from obspy.clients.fdsn import Client
client = Client("IRIS")
st = client.get_waveforms("TA","M22K","","BHZ",obspy.UTCDateTime(2019,1,1),obspy.UTCDateTime(2019,1,1,0,0,1))
ds = pyasdf.ASDFDataSet("pyasdftest.h5", compression=None)
ds.add_waveforms(st,tag=st[0].stats.channel.lower())
ds.waveforms.TA_M22K.bhz                                               
1 Trace(s) in Stream:
TA.M22K..BHZ | 2019-01-01T00:00:00.000000Z - 2019-01-01T00:00:01.000000Z | 40.0 Hz, 41 samples
del ds

When reading the file made with SeisIO in python, there is a trailing underscore on the channel name

import pyasdf
ds = pyasdf.ASDFDataSet("seisiotest.h5")
ds.waveforms.TA_M22K
Contents of the data set for station TA.M22K:
    - Has a StationXML file
    - 1 Waveform Tag(s):
        bhz_
ds.waveforms.TA_M22K.bhz_
1 Trace(s) in Stream:
TA.M22K..BHZ | 2019-01-01T00:00:00.000000Z - 2019-01-01T00:00:01.000000Z | 40.0 Hz, 41 samples

Is the trailing underscore necessary for reading/writing? Removing/remembering the underscore forces some extra bookkeeping on the user

jpjones76 commented 4 years ago

Oh! No, it's not necessary at all. I defined the channel name strings to follow @mdenolle's naming syntax in your group's test files. I'll drop the underscore from the tag.

I think I'll take that one step further and add a "tag=" KW; the tag is a freeform string in the file specification (e.g. Lion Krischer uses the tag "raw", rather than the three-letter component name, in the test volume that he sent me).

I can patch this in tonight but it will be a couple of hours before I can start.

tclements commented 4 years ago

Awesome, no worries on the timing - this isn't urgent. I like the idea of a tag KW. I'm writing a write_hdf5 for CorrData based on your routine for SeisData

jpjones76 commented 4 years ago

The tag change should now be live, along with removal of the trailing underscore.

jpjones76 / SeisIO.jl

Add SeisData into ASDF #25