nfsi-canada / OrientPy

Seismic station orientation tools
https://nfsi-canada.github.io/OrientPy/
44 stars 16 forks source link

MSEED #17

Open dylanmikesell opened 2 years ago

dylanmikesell commented 2 years ago

Hi @paudetseis, is there any reason on your end to add MSEED read functionality. It seems to me like it should be able to handle both. I have mseed day volumes stored locally. I looked through io.py last night. Seems like a bit of work to fix things to get mseed working. Has anyone else made this request? I can have a go at it, but if no one else has been asking about this functionally, maybe I will just convert my data to SAC.

paudetseis commented 2 years ago

Hi @dylanmikesell you are right - ideally the code would handle both. It should in fact be straightforward to modify the code to allow loading MSEED files. Most lines referring to SAC (e.g., lines 73, 75, 78, 144, 152, etc.) use it as a string to search for the the corresponding SAC extension. You could add an argument to the functions download_data(), parse_localdata_for_comp() and list_local_data_stn(), for instance ext='SAC', which would load SAC files by default, but accept other file extensions (like MSEED - note this will need to be a string e.g., 'MSEED'). Then all you need is to modify those lines above to use the new variable ext in the string match. E.g. for lines 142-147:

        lclfiles = list(filter(
            stdata,
            '*/{0:4s}.{1:3s}.{2:s}.{3:s}.*.{4:2s}{5:1s}.{s}'.format(
                styr, stjd, sta.network.upper(
                ), sta.station.upper(), sta.channel.upper()[0:2],
                comp.upper(), ext)))

Then on lines 324 and 327, you change st1 = read(sacf1, format='SAC') to st1 = read(sacf1, format=ext).

The only place where this gets tricky is on lines 333 and 334, where there is a check for no data in stats.sac['user9']. I'm not sure how to deal with this with mseed files that don't have the stats.sac info.

I hope this helps.

dylanmikesell commented 2 years ago

Okay. Thanks for the tips. I will have a look and work on this.

One more question: can we adapt the filename structure requirements? I just converted some mseed to sac using IRIS' mseed2sac function. An example filename looks like YG.SE3.00.HH1.D.2021.200.000001.SAC. io.py cannot find these data even now that they are in SAC format. Are these format 1 and format 2 coming from your own channel naming? Would it be useful to provide different formats, e.g., like msnoise?

data_structure['SDS'] = "YEAR/NET/STA/CHAN.TYPE/NET.STA.LOC.CHAN.TYPE.YEAR.DAY"
data_structure['BUD'] = "NET/STA/STA.NET.LOC.CHAN.YEAR.DAY"
data_structure['IDDS'] = "YEAR/NET/STA/CHAN.TYPE/DAY/NET.STA.LOC.CHAN.TYPE.YEAR.DAY.HOUR"
data_structure['PDF'] = "YEAR/STA/CHAN.TYPE/NET.STA.LOC.CHAN.TYPE.YEAR.DAY"

My miniseed do not follow your formats either, e.g., SE3.YG.00.HHZ.2021.183. So I need to figure something out.

dylanmikesell commented 2 years ago

@paudetseis, can you tell me what stdata is meant to be in the following lines?

lclfiles = list(filter( stdata, '/{0:4s}.{1:3s}.{2:s}.{3:s}..{4:2s}{5:1s}.{s}'.format( styr, stjd, sta.network.upper( ), sta.station.upper(), sta.channel.upper()[0:2], comp.upper(), ext)))

If you look at what stdata actually is, it is the path given by the --local-data option PATH. Thus, I can never get lclfiles to populate. If you look in the doc of the functions in io.py it says the following.

  stdata : List
      Station list

Thus, I think there might be a bug in the local-data code. Does it work for your own tests?

paudetseis commented 2 years ago

@dylanmikesell there might well be a bug in the local-data code. To be honest with you I have never tested this part - it's inherited from other codes written by a collaborator. But yes, the stdata variable holds the path to your local data folder.

I also like your suggestion to have different naming conventions for the data files. Ultimately I think the scripts need a major overhaul but I don't have the time to spend on this unfortunately. Feel free to have a stab at it and make them work for your application. If your changes end up making it more broadly accessible, can you please generate a PR so I can merge your changes with the main branch`?

Thanks

dylanmikesell commented 1 year ago

Hi @paudetseis. I finally got back to this. I have the local SAC data partially figured out and will submit a PR when done. First, we were misssing a call to list_local_data_stn() to actually build the file list from the local data path. The function where there, it was just never called in the appropriate place. That is fixed now.

My SAC data do not have a user9 header word...and as you stated before, MSEED will not have this either. I am curious, where does user9 come from as a check for no data in your code? If that header set in your own local data? To me it seems like we should just have a check on the st.data length. Here is your code.

# Check for NoData and convert to NaN
                stnd = st[0].stats.sac['user9']
                eddt = False
                if (not stnd == 0.0) and (not stnd == -12345.0):
                    st[0].data[st[0].data == stnd] = ndval
                    eddt = True

If you can provide me a little more insight into user9 I can get this fixed. I noticed a few other issues have popped up about this local directory issue, so I will try to wrap it up quickly.