smousavi05 / STEAD

STanford EArthquake Dataset (STEAD): A Global Data Set of Seismic Signals for AI
Creative Commons Attribution 4.0 International
278 stars 67 forks

pick data #18

Closed xs20211015 closed 1 year ago

xs20211015 commented 1 year ago

Dear author, how can I select the traces whose third value in the snr_db attribute (that is, the Z component) is greater than 50? Can this be done? Looking forward to your reply, thank you!

smousavi05 commented 1 year ago

@xs20211015 Yes, try this:

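def string_convertor(dd):
    # Convert a bracketed string such as '[a b c]' into a list of floats.
    dd2 = dd.split()
    SNR = []
    for i, d in enumerate(dd2):
        if d != '[' and d != ']':

            # Strip a leading '[' or a trailing ']' attached to the number.
            dL = d.split('[')
            dR = d.split(']')

            if len(dL) == 2:
                dig = dL[1]
            elif len(dR) == 2:
                dig = dR[0]
            elif len(dL) == 1 and len(dR) == 1:
                dig = d
            try:
                dig = float(dig)
            except Exception:
                # Values that cannot be parsed are stored as None.
                dig = None

            SNR.append(dig)
    return SNR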
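For comparison, here is a shorter sketch of the same parsing idea. It assumes each snr_db cell is a bracketed, space-separated string; the function name `parse_snr` and the sample values are hypothetical and only serve as an illustration.

```python
def parse_snr(cell):
    """Parse a bracketed, space-separated string into a list of floats.

    Tokens that cannot be converted are returned as None.
    """
    # Turn brackets into spaces so that '[1.0', '2.0]' and '[3.0]' all split cleanly.
    tokens = cell.replace("[", " ").replace("]", " ").split()
    values = []
    for tok in tokens:
        try:
            values.append(float(tok))
        except ValueError:
            values.append(None)
    return values


# Hypothetical cell content; the last entry is taken to be the Z component.
sample = "[20.5 31.0 55.25]"
e_snr, n_snr, z_snr = parse_snr(sample)
print(z_snr)
```

Replacing the brackets before splitting also covers a cell with a single value such as `[3.0]`, where both brackets are attached to the same token.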

reading the csv file into a dataframe:

import pandas as pd

df = pd.read_csv(csv_file)
df = df[df.trace_category == 'earthquake_local']
# keep only the last value of snr_db, i.e. the Z component
df.snr_db = df.snr_db.apply(lambda x: string_convertor(x)[-1])
df = df[df.snr_db > 50]
ev_list = df['trace_name'].to_list()
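To see the selection logic in isolation, the sketch below applies the same two filters (category, then Z-component SNR above a threshold) to a tiny inline table using only the standard library instead of pandas. The rows, trace names, and the helpers `z_snr` and `select_traces` are made up for illustration; the real STEAD CSV has many more columns.

```python
import csv
import io

# Hypothetical three-row table standing in for the metadata CSV.
SAMPLE_CSV = '''trace_name,trace_category,snr_db
A.EV1,earthquake_local,[20.5 31.0 55.2]
B.EV2,earthquake_local,[60.1 58.3 42.0]
C.NO1,noise,
'''


def z_snr(cell):
    """Return the last value of a bracketed string, or None if it cannot be parsed."""
    tokens = cell.replace("[", " ").replace("]", " ").split()
    try:
        return float(tokens[-1])
    except (IndexError, ValueError):
        return None


def select_traces(csv_text, threshold=50.0):
    """Return names of local-earthquake traces whose Z-component SNR exceeds threshold."""
    selected = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        if row["trace_category"] != "earthquake_local":
            continue
        snr = z_snr(row["snr_db"])
        if snr is not None and snr > threshold:
            selected.append(row["trace_name"])
    return selected


print(select_traces(SAMPLE_CSV))  # ['A.EV1']
```

Only `A.EV1` is selected: `B.EV2` has high SNR on the first two components but not on the last one, and `C.NO1` is not a local earthquake.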

retrieving selected waveforms from the hdf5 file:

import h5py
import numpy as np

dtfl = h5py.File(file_name, 'r')
for c, evi in enumerate(ev_list):
    dataset = dtfl.get('data/' + str(evi))
    # waveforms, 3 channels: first row: E channel, second row: N channel, third row: Z channel
    data = np.array(dataset)

xs20211015 commented 1 year ago

@smousavi05 Thanks, this helped me a lot!