trichter / yam

Yet another monitoring tool using correlations of ambient noise
MIT License

yam correlate 1 #17

Closed shubham-geolearner closed 2 weeks ago

shubham-geolearner commented 3 months ago

When I run 'yam correlate 1', it consumes all the memory and the system eventually crashes. I am running it on 74 mseed files. My system has 32 GB of memory and 64 GB of swap.

trichter commented 3 months ago

Hi! What is your setup and configuration? Do you have 74 stations?

shubham-geolearner commented 3 months ago

Hi, I'm following this notebook, https://nbviewer.org/github/trichter/notebooks/blob/master/yam_velocity_variations_patcx/processing_patcx.ipynb, to calculate seismic velocity variations at a station. I have 74 days of data (mseed files) from a single station (100 Hz).

trichter commented 3 months ago

Sorry, I am still not clear. Do you follow the notebook 1 to 1 or do you use your own data?

I checked the notebook inside a fresh conda environment and everything appears to work.

It sounds like you are loading data from all days at the same time?

Any hints from the logs? Which OS do you use?
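To put the "all days at the same time" suspicion into numbers, here is a rough back-of-the-envelope sketch. It assumes gap-free day files and 8 bytes per sample (float64) once the data is in memory; both are assumptions, not something confirmed in this thread, and `raw_bytes` is a hypothetical helper, not part of yam.

```python
def raw_bytes(sampling_rate_hz, n_days, bytes_per_sample=8):
    """Estimate the in-memory size of continuous data for ONE channel.

    Assumes gap-free days and float64 samples (8 bytes each).
    """
    samples_per_day = int(sampling_rate_hz * 86400)  # 86400 seconds per day
    return samples_per_day * n_days * bytes_per_sample


# Figures reported above: 100 Hz sampling rate, 74 days of data.
one_day = raw_bytes(100, 1)
all_days = raw_bytes(100, 74)

print(f"one day, one channel:  {one_day / 1e6:.1f} MB")
print(f"74 days, one channel:  {all_days / 1e9:.2f} GB")
```

Under these assumptions a single day is small, but all 74 days together come to several GB per channel before any working copies are made during preprocessing, so reading every file for every processed day can plausibly exhaust memory.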

shubham-geolearner commented 3 months ago

I am using my own station data. Yes, with the example data everything works fine on my end, but with my own data the problem occurs. I am using Linux Mint.

trichter commented 3 months ago

It is difficult to get to the root of the problem without more information. Can you post the log output and configuration? What is the sampling rate of your data? How large are your data files? Which versions of ObsPy and yam are you using? Please also post the output of obspy-print your_data_file.

alemkhodadadi commented 3 weeks ago

I have the same issue with my own data. My conf.json:

{
"loglevel": 3,
"logfile": "yam_IPOC.log",
"io": {
        "inventory": "stations/*.xml",
        "data": "day_files/{network}.{station}.{location}.{channel}*.mseed",
        "data_format": "MSEED",
        "corr": "corr.h5",
        "stack": "stack.h5",
        "stretch": "stretch.h5",
        "plot": "plots"
        },
"correlate": {
        "1": {  "startdate": "2018-08-02",
                "enddate": "2018-08-11",
                "length": 60,
                "overlap": 30,
                "discard": 0.9,
                "filter": [0.8, 8],
                "max_lag": 30,
                "normalization": ["1bit", "spectral_whitening"],
                "keep_correlations": false,
                "stack": "1d"
                }
        }
}

yam info gives me the following:

Stations:
    AN.EE AN.SS AN.TT
    3 stations, 9 channels
Raw data (expression for day files):
    day_files/{network}.{station}.{location}.{channel}*.mseed
    90 files found
Config ids:
    c Corr: 1
    s Stack: None
    t Stretch: None
Correlations (channel combinations, correlations calculated):
    None
Stacks:
    None
Stretching matrices:
    None

But when I run yam correlate 1 --parallel-inner-loop -n 3, it gets stuck at 0 percent and then the computer crashes. I'm using Linux.

obspy-print shows one stream with len=1; the trace.stats are as follows:

         network: AN
         station: EE
        location: 
         channel: DPE
       starttime: 2018-08-03T00:47:22.987500Z
         endtime: 2018-08-03T00:57:22.987500Z
   sampling_rate: 400.0
           delta: 0.0025
            npts: 240001
           calib: 1.0
         _format: MSEED
           mseed: AttribDict({'dataquality': 'D', 'number_of_records': 476, 'encoding': 'FLOAT64', 'byteorder': '<', 'record_length': 4096, 'filesize': 1949696})

I have data for 3 stations, each with 3 components, for 10 days (3 × 3 × 10 = 90 files in total) in separate mseed files in the directory day_files. The start and end times may differ from day to day. The commands yam plot data AN.EE..DPE 2018-08-02 and yam plot prepdata AN.EE..DPE 2018-08-02 1 plot the data and prepdata without any problem.

The last lines of yam_IPOC.log are:

2024-11-03 23:19:49,807 main      56368 INFO    Yam version 0.7.2
2024-11-03 23:19:49,807 main      56368 INFO    do notuse pyfftw library
2024-11-03 23:19:49,958 main      56368 INFO    read inventory with 3 stations
2024-11-03 23:19:49,958 commands  56368 INFO    start preprocessing and correlation
2024-11-03 23:19:49,959 commands  56368 INFO    do work sequentially
2024-11-03 23:19:49,971 correlate 56368 WARNING empty stream for day 2018-08-02
2024-11-03 23:19:49,972 correlate 56368 WARNING empty stream for day 2018-08-03
2024-11-03 23:19:49,974 correlate 56368 WARNING empty stream for day 2018-08-04
2024-11-03 23:19:49,975 correlate 56368 WARNING empty stream for day 2018-08-05
2024-11-03 23:19:49,977 correlate 56368 WARNING empty stream for day 2018-08-06
2024-11-03 23:19:49,979 correlate 56368 WARNING empty stream for day 2018-08-07
2024-11-03 23:19:49,980 correlate 56368 WARNING empty stream for day 2018-08-08
2024-11-03 23:19:49,982 correlate 56368 WARNING empty stream for day 2018-08-09
2024-11-03 23:19:49,983 correlate 56368 WARNING empty stream for day 2018-08-10
2024-11-03 23:19:49,985 correlate 56368 WARNING empty stream for day 2018-08-11
2024-11-03 23:19:49,985 commands  56368 INFO    finished preprocessing and correlation
2024-11-03 23:19:49,985 main      56368 DEBUG   used time: 0.2s

trichter commented 3 weeks ago

Is the log output from the crash? From the log it appears that yam finishes normally. The warnings (WARNING empty stream for day 2018-08-02) might hint at a problem.

You should double-check the expression for the day files in your configuration. At the moment it does not specify the date. Therefore, your glob expression matches every file of a channel, and all data is loaded for each day. Please compare to the configuration of the tutorial:

        # Expression for data file names (each 1 day). It will be evaluated by
        # string.format(t=day_as_utcdatetime, **station_meta).
        # The default value corresponds to the default naming of ObsPys FDSN Massdownloader.
        # Scheme for SDS archive
        # "data": "example_sds_archive/{t.year}/{network}/{station}/{channel}.D/{network}.{station}.{location}.{channel}.D.{t.year}.{t.julday:03d}",
        "data": "example_data/{network}.{station}.{location}.{channel}__{t.year}{t.month:02d}{t.day:02d}*.mseed",
        "data_format": "MSEED",
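The difference between the two expressions can be illustrated with plain str.format, as described in the comment above. This is only a sketch: `expand` is a hypothetical helper (not yam's code), `datetime.date` stands in for the UTCDateTime object that yam passes as `t`, and the `__YYYYMMDD` file naming in the second pattern is an assumption that may not match your own files.

```python
from datetime import date


def expand(expr, day, **station_meta):
    """Fill a data expression the way the config comment describes."""
    return expr.format(t=day, **station_meta)


# Station metadata taken from the obspy-print output in this thread.
meta = dict(network="AN", station="EE", location="", channel="DPE")

# Pattern from the posted conf.json: no date placeholder.
without_date = "day_files/{network}.{station}.{location}.{channel}*.mseed"
# Tutorial-style pattern with a date placeholder (file naming is assumed).
with_date = ("day_files/{network}.{station}.{location}.{channel}"
             "__{t.year}{t.month:02d}{t.day:02d}*.mseed")

days = [date(2018, 8, 2), date(2018, 8, 3)]
globs_without = [expand(without_date, d, **meta) for d in days]
globs_with = [expand(with_date, d, **meta) for d in days]

for d, a, b in zip(days, globs_without, globs_with):
    print(d, "|", a, "|", b)
```

Without a `{t...}` field the glob is identical for every day, so it matches all files of that channel each time. With the date fields, each day gets its own glob and only that day's file is matched.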

Does that fix your problem?

alemkhodadadi commented 2 weeks ago

Yes, the problem was with how I was referencing the data in day_files. Thanks.

trichter commented 2 weeks ago

Great!

Also, no response from the original issue creator. Closing this for now.