mspass-team / mspass

Massive Parallel Analysis System for Seismologists
https://mspass.org
BSD 3-Clause "New" or "Revised" License

schema collision reading miniseed files #290

Open pavlis opened 2 years ago

pavlis commented 2 years ago

I thought we fixed this a while back, but the schemas of wf_miniseed and wf_TimeSeries are mismatched. A very common first step in most workflows is to read raw data as miniseed and then save partially processed data to the native wf_TimeSeries collection. The problem is the infamous seed net, sta, chan, and loc. They are defined in wf_miniseed with constraint: normal, but in Metadata->TimeSeries the same attributes are set as read_only. The consequence shows up in a low-level job like the one I have running at this moment (and will for the next several days): it detrends and windows down 3.5 million TimeSeries objects, saves them, and logs 4 error log entries for each saved TimeSeries. That means if this job finishes it will have written around 14 M elog documents. That is not a good thing and will require some housecleaning if it finishes that way.

I think the solution is simply to change the schema constraint in Metadata->TimeSeries (there may be another in Seismogram, come to think of it) from "read_only" to "normal".
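
To make the mismatch and the proposed change concrete, here is a minimal sketch. It does not use the MsPASS schema API or the real schema files; the two dicts and the helper names `find_collisions` and `relax` are hypothetical stand-ins that only model the constraint values described above.

```python
# Hypothetical, simplified stand-ins for the two schema definitions.
# These dicts are NOT the real MsPASS schema files; they only mirror
# the constraint values described in this issue.
SEED_KEYS = ("net", "sta", "chan", "loc")
WF_MINISEED = {key: "normal" for key in SEED_KEYS}
TIMESERIES_METADATA = {key: "read_only" for key in SEED_KEYS}


def find_collisions(source, target):
    """Return keys that are writable in source but read_only in target."""
    return sorted(
        key
        for key, constraint in source.items()
        if constraint != "read_only" and target.get(key) == "read_only"
    )


def relax(target, keys):
    """Return a copy of target with the given keys set to 'normal'."""
    fixed = dict(target)
    for key in keys:
        fixed[key] = "normal"
    return fixed


collisions = find_collisions(WF_MINISEED, TIMESERIES_METADATA)
print(collisions)  # ['chan', 'loc', 'net', 'sta']

# Apply the proposed change and confirm that no collisions remain.
fixed = relax(TIMESERIES_METADATA, collisions)
print(find_collisions(WF_MINISEED, fixed))  # []
```

Each colliding key corresponds to one of the 4 messages logged per saved TimeSeries. The same check could be run against the Seismogram definition if it turns out to have the same constraint.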

pavlis commented 2 years ago

Correction on the number of elog entries. I had forgotten that when we save data, all elog messages are encapsulated in a single document, with subdocuments holding the contents of each message posted. This particular example has 4 subdocuments for each wf_TimeSeries being saved. So there will be one elog entry per saved TimeSeries, but at least 4 messages saved as subdocuments in each one.
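
As a rough illustration of the corrected count, the sketch below models one elog document per saved datum with one subdocument per message. The field names (`wf_TimeSeries_id`, `logdata`, `message`) and the message text are assumptions made for illustration, not the actual elog layout.

```python
def build_elog_document(wf_id, messages):
    """Bundle all messages for one saved datum into a single document."""
    return {
        "wf_TimeSeries_id": wf_id,  # hypothetical field name
        "logdata": [{"message": text} for text in messages],  # hypothetical
    }


seed_keys = ("net", "sta", "chan", "loc")
doc = build_elog_document(
    "example-id",
    [f"attribute {key} is read_only" for key in seed_keys],
)

n_saved = 3_500_000                          # TimeSeries objects in the job
n_documents = n_saved                        # one elog document per saved datum
n_messages = n_saved * len(doc["logdata"])   # lower bound: 4 messages each

print(n_documents, n_messages)  # 3500000 14000000
```

So the job would leave about 3.5 million elog documents holding at least 14 million messages between them.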