simpeg / aurora

software for processing natural source electromagnetic data
MIT License

Inconsistent coverage between metadata and data at IRIS/Earthscope #275

Closed kkappler closed 1 year ago

kkappler commented 1 year ago

Example: Station ORF08 in the EM network. I built the mth5 two times.

Once with data: fdsn_object.make_mth5_from_fdsn_client(request_df)

Once without data: inventory, data = fdsn_object.get_inventory_from_df(request_df, data=False)

Attached is a screengrab of the channel summaries. The metadata-only mth5 indicates start time 2006-09-04T17:43:59+00:00 and end time 2006-09-25T18:39:37+00:00, i.e. around three weeks of data.

The data mth5 indicates start time 2006-09-04T17:43:59+00:00 and end time 2006-09-04T19:04:17.875000+00:00, which is only about 80 minutes of data.

Could it be that the data are not archived?

[Image: screengrab of the channel summaries]
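To put numbers on the two spans, here is a minimal sketch using only the Python standard library. The timestamps are the ones quoted in this comment; the variable names are illustrative and are not part of mth5 or aurora.

```python
from datetime import datetime

# Start/end times copied from the two channel summaries quoted in this comment.
metadata_start = datetime.fromisoformat("2006-09-04T17:43:59+00:00")
metadata_end = datetime.fromisoformat("2006-09-25T18:39:37+00:00")
data_start = datetime.fromisoformat("2006-09-04T17:43:59+00:00")
data_end = datetime.fromisoformat("2006-09-04T19:04:17.875000+00:00")

# Subtracting two datetimes yields a timedelta.
metadata_span = metadata_end - metadata_start
data_span = data_end - data_start

print("metadata span:", metadata_span)
print("data span:    ", data_span)
# Dividing one timedelta by another gives a plain float ratio.
print("fraction of metadata span covered: {:.2%}".format(data_span / metadata_span))
```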

laura-iris commented 1 year ago

Initial reaction is that yes, the metadata can span longer time periods than the actual data. It isn't a requirement that the metadata match the data exactly. Instead it needs to, at minimum, span the entirety of the actual data. There aren't necessarily best practices for this, so some networks/experiments have metadata that is significantly longer than the actual data.

That said, there's still the question of how to know if EarthScope returned an incomplete dataset. That's when comparing it to the output from the availability service (http://service.iris.edu/fdsnws/availability/1/) is helpful, since that service should faithfully reflect the actual holdings in the archive.
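A minimal sketch of how such a comparison might start: build a query URL for the availability service and fetch its text listing. The `query` endpoint and the `network`, `station`, and `nodata` parameters mirror the URL quoted later in this thread; the function names `availability_url` and `fetch_availability` are hypothetical helpers, not part of mth5 or aurora.

```python
from urllib.parse import urlencode
from urllib.request import urlopen

# Base URL from this comment, plus the "query" endpoint used later in the thread.
AVAILABILITY_QUERY = "http://service.iris.edu/fdsnws/availability/1/query"


def availability_url(network, station, **extra):
    """Build an availability query URL (hypothetical helper)."""
    params = {"network": network, "station": station, "nodata": "404"}
    params.update(extra)
    return AVAILABILITY_QUERY + "?" + urlencode(params)


def fetch_availability(network, station):
    """Return the service's text listing. Requires network access.

    With nodata=404, urlopen raises urllib.error.HTTPError when the
    archive holds nothing for the request.
    """
    with urlopen(availability_url(network, station), timeout=30) as response:
        return response.read().decode("utf-8")


print(availability_url("EM", "ORF08"))
```

The returned listing can then be compared against the start and end times reported in the mth5 channel summary.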

I'll take a look at this particular example, JIC.

laura-iris commented 1 year ago

Interesting, so looking at the availability service and at our actual data holdings, I see that we do have more data than just those few hours:

http://service.iris.edu/fdsnws/availability/1/query?includerestricted=true&nodata=404&network=EM&station=ORF08

#Network Station Location Channel Quality SampleRate Earliest Latest
EM ORF08 -- MFE M 8.0 2006-09-04T17:43:59.000000Z 2006-09-04T19:04:17.875000Z
EM ORF08 -- MFE M 8.0 2006-09-15T21:56:21.000000Z 2006-09-25T18:39:36.875000Z
EM ORF08 -- MFN M 8.0 2006-09-04T17:43:59.000000Z 2006-09-04T19:04:17.875000Z
EM ORF08 -- MFN M 8.0 2006-09-15T21:56:21.000000Z 2006-09-25T18:39:36.875000Z
EM ORF08 -- MFZ M 8.0 2006-09-04T17:43:59.000000Z 2006-09-04T19:04:17.875000Z
EM ORF08 -- MFZ M 8.0 2006-09-15T21:56:21.000000Z 2006-09-25T18:39:36.875000Z
EM ORF08 -- MQE M 8.0 2006-09-04T17:43:59.000000Z 2006-09-04T19:04:17.875000Z
EM ORF08 -- MQE M 8.0 2006-09-15T21:56:21.000000Z 2006-09-25T18:39:36.875000Z
EM ORF08 -- MQN M 8.0 2006-09-04T17:43:59.000000Z 2006-09-04T19:04:17.875000Z
EM ORF08 -- MQN M 8.0 2006-09-15T21:56:21.000000Z 2006-09-25T18:39:36.875000Z

There is a big gap from 2006-09-04T19:04:17.875000Z to 2006-09-15T21:56:21.000000Z, but then there is around 10 days of data.
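The gap can be found programmatically from the listing above. This is a sketch, assuming the whitespace-separated text format shown in this comment; only two of the five channels are reproduced for brevity, and `parse_segments` and `find_gaps` are illustrative names.

```python
from collections import defaultdict
from datetime import datetime

# Two channels copied from the availability listing above; the rest are omitted.
AVAILABILITY_TEXT = """\
#Network Station Location Channel Quality SampleRate Earliest Latest
EM ORF08 -- MFE M 8.0 2006-09-04T17:43:59.000000Z 2006-09-04T19:04:17.875000Z
EM ORF08 -- MFE M 8.0 2006-09-15T21:56:21.000000Z 2006-09-25T18:39:36.875000Z
EM ORF08 -- MFN M 8.0 2006-09-04T17:43:59.000000Z 2006-09-04T19:04:17.875000Z
EM ORF08 -- MFN M 8.0 2006-09-15T21:56:21.000000Z 2006-09-25T18:39:36.875000Z
"""

TIME_FORMAT = "%Y-%m-%dT%H:%M:%S.%fZ"


def parse_segments(text):
    """Map (network, station, location, channel) to a sorted list of (earliest, latest)."""
    segments = defaultdict(list)
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and the header row
        net, sta, loc, cha, _quality, _rate, earliest, latest = line.split()
        segments[(net, sta, loc, cha)].append(
            (datetime.strptime(earliest, TIME_FORMAT),
             datetime.strptime(latest, TIME_FORMAT))
        )
    for spans in segments.values():
        spans.sort()
    return dict(segments)


def find_gaps(spans):
    """Return (gap_start, gap_end) pairs between consecutive sorted segments."""
    return [
        (prev_end, next_start)
        for (_, prev_end), (next_start, _) in zip(spans, spans[1:])
        if next_start > prev_end
    ]


segments = parse_segments(AVAILABILITY_TEXT)
for key, spans in segments.items():
    for gap_start, gap_end in find_gaps(spans):
        print(".".join(key), "gap of", gap_end - gap_start)
```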

So the question is: did the request dataframe not include the timeframe for this latter segment of data, did the data not get returned fully, or is the channel summary simply not displaying that it exists?
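The first of these possibilities can be checked mechanically. The sketch below tests whether each archived segment falls inside a requested time window. The archived segments are the MFE rows from the listing above; the request window is a HYPOTHETICAL placeholder, since the actual contents of the request dataframe are not reproduced in this thread, and the helper names are illustrative.

```python
from datetime import datetime


def parse_utc(stamp):
    """Parse timestamps like 2006-09-15T21:56:21.000000Z."""
    return datetime.strptime(stamp, "%Y-%m-%dT%H:%M:%S.%fZ")


def segment_is_requested(segment, requests):
    """True if any single request window fully contains the segment."""
    seg_start, seg_end = segment
    return any(
        req_start <= seg_start and seg_end <= req_end
        for req_start, req_end in requests
    )


# Archived MFE segments, copied from the availability listing above.
archived = [
    (parse_utc("2006-09-04T17:43:59.000000Z"), parse_utc("2006-09-04T19:04:17.875000Z")),
    (parse_utc("2006-09-15T21:56:21.000000Z"), parse_utc("2006-09-25T18:39:36.875000Z")),
]

# HYPOTHETICAL request window, for illustration only.
# Substitute the start and end values from the real request dataframe.
requests = [
    (parse_utc("2006-09-04T00:00:00.000000Z"), parse_utc("2006-09-05T00:00:00.000000Z")),
]

missing = [seg for seg in archived if not segment_is_requested(seg, requests)]
for seg_start, seg_end in missing:
    print("not covered by any request:", seg_start, "to", seg_end)
```

If no segments are reported as missing when the real request times are used, the request itself can be ruled out, leaving the data return and the channel summary display as the remaining candidates.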

kkappler commented 1 year ago

ORF08_request_df.csv

Attached is the request dataframe as a csv

kkappler commented 1 year ago

This is an interesting observation, but not an "aurora" issue. Closing for now.