usgs / groundmotion-processing

Parsing and processing ground motion data

cosmos writer error #913

Closed hadigh closed 2 years ago

hadigh commented 2 years ago

cosmos_write_example.zip

Hi gmprocess team,

I am trying to write a COSMOS file for processed streams stored in an .h5 file. I am using 'CosmosWriter' and am getting the following AttributeError:

File ~/Tools/groundmotion-processing/gmprocess/io/cosmos/cosmos_writer.py:155, in Table4.get_matching_network(self, eventid)
    153 eventid = eventid.lower()
    154 for idx, row in self._dataframe.iterrows():
--> 155     network = row["IRIS Code"].lower()
    156     if eventid.startswith(network):
    157         return network

AttributeError: 'float' object has no attribute 'lower'

Any help would be appreciated

emthompson-usgs commented 2 years ago

This appears to be a problem with the IRIS code in COSMOS Table 4 in this row:

60 | Central and Eastern US Network | UCSD | NA

The NA is getting parsed as NaN rather than as a string. I also don't think that NA is the correct code for "Central and Eastern US Network"; it probably should be N4. But that doesn't make sense with UCSD as the abbreviation (N4 should be ASL/USGS). Since we didn't make this table, I'm not sure we should modify it. I'll try to make it so that this doesn't raise an error, but we need clarification from COSMOS on what the intended values are.
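
A minimal sketch of the parsing issue described above, not the actual gmprocess fix. It assumes the table is read with pandas; row 60 is the one quoted above, while row 99, the column names other than "IRIS Code", and the event ID are made up for illustration. It shows two possible ways around the error: keeping "NA" as a literal string when reading, or skipping non-string codes when matching.

```python
import io

import pandas as pd

# Illustrative two-row table. Row 60 is the one quoted above; row 99 and all
# column names except "IRIS Code" are made up for this sketch.
TABLE4_CSV = """Code,Agency,Abbreviation,IRIS Code
60,Central and Eastern US Network,UCSD,NA
99,Example Network,EXMP,XX
"""

# Default parsing: pandas treats the literal string "NA" as a missing value,
# so the IRIS code of row 60 becomes NaN (a float).
default_df = pd.read_csv(io.StringIO(TABLE4_CSV))

# keep_default_na=False turns off the built-in missing-value markers,
# so "NA" survives as an ordinary string.
literal_df = pd.read_csv(io.StringIO(TABLE4_CSV), keep_default_na=False)


def get_matching_network(dataframe, eventid):
    """Return the lowercased IRIS code that prefixes eventid, or None."""
    eventid = eventid.lower()
    for _, row in dataframe.iterrows():
        code = row["IRIS Code"]
        if not isinstance(code, str):
            # NaN is a float; skip it instead of calling .lower() on it.
            continue
        network = code.lower()
        if eventid.startswith(network):
            return network
    return None


# "na00001" is a hypothetical event ID used only to exercise the function.
print(get_matching_network(default_df, "na00001"))
print(get_matching_network(literal_df, "na00001"))
```

With the default parsing the guard skips the NaN row and nothing matches, so the first call returns None; with keep_default_na=False the second call returns "na". Whether "NA" is the intended code for that row is a separate question for COSMOS.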

NNovoa-CDWR commented 2 years ago

Hi, I am looking into the discrepancy and will let you know when we resolve it.

Nicholas Novoa
COSMOS President

emthompson-usgs commented 2 years ago

Thanks @NNovoa-CDWR!

hadigh commented 2 years ago

@emthompson-usgs

Thanks for looking into this. I can now successfully generate V2 files for my example, but I get the error below when setting label='unprocessed':

File ~/Tools/groundmotion-processing/gmprocess/io/cosmos/cosmos_writer.py:734, in CosmosWriter.write(self)
    732 t2 = time.time()
    733 t_write.append(t2 - t1)
--> 734 text_av = sum(t_text) / ntraces
    735 int_av = sum(t_int) / ntraces
    736 float_av = sum(t_float) / ntraces

ZeroDivisionError: division by zero

emthompson-usgs commented 2 years ago

Thanks for the report. I'll look into this.

emthompson-usgs commented 2 years ago

The underlying reason for this is that when @mhearne-usgs wrote this first version of the writer, he only included support for data that has been converted to physical units of acceleration (see here). This is why none of the unprocessed records get included, which in turn leaves ntraces at zero and causes the division-by-zero error. We could add support for raw data in units of counts, or fix this so that it exits gracefully with a better error message, but I'll need to consult @mhearne-usgs about this.

emthompson-usgs commented 2 years ago

@hadigh I just sent in a PR that will avoid the error and add a logging statement to indicate that no traces were processed in this case.
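
For readers following along, here is a rough sketch of the kind of guard described above. It is not the code from the PR: the helper name average_timings and the log message are hypothetical, and only the variable names t_text, t_int, t_float, and ntraces come from the traceback.

```python
import logging

logger = logging.getLogger(__name__)


def average_timings(t_text, t_int, t_float, ntraces):
    """Average per-trace timings, or return None when no traces were written.

    Hypothetical helper for illustration; not part of gmprocess.
    """
    if ntraces == 0:
        # Nothing was written (e.g. every record was skipped), so log a
        # message instead of dividing by zero.
        logger.warning("No traces were processed; skipping timing averages.")
        return None
    text_av = sum(t_text) / ntraces
    int_av = sum(t_int) / ntraces
    float_av = sum(t_float) / ntraces
    return text_av, int_av, float_av


# No traces: the guard returns None instead of raising ZeroDivisionError.
print(average_timings([], [], [], 0))

# Two traces: the averages are computed as usual.
print(average_timings([1.0, 3.0], [2.0, 2.0], [0.5, 1.5], 2))
```

Note that the warning is only printed if a logging handler has been configured in the script.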

The question I have for you is: How important is it for you to be able to write raw data with the COSMOS file? My (possibly incorrect) impression is that this format is mostly intended for people that are not able to directly make use of the ASDF format, which is exhaustive in terms of data/metadata. I expect that unprocessed/raw data may not be desired by these users. In fact, they won't have access to the instrument response without the ASDF file and so they wouldn't be able to convert to physical units for their analysis.

Also, one suggestion for your script would be to add the logger so that logging messages get printed like this:

from gmprocess.utils.logging import setup_logger
setup_logger()

hadigh commented 2 years ago

@emthompson-usgs Thanks for clarifying this, and for the logging hint.

What you mentioned regarding the instrument response and user interest makes perfect sense. What I am after is somewhere between COSMOS V1 and V2: time series in physical units, with only basic baseline adjustment and instrument response correction applied and no additional filtering. Perhaps I can achieve this by storing two versions of the processed time series under different labels in the .h5 file.

emthompson-usgs commented 2 years ago

Yeah, that seems reasonable. In fact the only sensible use of alternative labels that I can think of would be something like this, where you've processed the records using two different sets of processing parameters. We haven't had any call for that ourselves and so I don't think that there's a simple way to automate this (e.g., use a different config file for different labels for a given project). For now, I'll close this issue. If you have ideas about changes to the code that would facilitate this, please create a new issue.

Regarding the logger: that is what gets done when the gmrecords command starts. It would be nice to have it set up automatically for scripts like this, but I don't know how to do that.