Closed flixha closed 3 years ago
Hi @flixha,
I am glad you are finding ObsPlus useful.
I ran your example code and was able to reproduce the issue. There is actually something wrong going on with obspy.ResourceIdentifier
's machinery. If you try:
cat[0].resource_id.get_referred_object()
it returns the Event
as expected. I am not sure exactly what is causing the issue.
Is there a reason you need to use the string in the dataframe rather than the ResourceIdentifier
object in the catalog?
Hi, thanks for checking the example!
I started using Obsplus partly because it makes working with larger catalogs a lot quicker and more user-friendly, e.g., when selecting a subset from a catalog, sorting the catalog by some properties of the events, or when trying to find one specific event ( compared to Obspy catalog functions).
So when I for example find a few events in the obsplus-dataframe which I want to remove from the obspy-catalog, then I thought the simplest way to refer to the events is by their ResourceIdentifier
; which in this case I construct from the string because that's available in the default dataframe. I guess there's two workarounds:
tmp_cat_df = cat_df.loc[cat_df.time > UTCDateTime(2013,9,1,0,0,0)]
event = cat[tmp_cat_df.index[0]]
cat2 = Catalog([ev for ev in cat if not ev == event])
As a more robust alternative I was thinking whether it could be possible to store the ResourceIdentifier
directly in the dataframe, but I guess that's a bit too experimental yet (TypeError: dtype '<class 'obspy.core.event.resourceid.ResourceIdentifier'>' not understood
):
from collections import OrderedDict
from obspy.core.event.resourceid import ResourceIdentifier
EVENT_DTYPES = OrderedDict(
time="datetime64[ns]",
event_id=str,
event_rid=ResourceIdentifier, )
EVENT_COLUMNS = list(EVENT_DTYPES)
import obspy.core.event as ev
from obsplus.structures.dfextractor import (
DataFrameExtractor,
standard_column_transforms, )
events_to_df = DataFrameExtractor(
ev.Event, required_columns=EVENT_COLUMNS,
column_funcs=standard_column_transforms, dtypes=EVENT_DTYPES,)
cat_df = events_to_df(cat)
Ah, I see. I would really not like to include the ResourceIdentifier
in the dataframe representations for two reasons. First, right now all the columns are basic, easily serializable types. Adding custom class instances can greatly complicate the serialization/deserialization process. Second, after fighting with ResourceIdentifier
for several years, I am of the opinion it just doesn't (and cant) work very well so I personally try to avoid using it.
However, I think there are at least two ways of doing what you want in a more efficient way.
The first, for simpler queries you can just use the catalog.get_events
method. This should behave almost identically to the fdsn client's get_events
method.
import obspy
import obsplus
cat = obspy.read_events()
cat_filtered = cat.get_events(min_magnitude=3)
For more complicated queries/manipulations, why not just store the events in a column of the dataframe?
import obspy
import obsplus
# make sure the order of the catalog doesn't change until this block finishes
cat = obspy.read_events()
df = obsplus.events_to_df(cat)
df['events'] = cat.events
# now you can sort, query, filter, do whatever you want to the dataframe!
df = (
df.sort_values('time')
.query('depth > 2_000')
.query('magnitude_type == "ML"')
)
# reassemble catalog from the processed dataframe
cat_out = obspy.Catalog(events=list(df['events']))
Would that work for you?
Hey @flixha,
Did the above suggestions work for you? If so I will close the issue.
@d-chambers Sorry for not replying earlier - I had started to test a few solutions that you suggested and they indeed work very well, so thanks a lot for pointing these out! I like the solution of adding the event itself to the dataframe, hadn't thought about it being that easy actually ;-) .
I understand that from your tests and usage you have a general "mistrust" for the ResourceIdentifier-things in obspy, so that's also very valuable to know that it may be best to avoid using these when not required - so I'll keep that advice !
Description Thanks a lot for developing obsplus, I'm kind of just getting started with the very useful functionality that it offers. @d-chambers helped me with an issue regarding obspy's ´ResourceIdentifier´ previously (https://github.com/obspy/obspy/issues/2842), for which obsplus contained the function ´attach_all_resource_ids´ to solve the issue. That worked, but I'm running into some limitation now that I'm trying to understand. After reading in events in parallel, I end up with broken resource-ID links that I fix with the function. While I can usually create a working link from the resource-ID string, it doesn't work in the case shown below. Is this just a limitation of
ResourceIdentifier
that I have to accept, or is there something thatattach_all_resource_ids
could still help with here?To Reproduce
Why would I want to use the above code After I filter the
cat_df
with pandas functionality, i thought the most robust way to retrieve the event that is represented by a line in thecat_df
would be to use thecat_df.event_id
.Expected behavior If I set
n_jobs=1
in above example, I will be able to retrieve theevent
. Withn_jobs=2
,event
becomesNone
. As i thought that I had fixed the resource-ID linking, I would expect to retrieve the event also in the case with the originally broken resource-ID links. I am able to retrieve the event in both cases with the following line:Versions (please complete the following information):