obspy / obspy

ObsPy: A Python Toolbox for seismology/seismological observatories.
https://www.obspy.org
Other
1.14k stars 527 forks source link

GCF mentioning inofficial superseded format documents #3003

Closed megies closed 1 year ago

megies commented 2 years ago

Güralp staff contacted us that official GCF file format specifications are available online and that the document mentioned in our io.gcf front page (io/gcf/__init__.py) should not be used as official source of information.

Just a heads up for any of the devs using GCF, in case they are curious and want to have a look..

CC @rannof

ThomasLecocq commented 2 years ago

I'm happy if Guralp wants to PR & provide support to make the code better

paitor commented 2 years ago

If not I have a long time plan to update the obspy reader to the latest revision of the format, think I've mentioned this before and apologies that I haven't found time for it yet. Sorry to say an obstacle for me here is to set up a proper dev environment that will not interfere with my production environment as I've never done this before.

A quick question here, how is it. Can C-extension code be added to obspy or do you require pure python code (think the mseed reader is C-extension, no)?

If C-extension can be added I already have a home grown c-code for reading/writing gcf data records as well as a python extension module. This has been extensively tested against our archive of 15 years of gcf data from (69 stations) albeit I have to admit that testing of read/write of data sampled at > 2000 Hz suffers from lack of access to test files (hence testing here have been done on files written and read by my code). Think updating obspy should then be less than a day (including adding writing capability) + the time it takes to set up a proper testing environment. Should I have to translate to pure python the time will grown by at least pi-squared. In any case I'd probably would need to ask some dumd question while setting up a dev environment.

paitor commented 2 years ago

Should perhaps add that a C-extension would significantly speed up reading of gcf files (about a factor of 10 from my experience using obspy vs. my c-extension python module + wrapping into a stream object)

megies commented 2 years ago

I don't think there's a reason why not to do the reading in C. But I might not be aware of potential problems, just cause not having much experience doing that in obspy. Sounds like a complete rewrite of the obspy module to me, which is fine too, we just should try to be as much backwards compatible as possible (I guess mostly in terms of how header fields are stored in the Trace objects).

Regarding dev environment.. are you on Linux or Mac? Should be trivial then, if Windows, Thomas can help.

paitor commented 2 years ago

Good, do you know whom of you I could consult if I run into trouble updating with c-code (i.e. a person that preciously added C-extension modules to the package)?

A few more or less related questions: Would it be of interest to add the gcf writer, or is only the gcf reader of interest?

I also have a python C-extension reader/writer for a data format known as bc (bit compressed) which is the data format used by the SIL earthquake detection system currently used in Iceland and Sweden (i.e. somewhat limited usage). Would it be of interest to also add this despite its limited usage?

Not really related, but as I have your attention. I've written a C-extension travel time calculator based on the Earth flattening approximation. This is quite simple in that it only computes direct or reflected P- and S-travel times (no phase conversions and no direct capability to detect if a phase passes through the mantle/core). However, it is quite fast (~ 2 orders of magnitude compared to taup as implemented in obspy) and comparison to obspy (java version) yields agreement to within a few hundreds of seconds or less (occasionally my code and taup differ by a few phases for turning depths close to a change in the velocity gradient. Albeit, quite simpler than taup I have found it useful in codes where I've needed to compute a lot of first p- and s- arrivals so wondering if this be of interest to the obspy users as an alternative to tuap where execution time may be a bottleneck?

megies commented 2 years ago

Good, do you know whom of you I could consult if I run into trouble updating with c-code (i.e. a person that preciously added C-extension modules to the package)?

Not sure.. can probably only recommend the obspy github hive mind ;-)

Would it be of interest to add the gcf writer, or is only the gcf reader of interest?

In general it's always good to be able to write too. But.. that means additional efforts for testing and might mean additional maintenance efforts in the future, so I guess you might wanna ask yourself, if you'd be willing to maintain that in the future. =)

I also have a python C-extension reader/writer for a data format known as bc (bit compressed) which is the data format used by the SIL earthquake detection system currently used in Iceland and Sweden (i.e. somewhat limited usage). Would it be of interest to also add this despite its limited usage?

Sure, why not. I guess if you give GCF a shot you will get an idea how easy or complicated it would be to add that too..

Not really related, but as I have your attention. I've written a C-extension travel time calculator based on the Earth flattening approximation. This is quite simple in that it only computes direct or reflected P- and S-travel times (no phase conversions and no direct capability to detect if a phase passes through the mantle/core). However, it is quite fast (~ 2 orders of magnitude compared to taup as implemented in obspy) and comparison to obspy (java version) yields agreement to within a few hundreds of seconds or less (occasionally my code and taup differ by a few phases for turning depths close to a change in the velocity gradient. Albeit, quite simpler than taup I have found it useful in codes where I've needed to compute a lot of first p- and s- arrivals so wondering if this be of interest to the obspy users as an alternative to tuap where execution time may be a bottleneck?

Again, I don't see a reason not to add it. It certainly sounds like it could be useful. Although I am not the right person to ask about workflows were using the taup module becomes a bottle neck, to be honest

d-chambers commented 2 years ago

RE the travel time calculator: I am inclined towards no unless you (@paitor) have a very strong, multi-year commitment to maintaining the new modules. There are a whole bunch of taup issues that none of the active maintainers knows how, or is determined enough, to debug. Moreover, the maintenance burden is much higher for C code, since many of us don't program much in C.

Another option is to create a separate, obspy-related, package. There are some good templates out there that make it less painful than one might think.

paitor commented 2 years ago

@d-chambers Point taken. I usually tend to fix bugs as soon as they are discovered but then again most of the pieces of code I've put together have a very limited user base (basically me) so although I have high intentions nothing can be guaranteed.

Think I'll have a go with the gcf reader as a starter though, the c-base should be well enough tested by now given that we use it daily in various production codes. So the maintenance here should hopefully be limited to adapting to changes in the python c-api (at present I use macros to bridge the changes between python 2/3 as we still need the code to run on some of our ancient servers where python 2.5 still roams...). If this proves simple enough I'll have a go on a reader for the bc format and then possibly writers for gcf and bc.

Regarding the travel time calculator, one of the intentions of adding it were to hopefully expand its user base in order to weed out bugs I have not been able to trigger my-self . But probably that's not entirely appropriate so perhaps better to make it available as is via github or pypi in order to (a) find out if its useful to others and (b) wash out bugs.

paitor commented 2 years ago

Hi

Relating to updating the gcf reader for now.

So I've set up a development environment, forked the obspy master, and created a branch gcf-c-source, thus are set to start coding, however a few questions first to keep in line with existing coding procedures.

  1. After a quick glance at the current modules that make use of c-code at the core it seems to me that all of these produce shared libraries and then access these using the ctypes module. Do you see a potential problem if one instead adds a python C-extension module (would require python-dev to be installed on the system which does not seem to a pre-requisite at present)?
  2. I only have access to linux machines, how do I go about testing the update on windows/mac(/other)?
  3. It is a bit unclear to me if I should submit at PR before starting to update the gcf reader or if its enough to do this once I'm done with the update and ready to push it.

Apologies if the questions are simple and the terminology used not the proper, I'm a newbie when it comes to git as well as contributing to other projects. Just correct and I'll improve in the next round.

megies commented 2 years ago
  • After a quick glance at the current modules that make use of c-code at the core it seems to me that all of these produce shared libraries and then access these using the ctypes module. Do you see a potential problem if one instead adds a python C-extension module (would require python-dev to be installed on the system which does not seem to a pre-requisite at present)?

Somebody else would have to comment on that, but my first impression seems like this could be additional workload in maintenance if this is adding new complexity to our project.

  • I only have access to linux machines, how do I go about testing the update on windows/mac(/other)?

Any PR gets built on commit pushes by github CI on all platforms and tests get run and reported after building the package and can be viewed on the commit / PR status and on our test reports site which can be filtered by node name and also PR number.

  • It is a bit unclear to me if I should submit at PR before starting to update the gcf reader or if its enough to do this once I'm done with the update and ready to push it.

The sooner you open a PR, see sooner people can enter the discussion. In general I would say the earlier you open a PR, the better, because you (can) get early feedback, which can especially save time if people notice that something is going in the wrong direction.

paitor commented 2 years ago

Seems fair, then I'll use ctypes. Could also be interesting to me to see if there's some differences in the runtime using ctypes vs. a C-extension module (and I guess using ctypes should simplify handling python 2/3 compatibility).

I'll try to formulate and open a PR during today then, thanks for the input sofar