Closed pareantoine closed 7 years ago
Are you able to share your SEG-Y file?
On my computer (MacBook Pro), the initial read of a 700 MB SEG-Y file containing >120,000 traces takes about 7 seconds. Subsequent reads, once the index has been cached, take about 1 second.
Unfortunately I cannot share the SEG-Y file.
I've tried on a MacBook Pro and a PC and both are struggling: 300 seconds to load the file, but only 9 seconds to go through all the traces. I am not loading surface seismic data but borehole seismic (VSP) data, which has slightly different headers and no inline/crossline numbering or proper shotpoints.
I wonder if it is slow because segpy reads it as a 3D seismic volume, as the dimensionality of the segy_reader is 3.
How can I force create_reader to read my SEG-Y file as a SegYReader and not a SegYReader3D object?
I've submitted an issue on your own repo with some suggestions which should help a lot: https://github.com/pareantoine/VSP-Processing/issues/1
"How can I force create_reader to read my SEG-Y file as a SegYReader and not a SegYReader3D object?"
This won't affect performance; it just means you get some extra functionality on the resulting SegYReader3D object, which is useless but harmless in your case.
If you can submit another issue we can deal with this separately from your performance concerns.
Hi Rob, thanks a lot for your answers.
I might do some tests with the toolkit to figure out which step of the catalog_traces function is killing it for my SEG-Y, as it's clearly the one taking a very long time.
I'll come back to you if I find anything.
That will probably be fruitful. It's likely that in your case segpy is doing lots of unnecessary work when it is trying to index the data. A plain SegYReader only needs a trace_offset_catalog and a trace_length_catalog, so a cut-down version of the existing, very generic, code path could make a big difference. Segpy has two APIs: the higher-level "reader/writer" API and a lower-level API defined mostly in segpy.toolkit. You may be able to figure out which low-level calls are actually needed.
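For the common case where every trace has the same length (true of most VSP files), the trace offsets don't need to be discovered by scanning every header; they can be computed arithmetically. A minimal sketch of that idea — the helper name and the fixed-length assumption are mine, not part of segpy:

```python
# Standard SEG-Y rev 1 layout constants.
TEXTUAL_HEADER_SIZE = 3200   # textual reel header
BINARY_HEADER_SIZE = 400     # binary reel header
TRACE_HEADER_SIZE = 240      # per-trace header

def fixed_length_trace_offsets(num_traces, samples_per_trace, bytes_per_sample):
    """File offsets of each trace header, assuming every trace carries
    the same number of samples (no per-trace scan required)."""
    base = TEXTUAL_HEADER_SIZE + BINARY_HEADER_SIZE
    stride = TRACE_HEADER_SIZE + samples_per_trace * bytes_per_sample
    return [base + i * stride for i in range(num_traces)]
```

For example, three traces of 100 four-byte samples give offsets 3600, 4240 and 4880 — computed in constant time per trace, versus a seek-and-read per header in a generic indexer.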
Rob, the solution was fairly simple. While going through the code on my laptop and on GitHub, I noticed minor differences. I removed segpy from my environment and reinstalled it directly from GitHub. It now takes 33 seconds.
I believe that running pip install segpy isn't actually installing the latest version available on GitHub.
Great. I'll push a new version to PyPI soon so it can be installed with pip.
Did you make the other changes I suggested too?
No I haven't had much time to look at the rest of my code but I'll do it soon and I'll test it. I'll also add an adapted trace header format that fits better with VSP data.
What is the Data Sample Format for your data? If it's IBM Float we have a C++ plugin which makes reading that format about 10x faster.
I've got both IEEE and IBM, I'll test the difference in loading time.
Well, strangely enough, I've got two datasets with the same number of traces, sample count and size (slightly different headers and data). 5 seconds to load the IEEE float32, 35 seconds for the IBM float32...
edit: this seems to change; I just restarted the kernel and loaded the IBM file in 3 seconds.
Hmm. You should expect reading of IBM trace samples to be substantially slower than reading IEEE samples. The times for the headers should be the same.
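For context on why IBM is slower: IBM System/360 single-precision floats (sign bit, 7-bit base-16 exponent biased by 64, 24-bit fraction) have no hardware support, so each sample must be decoded in software, which segpy's C++ plugin does natively. A minimal pure-Python sketch of the conversion (illustrative only, not segpy's implementation):

```python
import struct

def ibm32_to_float(b: bytes) -> float:
    """Decode one 4-byte big-endian IBM single-precision float."""
    (u,) = struct.unpack(">I", b)
    sign = -1.0 if u >> 31 else 1.0
    exponent = (u >> 24) & 0x7F               # base-16 exponent, bias 64
    fraction = (u & 0x00FFFFFF) / float(1 << 24)
    return sign * fraction * 16.0 ** (exponent - 64)
```

For example, the bytes 42 64 00 00 decode to 100.0. Doing this per sample in Python is exactly the kind of inner loop that dominates load time for IBM-format files.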
Hi there,
I am fairly new to Python and even more so to Segpy. I am using Segpy on a VSP dataset rather than surface seismic, and I find the initial loading fairly slow (I can't see any reason why VSP rather than surface seismic would make much of a difference): roughly 230 seconds for a 600 MB file, compared to 30 seconds with Obspy.
Once loaded, I find reading one specific header field for all traces just as slow. If I wanted, for example, to plot the source X and Y coordinates of all traces in the SEG-Y, I would pull the X coordinates with: np.array([segy_reader.trace_header(trace_index).source_x for trace_index in segy_reader.trace_indexes()])
Again, this takes not far off 230 seconds, when I would have imagined the headers would already be in memory and wouldn't need to be read again. (With Obspy it's almost instant.)
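One way to make that kind of bulk header query fast, independent of segpy's reader API, is to slice the field straight out of the raw bytes with NumPy instead of constructing a header object per trace. A sketch assuming fixed-length traces and the standard SEG-Y rev 1 position for source X (bytes 73-76 of the trace header, big-endian int32); the function name is mine:

```python
import numpy as np

TRACE_HEADER_SIZE = 240
SOURCE_X_OFFSET = 72  # source X occupies bytes 73-76 of the trace header

def all_source_x(raw: bytes, num_traces: int, samples_bytes: int) -> np.ndarray:
    """Vectorized pull of source_x from every trace header.
    `raw` is the trace section of the file (headers + samples);
    `samples_bytes` is the size of one trace's sample data."""
    stride = TRACE_HEADER_SIZE + samples_bytes
    buf = np.frombuffer(raw, dtype=np.uint8)
    starts = np.arange(num_traces) * stride + SOURCE_X_OFFSET
    field = buf[starts[:, None] + np.arange(4)]   # (num_traces, 4) raw bytes
    return field.view(">i4").ravel()              # reinterpret as big-endian int32
```

This reads every trace's coordinate in one vectorized pass rather than a Python-level loop over header objects, which is where the per-trace parsing cost goes.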
I like Segpy a lot because it's much faster to read the actual samples of a trace than Obspy and it seems much easier to modify the samples and then save it, although I haven't tried it yet.
I would appreciate any help, as I'm planning to scale up this code from SEG-Y files with ~300,000 traces to millions.