Closed trac2github closed 12 years ago
[megies] During instrument correction the data type of the numpy array is converted to float. You can check these things, e.g. like this:
{{{
from obspy.core import read

st = read()
tr = st[0]
print(tr.data.dtype)     # show data type
print(tr.data.itemsize)  # show number of bytes taken in memory by a single sample
}}}
[dcdavemail@gmail.com] Hi Megies, both the raw data and the corrected data are of type float64 and take 8 bytes for a single sample?
[megies] I cannot think of any other reason. Sorry, I cannot help you without a reproducible example.
[anonymous] I attached the script I used, if you could try it I'd appreciate it.
[megies] I had to adapt some filenames/paths and am running into IOErrors with nonexistent files. Please submit a script with one specific case without any loops; I do not have the time to play around with complicated programs right now.
[anonymous] Hi again, it seems I was mistaken: the raw data is float32 and the corrected data is float64. The increase in file size still seems a bit excessive though. If you're too busy it can wait, as there is not much I can do about it today anyway. Thanks, David
[megies] Ok, I have had a look at it. However I can not see any problem here (actually the raw data is int32). Correct me anybody, but miniseed is using totally different encoding/compression algorithms for int/float data types so I guess it is not at all surprising to see varying compression efficiency.
I think you just have to live with this.
best, Tobias
[anonymous] ok thanks
[anonymous] What happens if you just write it back to int32 by appending this to your script:
{{{
import numpy as np

for tr in st:
    tr.data = np.require(tr.data, np.int32)
st.write('int32.mseed', 'MSEED')
}}}
If the size of the file is still significantly larger, then it's not just the conversion of int32 to float64 ...
[krischer] Hello.
You do not really want to do that, because any filtered/corrected/... data will most likely not be full integers anymore, so you would corrupt your data.
I did not check, but should the incoming data not be int32?
And yes, MiniSEED does not pack float data at all and just writes the raw binary numbers to the file. Integer numbers on the other hand are (for "quiet" data at least) packed quite efficiently with a best case compression ratio of, I believe, almost 1:7 for STEIM2 compression. So a filesize increase of 200% (you have 400% because you store the data as float64) is still quite within bounds.
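The arithmetic behind this can be sketched roughly as follows. This is only a back-of-envelope illustration: the sample count and the compression ratio are hypothetical placeholders (the real ratio depends on the data), and record headers are ignored.

```python
# Rough MiniSEED payload sizes in bytes, ignoring record headers.
N_SAMPLES = 1_000_000   # hypothetical number of samples
STEIM2_RATIO = 4.0      # assumed compression ratio for int32 data; data dependent


def payload_bytes(n_samples, bytes_per_sample, compression_ratio=1.0):
    """Approximate size of the data payload for a given sample width."""
    return n_samples * bytes_per_sample / compression_ratio


int32_steim2 = payload_bytes(N_SAMPLES, 4, STEIM2_RATIO)  # compressed integers
float32_raw = payload_bytes(N_SAMPLES, 4)                 # floats are not packed
float64_raw = payload_bytes(N_SAMPLES, 8)                 # floats are not packed

for label, size in [
    ("int32, Steim2 compressed", int32_steim2),
    ("float32, uncompressed", float32_raw),
    ("float64, uncompressed", float64_raw),
]:
    # Print each size and its growth factor relative to the compressed integers.
    print(f"{label:>26}: {size / 1e6:.1f} MB ({size / int32_steim2:.1f}x)")
```

Under these assumptions the growth factor for float32 equals the compression ratio the integer data had, and float64 doubles it again.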
The filesize also depends, although to a lesser degree, on the record length. Larger record lengths will result in a smaller file because the header is written less often.
In your case I would just convert the data to float32 and store it with encoding 4 (float32^^).
Best wishes,
Lion
[anonymous] Lion,
I know ;) I just wanted to find out whether storing the data as int32 again (which was the original data type) still results in significantly larger files. If that is the case, then it's not the int/float conversion that increases the file size; instead something different is going on, e.g. a change of sampling rate etc.
Robert
[krischer] Heyhey,
ah ok. I didn't think of that possibility.
There are also some other things that could happen, e.g. if there is a significant gap between the traces before they are merged with the interpolate option, which would actually create new data.
I just tried the included example and it all seems fine to me. I don't believe we have an issue here. The data is actually decimated with a factor of two before being corrected but it is still stored as uncompressed float64.
So the filesize increase is due to two/three factors:
Best wishes,
Lion
[megies] I think we can close this..
[anonymous] Hi again, sorry I didn't get back sooner but it's been a busy week. The data gaps are very small and storing the data as int32 reduces the filesize to 600K, so I think Lion's summary is correct. Thanks to everyone who commented :) D
[megies] No problem. We're happy if ObsPy keeps being useful for you. Keep in mind, if you have any processing routines that could be useful to others, drop us a line.
best, Tobias
Hi, I have used obspy to correct some data. I get the data as a miniseed file through arcLink with a filesize of 1.8M. After merging the data and doing the correction, then decimating by a factor of 2 and filtering above 1Hz, the filesize goes up to about 7.5M. Does anyone know why this is? Thanks, David