jpjones76 / SeisIO.jl

Julia language support for geophysical time series data
http://seisio.readthedocs.org

ungap! causes segmentation fault #32

tclements closed 4 years ago

tclements commented 4 years ago

This is a similar issue to #29. With this file, read_data and merge! work well:

S = read_data("mseed","CIRIO__BHZ___2017261.ms")
SeisData with 1 channels (1 shown)
    ID: CI.RIO..BHZ                        
  NAME: CI.RIO..BHZ                        
   LOC: 0.0 N, 0.0 E, 0.0 m                
    FS: 40.0                               
  GAIN: 1.0                                
  RESP: a0 1.0, f0 1.0, 0z, 0p             
 UNITS:                                    
   SRC:                                    
  MISC: 0 entries                          
 NOTES: 0 entries                          
     T: 2017-09-18T00:00:00.020 (34 gaps)  
     X: -3.640e+02                         
        +3.700e+01                         
            ...                            
        +1.703e+03                         
        (nx = 3459296)                     
     C: 0 open, 0 total

merge!(S)
S
SeisData with 1 channels (1 shown)
    ID: CI.RIO..BHZ                        
  NAME: CI.RIO..BHZ                        
   LOC: 0.0 N, 0.0 E, 0.0 m                
    FS: 40.0                               
  GAIN: 1.0                                
  RESP: a0 1.0, f0 1.0, 0z, 0p             
 UNITS:                                    
   SRC:                                    
  MISC: 0 entries                          
 NOTES: 1 entries                          
     T: 2017-09-18T00:00:00.020 (4 gaps)   
     X: -3.640e+02                         
        +3.700e+01                         
            ...                            
        +1.703e+03                         
        (nx = 3455964)                     
     C: 0 open, 0 total

When calling ungap!(S), I get a segmentation fault:

double free or corruption (!prev)

signal (6): Aborted
in expression starting at REPL[14]:1

Testing ungap!(S) on a different machine gives a different seg fault:

malloc_consolidate(): invalid chunk size

signal (6): Aborted
in expression starting at REPL[18]:1

I believe the seg fault is happening in the resize! line at the end of gapfill!.

jpjones76 commented 4 years ago

Looking into this as top priority. Seg faults are very serious and I haven't seen one slip through to SeisIO master for years. I'll have a report later today or tomorrow but am quite busy this afternoon. Hopefully there's a simple fix rather than rolling back to the memory hog that was splice!.

jpjones76 commented 4 years ago

Um...

julia> S = read_data("/home/josh/Downloads/CIRIO__BHZ___2017261.mseed")
SeisData with 1 channels (1 shown)
    ID: CI.RIO..BHZ                        
  NAME: CI.RIO..BHZ                        
   LOC: 0.0 N, 0.0 E, 0.0 m                
    FS: 40.0                               
  GAIN: 1.0                                
  RESP: a0 1.0, f0 1.0, 0z, 0p             
 UNITS:                                    
   SRC:                                    
  MISC: 0 entries                          
 NOTES: 0 entries                          
     T: 2017-09-18T00:00:00.020 (34 gaps)  
     X: -3.640e+02                         
        +3.700e+01                         
            ...                            
        +1.703e+03                         
        (nx = 3459296)                     
     C: 0 open, 0 total

julia> merge!(S)

julia> S.t
1-element Array{Array{Int64,2},1}:
 [1 1505692800019500; 3271530 4300; … ; 3284294 -500; 3455964 0]

julia> S.t[1]
6×2 Array{Int64,2}:
       1  1505692800019500
 3271530              4300
 3275238            899500
 3280754               500
 3284294              -500
 3455964                 0

julia> ungap!(S)

julia> S
SeisData with 1 channels (1 shown)
    ID: CI.RIO..BHZ                        
  NAME: CI.RIO..BHZ                        
   LOC: 0.0 N, 0.0 E, 0.0 m                
    FS: 40.0                               
  GAIN: 1.0                                
  RESP: a0 1.0, f0 1.0, 0z, 0p             
 UNITS:                                    
   SRC:                                    
  MISC: 0 entries                          
 NOTES: 2 entries                          
     T: 2017-09-18T00:00:00.020 (0 gaps)   
     X: -3.640e+02                         
        +3.700e+01                         
            ...                            
        +0.000e+00                         
        (nx = 3456000)                     
     C: 0 open, 0 total

I'm unable to replicate the segfault.
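
As a sanity check on the output above, the change in nx after ungap! can be reproduced from the time matrix by hand. This is only a sketch: it assumes that rows 2 through end-1 of S.t[1] hold (sample index, gap length in μs) pairs and that each gap is rounded to a whole number of samples. The variable names are illustrative and not part of SeisIO.

```julia
# Time matrix copied from the S.t[1] output above.
t = [      1  1505692800019500;
     3271530              4300;
     3275238            899500;
     3280754               500;
     3284294              -500;
     3455964                 0]

fs = 40.0
dt_us = round(Int64, 1.0e6 / fs)      # sample interval in microseconds

# Assumption: interior rows are gaps, each rounded to whole samples.
gap_us = t[2:end-1, 2]
n_fill = sum(round.(Int64, gap_us ./ dt_us))

nx_before = t[end, 1]                 # 3455964, the nx reported after merge!
nx_after = nx_before + n_fill
println((n_fill, nx_after))
```

Under these assumptions only the 899500 μs gap is long enough to need filling, and the total matches the nx = 3456000 shown after ungap! (one full day at 40 Hz).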

jpjones76 commented 4 years ago

Oh, hold on...there's something I don't like about ungap!, but it's not resize! ... AFAIK that can't cause segmentation faults. I'm pushing a fix to master that switches the order of resize! and unsafe_copyto!, and changes the latter to copyto! throughout. That prevents any possible buffer overflows from the "unsafe" invocation, which I know can trigger seg faults easily. I see nothing else in ungap! that can. Please let me know if this fixes the issue on your machine.
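
For readers following along, the ordering described above can be sketched roughly as follows. This is not the SeisIO source: insert_fill! is a hypothetical helper written only to illustrate growing the vector with resize! before shifting data with the bounds-checked copyto!. Base's unsafe_copyto! performs no bounds validation, so copying into a buffer that has not yet been grown can write past its end.

```julia
# Hypothetical helper, not SeisIO's gapfill!: insert `n` copies of `fill_val`
# into `x` starting at index `i`, growing the vector in place.
function insert_fill!(x::Vector{T}, i::Int, n::Int, fill_val::T) where T
    n > 0 || return x
    nx = length(x)
    1 <= i <= nx + 1 || throw(BoundsError(x, i))
    resize!(x, nx + n)                    # grow first so the destination exists
    # Shift the tail right by n samples; copyto! validates both index ranges.
    copyto!(x, i + n, x, i, nx - i + 1)
    x[i:i+n-1] .= fill_val                # write the fill values into the gap
    return x
end

x = Float32[1, 2, 3, 4, 5]
insert_fill!(x, 3, 2, 0.0f0)
println(x)
```

With this ordering, an out-of-range offset raises a BoundsError instead of silently corrupting the heap, which is the kind of failure the "double free or corruption" message suggests.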

tclements commented 4 years ago

Both seg faults occurred on Intel CPUs (i7-8850H and Xeon Platinum 8124M server) with Ubuntu 19.10 and 18.04, respectively. Working as expected after the update on both machines. Must have been something with the combination of unsafe_copyto! and resize!.

jpjones76 commented 4 years ago

Whew, ok. I'm going to leave this open for a week or two because I couldn't reproduce the error, so that you can add more examples if it happens again. I suspect that it's resolved, though.

tclements commented 4 years ago

Great, I'm about to go through 50 TB of data. If I don't see any issues, we can safely close.

jpjones76 commented 4 years ago

Have you encountered this bug at all since Nov. 25? I want to leave this open for another week, because segmentation faults are quite serious, but I'll close it next week if you don't see it again.

tclements commented 4 years ago

Went through all the BH? data for southern California since 2000 - no segmentation faults. Think we can safely close.