ratt-ru / meqtrees

A library for implementing radio astronomical Measurement Equations
http://meqtrees.net

gdbm fatal: malloc error #861

Open IanHeywood opened 9 years ago

IanHeywood commented 9 years ago

Calibrating with calico-wsrt-tens.py via the browser, the process crashes with a "gdbm fatal: malloc error" message printed to the terminal. A system monitor shows that MeqTrees is only using about a third of my system RAM when this happens. The same MS with a slightly shallower sky model was processing to completion yesterday; now the crash happens every time with the deeper model and some updated flags.

I notice this issue: https://github.com/ska-sa/meqtrees/issues/687 which I suspect might be related.

Any tips gratefully received. Cheers.

twillis449 commented 9 years ago

Hi Ian. If you built MeqTrees from source, did you have the qdbm and qdbm development libraries installed? I believe the various configure scripts are then supposed to select and link against qdbm if it is installed. If you just installed the pre-built Ubuntu packages, I would assume that the qdbm dependency is taken care of for you.

IanHeywood commented 9 years ago

Thanks Tony.

I typically install the binary versions to get the dependencies, then check out the source and rebuild.

I installed libqdbm-dev (which was apparently missing) and rebuilt, but the problem persists.

Cheers.

IanHeywood commented 9 years ago

Correction: hadn't "git pull"ed in a while so I did that and rebuilt. Now compilation fails at 94% with a flurry of blitz errors.

o-smirnov commented 9 years ago

You need to update your blitz package. There was some package renaming going on so you may need to remove the old packages first.

On Tue, 16 Jun 2015 09:21 IanHeywood notifications@github.com wrote:

Correction: hadn't "git pull"ed in a while so I did that and rebuilt. Now compilation fails at 94% with a flurry of blitz errors http://pastebin.com/JXaL1nwQ.

o-smirnov commented 9 years ago

Also, could you try removing libgdbm-dev before rebuilding, to make sure it links against qdbm instead?

I remember some worrying issue about all this, but the details escape me at the moment...

Also, any reason you're not using calico-stefcal instead?

gijzelaerr commented 9 years ago

Make sure you have blitz 0.10 installed, not 0.11 or 0.9.

IanHeywood commented 9 years ago

Thanks for the pointers. I'll try again when I'm back in the office tomorrow. As for calico-stefcal.py, it fails to select any data from this MS for reasons I've yet to figure out.

o-smirnov commented 9 years ago

Pretty sure it's this: https://github.com/ska-sa/meqtrees-timba/issues/1. How big are your fmep files from the previous (successful) run? Once they get over 2 GB, boom...
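To answer the size question quickly, something like the following sketch could list every fmep table under a directory and flag the ones nearing the 2 GB mark mentioned above. It assumes the tables are files or directories whose names end in `.fmep`; the function names and the threshold constant are illustrative and not part of MeqTrees.

```python
import os
import sys

# Threshold taken from the comment above; treated here as 2 GiB (an assumption).
SUSPECT_LIMIT = 2 * 1024 ** 3


def largest_file_size(path):
    """Return the size in bytes of the largest regular file at or below path."""
    if os.path.isfile(path):
        return os.path.getsize(path)
    largest = 0
    for dirpath, _dirnames, filenames in os.walk(path):
        for name in filenames:
            full = os.path.join(dirpath, name)
            if os.path.isfile(full):
                largest = max(largest, os.path.getsize(full))
    return largest


def find_fmep(root):
    """Return a sorted list of (path, largest_file_bytes) for *.fmep entries under root."""
    found = []
    for dirpath, dirnames, filenames in os.walk(root):
        for name in list(dirnames) + filenames:
            if name.endswith(".fmep"):
                full = os.path.join(dirpath, name)
                found.append((full, largest_file_size(full)))
    return sorted(found)


if __name__ == "__main__":
    # Usage: python fmep_sizes.py /path/to/ms_directory
    root = sys.argv[1] if len(sys.argv) > 1 else "."
    for path, size in find_fmep(root):
        flag = "  <-- at or over limit" if size >= SUSPECT_LIMIT else ""
        print("%10.1f MB  %s%s" % (size / 1024.0 ** 2, path, flag))
```

Run it on the MS directory from the last successful run; anything flagged would be a likely candidate for the gdbm failure.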

This is why we switched to qdbm in the first place, but now I remember that they treacherously dropped hovel.h (the gdbm compatibility API) from the qdbm packages in Ubuntu 12 (see this: https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=620550) so we went back to gdbm.

The solution is to send @gijzelaerr a beer so that he builds a custom qdbm package (http://sourceforge.net/projects/qdbm/) that does include hovel.h, and rebuilds meqtrees-timba against it.

IanHeywood commented 9 years ago

That could be it, although the fmep that remains post-crash is only 525 MB. I guess I'll try to figure out how to get Stefcal to swallow the MS in the meantime.

@gijzelaerr what's your preference?

Cheers.

gijzelaerr commented 9 years ago

Experience teaches that repackaging Ubuntu packages is not a good idea. Why not convince the upstream people at Debian that we need hovel.h?

gijzelaerr commented 9 years ago

Or why not bundle hovel.h with meqtrees-timba?

IanHeywood commented 9 years ago

@gijzelaerr I'm buying you a beer anyway.

o-smirnov commented 9 years ago

Convincing the Debian folks is not going to solve Ian's immediate problem... and it's not just hovel.h, it's the gdbm_* functions in libqdbm.a that they've disabled.

I suppose the most foolproof solution is to put the entire qdbm source in Timba, build it, and link against it statically. Then we don't have a problem conflicting with Debian packages or anything like that.
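For anyone wanting to check whether their installed qdbm still exports the gdbm compatibility functions, a rough sketch using Python's `ctypes` follows. It only inspects a shared library that the system loader can find (not the static `libqdbm.a`), the library name `qdbm` is an assumption about how the package is installed, and `has_symbol` is a helper invented for this example.

```python
import ctypes
import ctypes.util


def has_symbol(lib_name, symbol):
    """Return True/False if the shared library exports symbol.

    Returns None when the library cannot be located or loaded.
    """
    path = ctypes.util.find_library(lib_name)
    if path is None:
        return None
    try:
        lib = ctypes.CDLL(path)
    except OSError:
        return None
    # Attribute lookup on a CDLL raises AttributeError for missing symbols,
    # so hasattr() doubles as an export check.
    return hasattr(lib, symbol)


if __name__ == "__main__":
    # gdbm-style entry points that the hovel compatibility layer is meant to provide.
    for name in ("gdbm_open", "gdbm_store", "gdbm_fetch", "gdbm_close"):
        print(name, has_symbol("qdbm", name))
```

A result of `False` for these names would be consistent with the compatibility layer having been disabled in the packaged library; `None` just means no shared qdbm library was found.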