kernsuite / packaging

Use this repository to report issues with packages or request new packages

python3-casacore cannot find dysco #234

Closed phiadaarr closed 3 years ago

phiadaarr commented 3 years ago

When I do:

from casacore.tables import table
table("name.ms")

where name.ms is a dysco-compressed measurement set, I get the error:

RuntimeError: Shared library dyscostman not found in CASACORE_LDPATH or (DY)LD_LIBRARY_PATH
libcasa_dyscostman.so.5: cannot open shared object file: No such file or directory
libcasa_dyscostman.so: cannot open shared object file: No such file or directory
libdyscostman.so.5: cannot open shared object file: No such file or directory
libdyscostman.so: cannot open shared object file: No such file or directory

My docker file looks like this:

FROM kernsuite/base:dev
ENV DEBIAN_FRONTEND noninteractive
RUN docker-apt-install dysco python3-casacore

Do you have an idea what is going on? I found this issue: https://gitlab.rrz.uni-hamburg.de/hpc/lofar-build/-/issues/5 with this fix: https://gitlab.rrz.uni-hamburg.de/hpc/lofar-build/-/commit/0a38007638f9b311261e9e9b3289e97f2f8c77a3 Can this be integrated into KERN?

Athanaseus commented 3 years ago

Hi @phara92, thanks for reporting this. Is there a dscompress test MS that I can use to replicate this? That will also help in testing the fix. Thanks.

Not sure why I get a core dump trying to make my own:

dscompress -afnormalization -truncgaus 2.5 -data-bit-rate 4 -weight-bit-rate 12 -column DATA data.ms

        bits per data val = 4
        bits per weight = 12
        distribution = Truncated Gaussian with sigma=2.5
        normalization = AF

Opening ms...
Replacing flagged values by NaNs...
Time taken: 00:00:00.758567
Validating MS ordering...
terminate called after throwing an instance of 'std::runtime_error'
  what():  This measurement set is not 'regular'; at table row 141, timeblock index 1, timeblock offset 5 the index for antenna2 is not the same as for previous timesteps. In other words, not all timesteps had the same baselines. This is required to be able to compress with Dysco.
Aborted (core dumped)
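
The "regular" requirement in that message can be illustrated with a small sketch. This is not Dysco's implementation: it is a simplified check, with made-up function names, that assumes rows are sorted by time and that every time block must list the same (antenna1, antenna2) pairs in the same order as the first block.

```python
def split_blocks(times, antenna1, antenna2):
    """Group consecutive rows that share the same time into blocks of baselines."""
    blocks = []
    previous = object()  # sentinel that never equals a real timestamp
    for t, a1, a2 in zip(times, antenna1, antenna2):
        if t != previous:
            blocks.append([])
            previous = t
        blocks[-1].append((a1, a2))
    return blocks


def first_irregularity(times, antenna1, antenna2):
    """Return (row, block index, offset) of the first deviation, or None if regular."""
    blocks = split_blocks(times, antenna1, antenna2)
    if not blocks:
        return None
    reference = blocks[0]  # baseline order of the first time block
    row = len(reference)
    for index, block in enumerate(blocks[1:], start=1):
        for offset, baseline in enumerate(block):
            if offset >= len(reference) or baseline != reference[offset]:
                return (row + offset, index, offset)
        if len(block) < len(reference):
            # Block ended early: report the position where a baseline is missing.
            return (row + len(block), index, len(block))
        row += len(block)
    return None
```

With python-casacore, the three inputs would presumably come from the TIME, ANTENNA1 and ANTENNA2 columns of the main table (for example via `getcol`), but that wiring is left out here.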

gijzelaerr commented 3 years ago

@phara92 is there a specific reason you are using kern:dev? Do you experience the same issue with KERN-6?

phiadaarr commented 3 years ago

@Athanaseus I do not know what happens there in your case. It looks like the MS is corrupted. But I really do not know what is going on.

@Athanaseus I do not know if I can share the MS I am trying to use with you. First I would need to ask the people who gave it to me.

@gijzelaerr Since I am a tool developer, I thought kern:dev is appropriate :) But the issue is also present on kern:latest.

gijzelaerr commented 3 years ago

I remember now: try also installing the dysco-dev package in the docker container; that should install the missing SO symlinks. I don't remember how casacore and dysco are linked, but this casacore version is looking for dysco SO version 5 while there is only version 2 in KERN-6. Let's hope they are compatible.

phiadaarr commented 3 years ago

dysco-dev is not part of kern:latest. With kern:dev I get the following error now:

  File "/usr/lib/python3/dist-packages/casacore/tables/table.py", line 373, in __init__
    Table.__init__(self, tabname, lockopt, opt)
RuntimeError: Table DataManager error: Data Manager class DyscoStMan is not registered
  Check (DY)LD_LIBRARY_PATH matches the libraries used during the build of DyscoStMan

gijzelaerr commented 3 years ago

kernsuite/base:dev is for testing usage and should not be relied on by astronomers unless figuring out compatibility issues just before a release. kernsuite/base:latest points to kern-4, not sure why that is not properly updated. Made issue: https://github.com/kernsuite/packaging/issues/235

About that dysco error I'm also not sure what is going on, since both casacore and dysco are the latest versions in KERN-6. Maybe ask upstream at the dysco repo what could be wrong.

phiadaarr commented 3 years ago

To me it seems to be a build issue. But I will ask upstream anyways.

phiadaarr commented 3 years ago

André says:

There's probably something wrong with paths set, or the casacore version (or other dependency) used to compile dysco vs the one used for python3-casacore don't match.

And apparently does not feel responsible for this. Do you think we can figure this out on the KERN side?

gijzelaerr commented 3 years ago

Yeah, this isn't very helpful, and Andre doesn't care about proper management of SO numbers. Proper SO version management is supposed to address this and would make it easier to pinpoint the issue.

@Athanaseus can you have a look at this? Maybe just do a rebuild of the latest casacore and dysco packages and see if we can reproduce the error; maybe a version did indeed get mixed up somewhere.

Meanwhile, it would help if we had a little MS that uses dysco.

tammojan commented 3 years ago

We're doing our best to make the software work for everyone, but it takes a lot of effort to keep everyone happy.

Casacore has a feature to find storage managers at runtime, in this case libdyscostman.so. For this to work, that library has to be installed, in the library search path (e.g. in /usr/local/lib or in some directory in LD_LIBRARY_PATH), and it has to be linked to the same casacore libraries as the application calling casacore (in this case python-casacore).
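
The "linked to the same casacore libraries" condition can be checked by comparing `ldd` output for the two binaries. The following is a minimal sketch, assuming a Linux system with `ldd` available; the function names are made up for illustration, and the paths to pass in (the Dysco library and python-casacore's compiled extension module) depend on the installation.

```python
import re
import subprocess

# Matches lines such as "libcasa_tables.so.6 => /usr/lib/.../libcasa_tables.so.6 (0x...)"
# and "libcasa_tables.so.5 => not found".
LDD_LINE = re.compile(
    r"^\s*(libcasa_[\w.+-]+)\s+=>\s+(.*?)(?:\s+\(0x[0-9a-fA-F]+\))?\s*$"
)


def casacore_deps(ldd_output):
    """Map each casacore library base name to (soname, resolved path)."""
    deps = {}
    for line in ldd_output.splitlines():
        match = LDD_LINE.match(line)
        if match:
            soname, target = match.group(1), match.group(2)
            deps[soname.split(".so")[0]] = (soname, target)
    return deps


def run_ldd(path):
    """Return the raw ldd output for one shared object."""
    result = subprocess.run(
        ["ldd", path], capture_output=True, text=True, check=True
    )
    return result.stdout


def mismatches(deps_a, deps_b):
    """Casacore libraries that both binaries use but resolve differently."""
    return {
        name: (deps_a[name], deps_b[name])
        for name in sorted(deps_a.keys() & deps_b.keys())
        if deps_a[name] != deps_b[name]
    }
```

An empty result from `mismatches(casacore_deps(run_ldd(dysco_lib)), casacore_deps(run_ldd(python_ext)))` would suggest both sides resolve to the same casacore; any entry, or a `not found` target, points at the kind of mix-up described above.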

As for the little test-MS, here is one. small-dysco.MS.zip

aroffringa commented 3 years ago

That's not a very collaborative remark, @gijzelaerr . And also just irrelevant: this doesn't have to do with proper SO versions of Dysco. The Dysco library ABI has never changed. It's the environment Dysco lives in, which is what Kern should fix.

For other packages, like aoflagger or wsclean, the ABI changes every release and even between subreleases. Ole has a good solution for that, namely to always make an executable depend on the exact same version of the lib. That works fine in Debian. If your system depends on me to do something with an SO version every time you release something, then indeed that's not going to work.

However, again, the system that Ole uses works much better. But also, for things like SO versions or certain cmake Debian rules that Ole knows a lot better than me, Ole keeps patches in the Debian repository. If you want proper SO versions, go ahead and do that for Kern.

gijzelaerr commented 3 years ago

Just closing an issue is not very collaborative either. Most of the issues people have are with casacore and related libs having compatibility problems. For years I've tried to improve the situation by checking this and working together with upstream. Since I'm currently not paid to work in radio astronomy anymore, I'm doing this in my spare time now, and I've lost the motivation to track this kind of issue.

aroffringa commented 3 years ago

Frankly I didn't just close it, I gave an answer pointing as best as I could at what was the problem. I agree it would have been nicer if I had said "I'm closing this ticket here because I think it's not a Dysco issue I can fix", but it was late and thought closing implied this, so my apologies for that (the "comment and close" button is also just too easy to click ;) ). But throwing random accusations at me is not going to improve my willingness to help you.

gijzelaerr commented 3 years ago

Yeah, I also could and should have phrased the message differently, but I'm under the impression that SO versioning really is not seen as useful for managing dependencies at ASTRON, while Debian packaging relies heavily on it. The Debian and KERN packages are almost the same, so maybe something is going wrong in the build procedure. Hopefully, @Athanaseus can find the time to look at this.

phiadaarr commented 3 years ago

I really do appreciate the effort by all of you :) :) :)

aroffringa commented 3 years ago

I think this is a different discussion. I can't really speak for ASTRON as a whole. I'm open to discussions about how to help with the packaging of software, but I'm not aware of any issue with the setup of SO versions in other packages I maintain. Ole seems to have a system for it which works well.

Having a second version number running in parallel to the regular version numbers is a bit of a pain, certainly if it has to be increased whenever you do a release. We change ABIs all the time for some of the packages, and we can't keep up with that. Maybe a system that could partly work is one where the SO version is set to the package version (this is done more often on Linux, but I don't know if it is according to Debian rules). We always increase the package version for stable releases. That would partly solve this, i.e. one could install multiple libraries of different releases next to each other. It still wouldn't solve the case where distribution-specific patch releases are made in between our releases (which has happened at least for wsclean/aoflagger).

gijzelaerr commented 3 years ago

As far as I remember, Ole or I have not introduced any ABI changes with distro-specific patch releases. I understand the trouble, and it isn't easy. In a couple of weeks, I'll have a holiday coming up, and hopefully, I have a bit more time to look into this/these issues.


Athanaseus commented 3 years ago

As for the little test-MS, here is one. small-dysco.MS.zip

Thanks @tammojan, it looks like the zipped file is empty though.

gijzelaerr commented 3 years ago

@phiadaarr can you see if this is solved with KERN-7? Thanks!

phiadaarr commented 3 years ago

No, it is still not working.

I use the following docker file now:

FROM kernsuite/base:7
ENV DEBIAN_FRONTEND noninteractive
RUN docker-apt-install dysco python3-casacore

Same errors.

gijzelaerr commented 3 years ago

@phiadaarr I'm surprised this still happens, since KERN-7 is a full rebuild of all packages with the latest versions. Can you please give the exact commands you are running, the error and, if possible, the file you are trying to process, or preferably a small example file that triggers your error?

Athanaseus commented 3 years ago

Hi @phiadaarr, can you try:

RUN docker-apt-install dysco-dev python3-casacore

phiadaarr commented 3 years ago

With this

RUN docker-apt-install dysco-dev python3-casacore

I get Illegal instruction (core dumped).

can you please give the exact commands you are running, the error and if possible the file you are trying to process, or preferably a small example file that triggers your error.

I described that above. The measurement set I am trying to process is 5 GB, so it is just about shareable size-wise. I have to ask the person who shared the data with me whether I am allowed to share it with you for debugging purposes. I will let you know as soon as I have the answer.

Athanaseus commented 3 years ago

This is what I get with the ms @tammojan sent:

Python 3.8.5 (default, May 27 2021, 13:30:53) 
[GCC 9.3.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> from casacore.tables import table
>>> table('small-dysco.MS')
Successful readonly open of default-locked table small-dysco.MS: 24 columns, 168 rows
<casacore.tables.table.table object at 0x7fea90f05f30>

Dockerfile:

FROM kernsuite/base:7
ENV DEBIAN_FRONTEND noninteractive
RUN docker-apt-install dysco-dev python3-casacore

Docker run commands:

docker build . -t dysco
docker run -v /path/to/dysco/ms:/home -it dysco

Edit: MS link

phiadaarr commented 3 years ago

I cannot download this MS since it is on Slack and I do not have an account there. Could you share it in a form that I can access?

Athanaseus commented 3 years ago

try: https://drive.google.com/file/d/13o3iLo8x2q3Xf8HblIUSunSs4P06qXEe/view?usp=sharing

phiadaarr commented 3 years ago

Thank you. I get the same error:

>>> from casacore.tables import table
>>> table("small-dysco.MS")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib/python3/dist-packages/casacore/tables/table.py", line 373, in __init__
    Table.__init__(self, tabname, lockopt, opt)
RuntimeError: Shared library dyscostman not found in CASACORE_LDPATH or (DY)LD_LIBRARY_PATH
libcasa_dyscostman.so.6: cannot open shared object file: No such file or directory
libcasa_dyscostman.so: cannot open shared object file: No such file or directory
libdyscostman.so.6: cannot open shared object file: No such file or directory
libdyscostman.so: cannot open shared object file: No such file or directory

And if I use dysco-dev, I get again:

Illegal instruction (core dumped)

My machine has the following CPU flags:

Flags:               fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm constant_tsc rep_good nopl nonstop_tsc cpuid extd_apicid amd_dcm aperfmperf pni pclmulqdq monitor ssse3 fma cx16 sse4_1 sse4_2 popcnt aes xsave avx f16c lahf_lm cmp_legacy svm extapic cr8_legacy abm sse4a misalignsse 3dnowprefetch osvw ibs xop skinit wdt lwp fma4 tce nodeid_msr tbm topoext perfctr_core perfctr_nb cpb hw_pstate ssbd vmmcall bmi1 arat npt lbrv svm_lock nrip_save tsc_scale vmcb_clean flushbyasid decodeassists pausefilter pfthreshold

Maybe you assume during compilation something that is not present on that machine? I will try the same thing on a newer machine as well.
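
One way to test that hypothesis is to compare the CPU flags against a list of instruction-set extensions a build might assume. The sketch below is only illustrative: the `wanted` list is a guess and not taken from the actual dysco or casacore build flags. The flags listed above do include `avx` and `fma` but not `avx2` or `bmi2`.

```python
def parse_flags(text):
    """Extract the flag set from an lscpu 'Flags:' line or a /proc/cpuinfo 'flags' line."""
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip().lower() == "flags":
            return set(value.split())
    return set()


def missing_extensions(flags, wanted=("avx", "avx2", "fma", "bmi2")):
    """Return the entries of `wanted` (a hypothetical list) that the CPU lacks."""
    return [name for name in wanted if name not in flags]
```

On Linux, `missing_extensions(parse_flags(open("/proc/cpuinfo").read()))` lists which of the assumed extensions the current machine does not report.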

gijzelaerr commented 3 years ago

I get the same error with this dockerfile and the MS from the google drive:

FROM kernsuite/base:7
RUN docker-apt-install dysco python3-casacore casacore-dev
ADD . /code
RUN python3 -c "from casacore.tables import table; table('/code/small-dysco.MS')"

But when I do a docker-apt-install dysco-dev, the error goes away. The dysco included in KERN uses SO version 2:

root@2e242a22407f:/# dpkg -L libdyscostman2 | grep so
/usr/lib/libdyscostman.so.2
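
The name mismatch can be sketched in a few lines. The four candidate names are taken from the error messages earlier in this thread; the helper names are made up, and the assumption that dysco-dev adds an unversioned libdyscostman.so symlink is based only on the comments above.

```python
def candidate_names(manager, soversion):
    """File names tried for a storage manager, in the order shown in the error."""
    return [
        f"libcasa_{manager}.so.{soversion}",
        f"libcasa_{manager}.so",
        f"lib{manager}.so.{soversion}",
        f"lib{manager}.so",
    ]


def loadable(installed, manager, soversion):
    """Candidate names that are actually present among the installed files."""
    return [name for name in candidate_names(manager, soversion) if name in installed]


# Runtime package only: nothing matches, because only the .so.2 file is shipped.
runtime_only = {"libdyscostman.so.2"}
print(loadable(runtime_only, "dyscostman", 6))

# With the unversioned symlink assumed to come from dysco-dev, one name matches.
with_dev = runtime_only | {"libdyscostman.so"}
print(loadable(with_dev, "dyscostman", 6))
```

This would explain why installing dysco-dev makes the "not found" error disappear even though the versioned file names still do not line up.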

Dysco was uploaded 4 days after casacore, so it should be linked to the casacore version in KERN-7.

I think the crash is caused by an issue with dysco and your specific MS, since there are no errors about missing symbols. It may be an idea to manually compile dysco using debug flags or install the debugging symbols package (search for debug here https://kernsuite.info/faq/) and get a proper core dump to examine the situation.

gijzelaerr commented 3 years ago

About the machine CPU flags: we are quite conservative with our optimisation flags, so I don't expect this to be an issue.

phiadaarr commented 3 years ago

On a newer machine, everything works now. Thanks a lot!

Shall we dig into the issue with the older machine (it has an AMD Opteron(tm) Processor 6376) or leave it as it is?

gijzelaerr commented 3 years ago

Interesting, thanks for figuring that out. My guess is that the default dysco optimisation flags are then assuming a modern system. As far as I know, we don't touch the optimisation flags to avoid this kind of issue.

I think for now it is good enough that we have narrowed down the issue; no further investigation is required. Optimisation is a bit of an issue with the Debian packages, since we don't want to deliver slow packages but also want to keep support for older architectures (but not too old). The 6376 seems to come from 2012, so I guess we can label that 'too old' :)

phiadaarr commented 3 years ago

It is unclear to me if 2012 is already too old. But fortunately I do have an alternative machine. The original issue, "python3-casacore cannot find dysco", is certainly resolved.