gwastro / pycbc

Core package to analyze gravitational-wave data, find signals, and study their parameters. This package was used in the first direct detection of gravitational waves (GW150914), and is used in the ongoing analysis of LIGO/Virgo data.
http://pycbc.org
GNU General Public License v3.0

pycbc v2.0.1 version - banksim issue #3921

Closed MariaAssiduo closed 2 years ago

MariaAssiduo commented 2 years ago

Hello. I am using PyCBC version 2.0.1, which I cloned with git into my CIT directory /home/maria.assiduo/gwastro2_pycbc/.

When running pycbc_banksim_skymax, I get the following error:

Traceback (most recent call last):
  File "/home/maria.assiduo/gwastro2_pycbc/pycbc_Python3/bin/pycbc_splitbank", line 163, in <module>
    sngl_inspiral_table = lsctables.New(tabletype,columns=template_bank_table.columnnames)
  File "/home/maria.assiduo/gwastro2_pycbc/pycbc_Python3/lib/python3.9/site-packages/ligo/lw/lsctables.py", line 153, in New
    new.appendColumn(name)
  File "/home/maria.assiduo/gwastro2_pycbc/pycbc_Python3/lib/python3.9/site-packages/ligo/lw/table.py", line 719, in appendColumn
    raise ligolw.ElementError("invalid Column '%s' for Table '%s'" % (name, self.Name))
ligo.lw.ligolw.ElementError: invalid Column 'process_id' for Table 'sngl_inspiral'
2022-01-20 04:03:38,880 Splitting injection file
Traceback (most recent call last):
  File "/home/maria.assiduo/gwastro2_pycbc/pycbc_Python3/bin/pycbc_splitbank", line 163, in <module>
    sngl_inspiral_table = lsctables.New(tabletype,columns=template_bank_table.columnnames)
  File "/home/maria.assiduo/gwastro2_pycbc/pycbc_Python3/lib/python3.9/site-packages/ligo/lw/lsctables.py", line 153, in New
    new.appendColumn(name)
  File "/home/maria.assiduo/gwastro2_pycbc/pycbc_Python3/lib/python3.9/site-packages/ligo/lw/table.py", line 719, in appendColumn
    raise ligolw.ElementError("invalid Column '%s' for Table '%s'" % (name, self.Name))
ligo.lw.ligolw.ElementError: invalid Column 'process_id' for Table 'sim_inspiral'

I need help to understand how to overcome the problem. Thank you in advance.

titodalcanton commented 2 years ago

@MariaAssiduo can you point me to the banksim configuration or command line that you are using?

MariaAssiduo commented 2 years ago

@titodalcanton I am very sorry, I may have deleted the test directory. However, my pycbc copy is in the directory /home/maria.assiduo/gwastro2_pycbc/

The configuration file was of this kind

[inspinj]
f-lower = 15
i-distr = uniform 
l-distr = random 
t-distr = uniform

min-mass1 = 50
max-mass1 = 250
m-distr = componentMass
disable-milkyway =
min-mass2 = 50
max-mass2 =  250
min-distance = 1000
min-mtotal = 50
max-mtotal = 500
waveform = IMRPhenomXPHM

enable-spin =
min-spin1 = 0.
max-spin1 = 0.99
min-spin2 = 0.
max-spin2 = 0.99
max-distance = 1000000
d-distr = uniform
gps-start-time = 2000000000
gps-end-time =   2000020000
time-interval = 0.
time-step = 1.0
seed = 3

[executables]
banksim = /home/maria.assiduo/gwastro2_pycbc/pycbc_Python3/bin/pycbc_banksim_skymax

[workflow]
log-path = /home/maria.assiduo/etc
;use-gpus =          
bank-file = SBANK_COMBINED-high-mass_SEOB.xml.gz
injections-per-job = 50
templates-per-job = 100
accounting-group = ligo.dev.o3.cbc.em.mbta

[banksim]
processing-scheme = cpu
asd-file = /home/maria.assiduo/etc/ASD_L1_1241427800_2000.txt 
template-approximant = SEOBNRv4_ROM
template-start-frequency = 14
signal-sample-rate = 16384
signal-start-frequency = 15
filter-low-frequency = 14
filter-sample-rate = 4096
filter-signal-length = 32
signal-approximant = IMRPhenomXPHM
sky-maximization-method = precessing

The commands I usually run, in the directory containing the banksim config file, are

1) source /home/maria.assiduo/gwastro2_pycbc/pycbc_Python3/bin/activate
2) pycbc_make_banksim --config file.ini
3) ./submit.sh

In my run, I get the error message above right after step 2). Thanks in advance

titodalcanton commented 2 years ago

The error points to a problem reading one of the XML files (the bank or the injections, or both) so I wanted to have a look at those files and try to reproduce the problem myself. It would be useful if you could rerun the banksim, or look at your ~/.zfs/snapshot directory and tell me where to find the old run, so I can access the XML files.

MariaAssiduo commented 2 years ago

@titodalcanton I have reproduced exactly the same test, you can have a look at it here --> /home/maria.assiduo/test_gwastro2_pycbc/

Thank you

titodalcanton commented 2 years ago

Ok, I can reproduce the issue, and there are definitely things to fix in pycbc_splitbank, however I am not sure if the underlying problem is in sbank, PyCBC, or ligo.lw yet.

One odd thing about your bank is that it is named .xml.gz even though it is not compressed. This is not the cause of the problem, but you may want to rename the file just to avoid confusion.
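
A quick way to check whether a file named .xml.gz is really compressed is to look at its first two bytes, since every gzip stream starts with 0x1f 0x8b. The following is a minimal sketch using only the Python standard library; the helper name is_gzip_file and the temporary file names are made up for illustration and are not part of PyCBC.

```python
import gzip
import os
import tempfile

GZIP_MAGIC = b"\x1f\x8b"  # first two bytes of every gzip stream


def is_gzip_file(path):
    """Return True if the file at `path` starts with the gzip magic bytes."""
    with open(path, "rb") as f:
        return f.read(2) == GZIP_MAGIC


# Demo on two throwaway files that both carry a .xml.gz suffix.
with tempfile.TemporaryDirectory() as tmp:
    plain = os.path.join(tmp, "plain.xml.gz")    # plain text, misleading name
    packed = os.path.join(tmp, "packed.xml.gz")  # genuinely gzip-compressed
    with open(plain, "w") as f:
        f.write("<LIGO_LW></LIGO_LW>")
    with gzip.open(packed, "wt") as f:
        f.write("<LIGO_LW></LIGO_LW>")
    plain_is_gzip = is_gzip_file(plain)
    packed_is_gzip = is_gzip_file(packed)

print(plain_is_gzip, packed_is_gzip)
```

Running is_gzip_file on the bank file would show whether the .gz suffix matches its contents.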

a-r-williamson commented 2 years ago

By the looks of things I've hit sort of the inverse of this issue when reading in an XML bank, with the following error:

Traceback (most recent call last):
  File "/home/andrew.williamson/.conda/envs/pycbc-py37/bin/pycbc_multi_inspiral", line 7, in <module>
    exec(compile(f.read(), __file__, 'exec'))
  File "/home/andrew.williamson/.conda/envs/pycbc-py37/src/pycbc/bin/pycbc_multi_inspiral", line 406, in <module>
    approximant=opt.approximant)
  File "/home/andrew.williamson/.conda/envs/pycbc-py37/src/pycbc/pycbc/vetoes/bank_chisq.py", line 188, in __init__
    approximant=approximant, **kwds)
  File "/home/andrew.williamson/.conda/envs/pycbc-py37/src/pycbc/pycbc/waveform/bank.py", line 679, in __init__
    parameters=parameters, **kwds)
  File "/home/andrew.williamson/.conda/envs/pycbc-py37/src/pycbc/pycbc/waveform/bank.py", line 275, in __init__
    columns=parameters)
  File "/home/andrew.williamson/.conda/envs/pycbc-py37/src/pycbc/pycbc/io/record.py", line 1187, in from_ligolw_table
    for row in table]
  File "/home/andrew.williamson/.conda/envs/pycbc-py37/src/pycbc/pycbc/io/record.py", line 1187, in <listcomp>
    for row in table]
  File "/home/andrew.williamson/.conda/envs/pycbc-py37/src/pycbc/pycbc/io/record.py", line 1186, in <genexpr>
    for col,dt in columns.items())
AttributeError: 'SnglInspiral' object has no attribute 'process:process_id'

This comes from the fact that the SnglInspiral table has process:process_id as a valid column, but the attribute for a SnglInspiral row is process_id. Can work around it with e.g. a change at line 1184 of pycbc/io/record.py :

-                [tuple(getattr(row, col) if dt != 'ilwd:char'
+                [tuple(getattr(row, col.split(':')[-1]) if dt != 'ilwd:char'

I'd be surprised if this doesn't crop up elsewhere.
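
To illustrate the mismatch described above in isolation: the column is named process:process_id, while the row object exposes the value as process_id, so stripping everything up to the last colon gives the attribute name. This is a minimal sketch with a hypothetical FakeRow class standing in for a SnglInspiral row; it does not use ligo.lw or PyCBC, and the helper attribute_name is invented for the example.

```python
class FakeRow:
    """Hypothetical stand-in for a SnglInspiral row; not the ligo.lw class."""

    def __init__(self, **values):
        for name, value in values.items():
            setattr(self, name, value)


def attribute_name(column_name):
    """Drop any 'table:' prefix, e.g. 'process:process_id' -> 'process_id'."""
    return column_name.split(":")[-1]


row = FakeRow(process_id=0, mass1=60.0, mass2=55.0)
columns = ["process:process_id", "mass1", "mass2"]

# Looking up the raw column name fails, as in the AttributeError above.
try:
    getattr(row, "process:process_id")
    raw_lookup_failed = False
except AttributeError:
    raw_lookup_failed = True

# Using the stripped name works for prefixed and unprefixed columns alike.
values = tuple(getattr(row, attribute_name(col)) for col in columns)
print(raw_lookup_failed, values)
```

Column names without a colon pass through split(':')[-1] unchanged, which is why the one-line change is safe for the other columns.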

spxiwh commented 2 years ago

@a-r-williamson Can you point to a bank where this is happening?

a-r-williamson commented 2 years ago

On CIT: /home/andrew.williamson/pygrb_tests/pre_O4/pycbc_multi_inspiral_test_220120/bank_veto_bank.xml

titodalcanton commented 2 years ago

@a-r-williamson I kind of hit that problem today as well while debugging Maria's issue (but I did not have time to follow it up properly). I was not sure why this was not being caught by the PyCBC Live test (which uses the O2 bank in XML format) but I think I just saw it! It seems we explicitly do the workaround you suggest, but only in pycbc_coinc_bank2hdf: https://github.com/gwastro/pycbc/blob/master/bin/bank/pycbc_coinc_bank2hdf#L44

titodalcanton commented 2 years ago

Scratch my previous comment, I do in fact get the same error when trying to load the O2 XML bank. Not sure why the test does not fail!

titodalcanton commented 2 years ago

Ok I see what is happening now. I can avoid the error by passing the parameters kwarg to TemplateBank's constructor, and only request the masses and spins (which is what pycbc_coinc_bank2hdf does). This prevents the TemplateBank code from trying to read/parse the problematic process:process_id field.
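
A toy model of why restricting the requested parameters avoids the error: if only the masses and spins are read, the code never tries to look up the problematic process:process_id field. The sketch below is self-contained and does not call PyCBC; FakeRow, ALL_COLUMNS and read_columns are hypothetical names, and the choice of mass1, mass2, spin1z, spin2z as "masses and spins" is an assumption made for the example.

```python
class FakeRow:
    """Hypothetical row whose attributes use the short (unprefixed) names."""

    process_id = 0
    mass1 = 60.0
    mass2 = 55.0
    spin1z = 0.1
    spin2z = -0.2


# Column names as the table reports them (assumed for this sketch).
ALL_COLUMNS = ["process:process_id", "mass1", "mass2", "spin1z", "spin2z"]


def read_columns(rows, parameters=None):
    """Read `parameters` from each row, or every known column if None."""
    wanted = ALL_COLUMNS if parameters is None else parameters
    return [tuple(getattr(row, name) for name in wanted) for row in rows]


rows = [FakeRow()]

# Reading every column hits the prefixed name and fails.
try:
    read_columns(rows)
    full_read_failed = False
except AttributeError:
    full_read_failed = True

# Requesting only masses and spins never touches the problematic field.
subset = read_columns(rows, parameters=["mass1", "mass2", "spin1z", "spin2z"])
print(full_read_failed, subset)
```

With the real code, the equivalent step is passing the parameters keyword to TemplateBank's constructor, as described above.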

a-r-williamson commented 2 years ago

Yes that makes sense. I can create a PR with my above suggested change if that looks like the best way to handle this, but before that I'll see if I can spot other places where this might occur.

MariaAssiduo commented 2 years ago

@a-r-williamson Hi, I made the changes you suggested in the script record.py

-                [tuple(getattr(row, col) if dt != 'ilwd:char'
+                [tuple(getattr(row, col.split(':')[-1]) if dt != 'ilwd:char'

of my pycbc copy in the CIT directory /home/maria.assiduo/gwastro2_pycbc/pycbc/pycbc/io/

But I still can't run banksim without that error occurring

ligo.lw.ligolw.ElementError: invalid Column 'process_id' for Table 'sim_inspiral'
ligo.lw.ligolw.ElementError: invalid Column 'process_id' for Table 'sngl_inspiral'

You can have a look at the test here /home/maria.assiduo/test_gwastro6/BK_pycbc2/XPHMx2/

What did I miss? Thank you in advance.

a-r-williamson commented 2 years ago

@MariaAssiduo I'm sorry if I've caused confusion by mentioning the error I had encountered. The suggested changes I posted were to address that error, which is different to yours (but caused by the same underlying changes in ligo.lw). @titodalcanton has opened https://github.com/gwastro/pycbc/pull/3922 to fix the error you identified. The traceback you posted shows that the calls from pycbc_splitbank aren't leading to the io/record.py functions at all, so changing that code won't do anything to help.

titodalcanton commented 2 years ago

@MariaAssiduo the error you originally reported here should now be fixed on master by #3922, so I am going to close this issue. Please install the latest PyCBC master branch and try again.