madminer-tool / madminer-workflow

Madminer complete cloud-based analysis
MIT License

Madminer example is incompatible with singularity #5

Closed khurtado closed 4 years ago

khurtado commented 5 years ago

Hello,

I'm trying to test this example in REANA + HTCondor + Singularity and noticed some failures that I just wanted to report.

  1. The steps assume that the starting working directory is /home when calling code/ (example). Each script therefore needs either a cd /home at the beginning or full paths.

  2. Steps that try to create directories in /home, like here, won't succeed unless the container is entered as root (the Docker case, but not Singularity), which results in Read-only file system errors like the one below. It would be great if the workflow could be adapted so that directories are created relative to the working directory in the container (leaving it to the container runtime invocation to make sure that relative path is writable), like in this example.

Traceback (most recent call last):
  File "code/configurate.py", line 93, in <module>
    miner.save('/home/data/madminer_example.h5')
  File "/usr/local/lib/python2.7/dist-packages/madminer/core.py", line 527, in save
    create_missing_folders([os.path.dirname(filename)])
  File "/usr/local/lib/python2.7/dist-packages/madminer/utils/various.py", line 55, in create_missing_folders
    os.makedirs(folder)
  File "/usr/lib/python2.7/os.py", line 157, in makedirs
    mkdir(name, mode)
OSError: [Errno 30] Read-only file system: '/home/data'
cp: cannot stat '/home/data/*.h5': No such file or directory
[Error] Execution failed with error code: 1
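
Both points above can be illustrated with a minimal sketch. The helper names (script_dir, output_dir) and the "data" folder default are hypothetical and not part of the workflow: bundled files are looked up relative to the script itself rather than the current directory, and outputs are created under the working directory, which the container runtime is expected to make writable.

```python
import os


def script_dir(script_path):
    """Directory containing the given script, independent of the current working directory."""
    return os.path.dirname(os.path.abspath(script_path))


def output_dir(name="data", base=None):
    """Create (if needed) and return an output folder under `base` (default: the current working directory)."""
    base = os.getcwd() if base is None else base
    path = os.path.join(base, name)
    if not os.path.isdir(path):
        os.makedirs(path)  # available on both Python 2.7 and 3
    return path
```

Inside a script, script_dir(__file__) would replace the assumption that the step starts in /home, and output_dir() would replace the hardcoded /home/data.
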
lukasheinrich commented 5 years ago

hi @khurtado can you just try replacing /home/extract with /tmp/extract and see if that works?

khurtado commented 5 years ago

@lukasheinrich: Changing /home/extract -> /tmp/extract in the workflow steps didn't work. The problem is that the directories are not only defined in the yaml files but are also hardcoded in the Python scripts that ship inside the image. The steps also assume the starting directory is /home, so calls to relative paths like here fail.

See some examples below:
https://github.com/scailfin/workflow-madminer/blob/master/docker/docker-madminer-physics/code/delphes.py#L82
https://github.com/scailfin/workflow-madminer/blob/master/docker/docker-madminer-physics/code/configurate.py#L93
https://github.com/scailfin/workflow-madminer/blob/master/docker/docker-madminer-physics/code/generate.py#L63-L79

lukasheinrich commented 5 years ago

hi, this will require a rebuild of the image. i'm not sure how pervasive it is, but usually you should be able to change this using an env variable, i.e. replacing the hardcoded path with os.environ['MADMINER_DATA'] or similar.
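
A minimal sketch of that idea, assuming the variable is called MADMINER_DATA as suggested above. The helper name data_path and the fallback to a "data" folder under the current working directory are hypothetical choices, not what the image actually does.

```python
import os


def data_path(filename, env=None):
    """Build an output path from MADMINER_DATA instead of a hardcoded /home/data."""
    env = os.environ if env is None else env
    # Fallback (an assumption): a "data" folder under the current working directory.
    base = env.get("MADMINER_DATA", os.path.join(os.getcwd(), "data"))
    return os.path.join(base, filename)


# e.g. miner.save(data_path("madminer_example.h5")) instead of
#      miner.save('/home/data/madminer_example.h5')
```

The workflow step would then set MADMINER_DATA to a location the container runtime can write to, so the scripts inside the image need no further changes when the runtime changes.
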

khurtado commented 5 years ago

@lukasheinrich I agree that using environment variables in the scripts and setting them in the workflows makes sense. Is anybody in the NYU scailfin group who maintains this example willing to make the changes and rebuild the image for that purpose?

cranmer commented 5 years ago

Normally this would be @irinaespejo, but I'm not sure if she is available right now. Thank you for testing and finding the issues. I'll see if I can find someone to help. Possibly @heikomuller @alexanderheld ?

cranmer commented 5 years ago

It's lines like this that are the problem, right? https://github.com/scailfin/workflow-madminer/blob/master/docker/docker-madminer-ml/code/configurate_ml.py#L65

khurtado commented 5 years ago

@cranmer Correct!

irinaespejo commented 5 years ago

Hi @khurtado, thank you for your detailed report and apologies for the late reply; I've been away for a while. You're right about the path problem with Singularity. Lukas' suggestion is helpful and I'll go that way. I'm working on it right now so that we don't have problems like this in the future. I'll get back to you when it's done, thanks for the patience!

khurtado commented 5 years ago

@irinaespejo Thank you!

irinaespejo commented 5 years ago

Hi @khurtado, I've made changes and tested them myself with Singularity, and it works. I decided the easiest approach was to put everything in a separate folder called /madminer to avoid permission problems in /home. Let me know if you have any problems and what you think about the solution. Thanks for your interest.

khurtado commented 5 years ago

@irinaespejo Awesome! Thank you, I will test this week and let you know how things go for me.

khurtado commented 5 years ago

Hi @irinaespejo I tested today, but got errors due to /madminer/data being in read-only mode.

Traceback (most recent call last):
  File "/madminer/code/configurate.py", line 93, in <module>
    miner.save('/madminer/data/madminer_example.h5')
  File "/usr/local/lib/python2.7/dist-packages/madminer/core.py", line 527, in save
    create_missing_folders([os.path.dirname(filename)])
  File "/usr/local/lib/python2.7/dist-packages/madminer/utils/various.py", line 55, in create_missing_folders
    os.makedirs(folder)
  File "/usr/lib/python2.7/os.py", line 157, in makedirs
    mkdir(name, mode)
OSError: [Errno 30] Read-only file system: '/madminer/data'
cp: cannot stat '/madminer/data/*.h5': No such file or directory
[Error] Execution failed with error code: 1

irinaespejo commented 5 years ago

Hi @khurtado, sorry to hear that. I can't reproduce your error; could you post the commands you're using, please? Also, you were using REANA + HTCondor + Singularity; does that still apply? Thanks!

khurtado commented 5 years ago

@irinaespejo I'm running it via VC3 with REANA + HTCondor + Singularity, but I get similar results executing yadage-run alone in the following way:

as a user:

export PACKTIVITY_CONTAINER_RUNTIME=singularity
export SINGULARITY_CACHEDIR="/tmp/$(whoami)/singularity"
export LC_ALL=en_US.utf-8
export LANG=en_US.utf-8
mkdir demo; cd demo
git clone https://github.com/scailfin/workflow-madminer
cd workflow-madminer/example-full
yadage-run   workdir workflow.yml  -p inputfile='"inputs/input.yml"'  -p njobs="6"  -p ntrainsamples="2"  -d initdir=$PWD --visualize

Once the above fails, the file in workdir/configurate/_packtivity/configurate.run.log has the following:

2019-09-09 16:58:09,580 | pack.configurate.run |   INFO | starting file logging for topic: run
2019-09-09 16:58:16,705 | pack.configurate.run |   INFO | inputfile:   /home/khurtado/demos/workflow-madminer/example-full/inputs/input.yml
2019-09-09 16:58:16,710 | pack.configurate.run |   INFO | Traceback (most recent call last):
2019-09-09 16:58:16,710 | pack.configurate.run |   INFO | File "/madminer/code/configurate.py", line 93, in <module>
2019-09-09 16:58:16,711 | pack.configurate.run |   INFO | miner.save('/madminer/data/madminer_example.h5')
2019-09-09 16:58:16,711 | pack.configurate.run |   INFO | File "/usr/local/lib/python2.7/dist-packages/madminer/core.py", line 527, in save
2019-09-09 16:58:16,711 | pack.configurate.run |   INFO | create_missing_folders([os.path.dirname(filename)])
2019-09-09 16:58:16,711 | pack.configurate.run |   INFO | File "/usr/local/lib/python2.7/dist-packages/madminer/utils/various.py", line 55, in create_missing_folders
2019-09-09 16:58:16,712 | pack.configurate.run |   INFO | os.makedirs(folder)
2019-09-09 16:58:16,712 | pack.configurate.run |   INFO | File "/usr/lib/python2.7/os.py", line 157, in makedirs
2019-09-09 16:58:16,712 | pack.configurate.run |   INFO | mkdir(name, mode)
2019-09-09 16:58:16,712 | pack.configurate.run |   INFO | OSError: [Errno 30] Read-only file system: '/madminer/data'
2019-09-09 16:58:16,857 | pack.configurate.run |   INFO | cp: cannot stat '/madminer/data/*.h5': No such file or directory

A single singularity command with the error would be:

$ singularity exec -C  -B /home:/home --pwd /tmp/_sing_home_X02HH2/f6BoUL -H /tmp/_sing_home_X02HH2 docker://madminertool/docker-madminer-physics:latest sh -c 'mkdir /madminer/data'
mkdir: cannot create directory '/madminer/data': Read-only file system
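
The same check can be done from Python before a step writes anything. This is a rough sketch and not part of the workflow: first_writable is a hypothetical helper that tries to create each candidate directory and a scratch file inside it, returning the first candidate that works.

```python
import errno
import os
import tempfile


def first_writable(candidates):
    """Return the first directory in `candidates` that can be created and written to, else None."""
    skip = (errno.EROFS, errno.EACCES, errno.EPERM, errno.ENOENT, errno.ENOTDIR)
    for path in candidates:
        try:
            if not os.path.isdir(path):
                os.makedirs(path)
            # Creating a scratch file proves write access; it is deleted on close.
            with tempfile.TemporaryFile(dir=path):
                pass
            return path
        except (OSError, IOError) as exc:
            # EROFS is the "Read-only file system" error (Errno 30) seen above.
            if exc.errno not in skip:
                raise
    return None
```

For example, first_writable(["/madminer/data", os.path.join(os.getcwd(), "data")]) would fall back to the working directory when the image path is read-only.
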
khurtado commented 5 years ago

Hi @irinaespejo, did you get a chance to look into this? Let me know if there is anything else you need to reproduce the problem. Thanks!

irinaespejo commented 5 years ago

Hi @khurtado, sorry I've been a bit busy. I was able to reproduce your error. I have an idea of what might work, I'll let you know how that turns out. Thank you!

irinaespejo commented 5 years ago

Hi @khurtado, could you try the commands you posted again and see if you still get the error? Thanks!

khurtado commented 5 years ago

Hi @irinaespejo . Still the same error. I made sure to clean the singularity cache. Has anything changed in the code, though? I haven't noticed any new commit in this repo since Sep 17/18.

2019-09-24 14:12:29,065 | pack.configurate.run |   INFO | File "/usr/local/lib/python2.7/dist-packages/madminer/utils/various.py", line 55, in create_missing_folders
2019-09-24 14:12:29,065 | pack.configurate.run |   INFO | os.makedirs(folder)
2019-09-24 14:12:29,065 | pack.configurate.run |   INFO | File "/usr/lib/python2.7/os.py", line 157, in makedirs
2019-09-24 14:12:29,065 | pack.configurate.run |   INFO | mkdir(name, mode)
2019-09-24 14:12:29,066 | pack.configurate.run |   INFO | OSError: [Errno 30] Read-only file system: '/madminer/data'
2019-09-24 14:12:29,167 | pack.configurate.run |   INFO | cp: cannot stat '/madminer/data/*.h5': No such file or directory

khurtado commented 5 years ago

Hi @irinaespejo . Have you had a chance to look into this? Let me know if there is anything I can help with.

khurtado commented 5 years ago

@irinaespejo Just pinging about this to keep the thread alive :)

khurtado commented 5 years ago

Hi @irinaespejo

I'm trying to execute interactively. I get things running up to the combine step, but then sampling gives me the error below. Have you seen this? I had to revert madminer to 0.5.0 because otherwise delphes would complain about systematics.

EDIT: Oh, it seems the combine script only ended up copying the first delphes file in the list. Why wasn't combine_and_shuffle used to combine all delphes files?

$ echo $data_file
/reana/users/00000000-0000-0000-0000-000000000000/workflows/test/combine/combined_delphes.h5
$ echo $input_file
/reana/users/00000000-0000-0000-0000-000000000000/workflows/test/inputs/input.yml
$ python configurate_ml.py 1 $data_file $input_file
['sally', 'alices', 'alice']
12:17 madminer.analysis    INFO    Loading data from /reana/users/00000000-0000-0000-0000-000000000000/workflows/test/combine/combined_delphes.h5
12:17 madminer.analysis    WARNING Inconsistent event numbers in HDF5 file! Please recalculate them by calling combine_and_shuffle(recalculate_header=True).
12:17 madminer.analysis    INFO    Found 2 parameters
12:17 madminer.analysis    INFO    Did not find nuisance parameters
12:17 madminer.analysis    INFO    Found 6 benchmarks, of which 6 physical
12:17 madminer.analysis    INFO    Found 2 observables
12:17 madminer.analysis    INFO    Found 5823 events
12:17 madminer.analysis    INFO      982 signal events sampled from benchmark morphing_basis_vector_4
12:17 madminer.analysis    INFO    Found morphing setup with 6 components
12:17 madminer.analysis    INFO    Did not find nuisance morphing setup
sampling from method  sally
12:17 madminer.sampling    INFO    Extracting training sample for local score regression. Sampling and score evaluation according to sm
12:17 madminer.sampling    INFO    Starting sampling serially
12:17 madminer.sampling    INFO    Sampling from parameter point 1 / 1
Traceback (most recent call last):
  File "configurate_ml.py", line 189, in <module>
    filename=method+'_train'
  File "/usr/local/lib/python2.7/dist-packages/madminer/sampling.py", line 323, in sample_train_local
    double_precision=double_precision,
  File "/usr/local/lib/python2.7/dist-packages/madminer/sampling.py", line 1400, in _sample
    double_precision=double_precision,
  File "/usr/local/lib/python2.7/dist-packages/madminer/sampling.py", line 1509, in _sample_set
    generated_close_to=None if not sample_only_from_closest_benchmark else theta_value_sampling,
  File "/usr/local/lib/python2.7/dist-packages/madminer/analysis.py", line 331, in xsecs
    generated_close_to=generated_close_to,
  File "/usr/local/lib/python2.7/dist-packages/madminer/analysis.py", line 151, in event_loader
    return_sampling_ids=return_sampling_ids,
  File "/usr/local/lib/python2.7/dist-packages/madminer/utils/interfaces/madminer_hdf5.py", line 243, in madminer_event_loader
    this_observations = this_observations[cut]
IndexError: boolean index did not match indexed array along dimension 0; dimension is 3494 but corresponding boolean dimension is 982
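
On the EDIT above: a rough sketch of combining every delphes output instead of copying only the first one. It assumes madminer's combine_and_shuffle(input_filenames, output_filename) from madminer.sampling, which the warning in the log also points to. The glob pattern, the helper names find_delphes_files and combine_all, and the directory layout are hypothetical and would need to match the real workflow.

```python
import glob
import os


def find_delphes_files(directory, pattern="*.h5"):
    """Return all HDF5 files in `directory` in a stable (sorted) order."""
    return sorted(glob.glob(os.path.join(directory, pattern)))


def combine_all(directory, output_filename):
    """Merge every file found in `directory` into a single shuffled HDF5 file."""
    # Imported lazily so the file discovery above works without madminer installed.
    from madminer.sampling import combine_and_shuffle

    input_filenames = find_delphes_files(directory)
    if not input_filenames:
        raise ValueError("no input files found in %s" % directory)
    combine_and_shuffle(input_filenames, output_filename)
    return input_filenames
```

Passing the full list of files lets madminer rebuild the event counts in the header itself, which is what the "Inconsistent event numbers" warning asks for.
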
khurtado commented 5 years ago

@irinaespejo Just so you know, the following changes work for me with singularity. I still need to check with shifter.

https://github.com/khurtado/workflow-madminer/commit/c4b3ac66820a3fe676c2d479f958346897640c20

Sinclert commented 4 years ago

Closing issue, after confirmation with @khurtado over Slack.