umami-hep / umami-preprocessing

UPP: Umami PreProcessing

Slow Built-in UPP Plotting #57

Open AndriusVaitkus97 opened 9 months ago

AndriusVaitkus97 commented 9 months ago

Built-in plotting with UPP runs really slowly for me. I ran `preprocess --config configs/xbb-avaitkus.yaml --plots` on this commit: https://github.com/umami-hep/umami-preprocessing/commit/7099a2980baade4f1cc7a98ebc2538f3d6ebd9c0, on the train file with 60M jets. The log file is below.

CPU count: 39
Moved dir, now in: /share/rcifdata/avaitkus/umami-preprocessing/upp
Activated environment upp_dev
CUDA_VISIBLE_DEVICES: 
Running preprocessing script...
INFO     Starting preprocessing...                                                                                      
INFO     Start time: 2024-02-09 16:17:38                                                                                
INFO     Copying config to /share/lustre/avaitkus/samples/preprocessed/2024-02-01-p5981/output/xbb-avaitkus_train.yaml  
WARNING:puma: Histogram is empty.
WARNING:puma: Histogram is empty.
WARNING:puma: Histogram is empty.
WARNING:puma: Histogram is empty.
WARNING:puma: Histogram is empty.
WARNING:puma: Histogram is empty.
WARNING:puma: Histogram is empty.
INFO     Saved plot /share/lustre/avaitkus/samples/preprocessed/2024-02-01-p5981/output/plots/initial_pt.png            
WARNING:puma: Histogram is empty.
WARNING:puma: Histogram is empty.
WARNING:puma: Histogram is empty.
WARNING:puma: Histogram is empty.
WARNING:puma: Histogram is empty.
WARNING:puma: Histogram is empty.
WARNING:puma: Histogram is empty.
INFO     Saved plot /share/lustre/avaitkus/samples/preprocessed/2024-02-01-p5981/output/plots/initial_ptlow.png         
WARNING:puma: Histogram is empty.
WARNING:puma: Histogram is empty.
WARNING:puma: Histogram is empty.
WARNING:puma: Histogram is empty.
WARNING:puma: Histogram is empty.
WARNING:puma: Histogram is empty.
WARNING:puma: Histogram is empty.
INFO     Saved plot /share/lustre/avaitkus/samples/preprocessed/2024-02-01-p5981/output/plots/initial_abs_eta.png       
WARNING:puma: Histogram is empty.
WARNING:puma: Histogram is empty.
WARNING:puma: Histogram is empty.
WARNING:puma: Histogram is empty.
WARNING:puma: Histogram is empty.
WARNING:puma: Histogram is empty.
WARNING:puma: Histogram is empty.
INFO     Saved plot /share/lustre/avaitkus/samples/preprocessed/2024-02-01-p5981/output/plots/initial_mass.png          
INFO     Saved plot /share/lustre/avaitkus/samples/preprocessed/2024-02-01-p5981/plots/train_pt.png                     
INFO     Saved plot /share/lustre/avaitkus/samples/preprocessed/2024-02-01-p5981/plots/train_ptlow.png                  
INFO     Saved plot /share/lustre/avaitkus/samples/preprocessed/2024-02-01-p5981/plots/train_abs_eta.png                
INFO     Saved plot /share/lustre/avaitkus/samples/preprocessed/2024-02-01-p5981/plots/train_mass.png                   
INFO     ------------------------------------- Finished Preprocessing! --------------------------------------           
INFO     End time: 2024-02-09 23:41:11                                                                                  
INFO     Elapsed time: 7:23:33                                                                                          

As you can see, it took over 7 hours. @zcapjdb has a similar issue.

samvanstroud commented 9 months ago

Can you check whether it's the initial or final hists that are slowing things down? Tagging @IvanOleksiyuk in case he has any ideas

AndriusVaitkus97 commented 9 months ago

> Can you check whether it's the initial or final hists that are slowing things down? Tagging @IvanOleksiyuk in case he has any ideas

I don't know exactly how to check that, but I looked at the timestamps of when the plots were created. I have this:

-rw-rw-r-- 1 avaitkus avaitkus 233K Feb  9 22:09 initial_abs_eta.png
-rw-rw-r-- 1 avaitkus avaitkus 241K Feb  9 23:41 initial_mass.png
-rw-rw-r-- 1 avaitkus avaitkus 214K Feb  9 20:32 initial_ptlow.png
-rw-rw-r-- 1 avaitkus avaitkus 230K Feb  9 19:16 initial_pt.png

-rw-rw-r-- 1 avaitkus avaitkus 113K Feb  9 23:41 train_abs_eta.png
-rw-rw-r-- 1 avaitkus avaitkus 115K Feb  9 23:41 train_mass.png
-rw-rw-r-- 1 avaitkus avaitkus  98K Feb  9 23:41 train_ptlow.png
-rw-rw-r-- 1 avaitkus avaitkus 106K Feb  9 23:41 train_pt.png

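It seems that the initial plots are the ones taking ages: the run started at 16:17, the first initial plot was not saved until 19:16, and the last one appeared at 23:41, the same minute as all four train plots.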
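
To put numbers on this, here is a rough sketch that turns the timestamps into the gap before each plot. The start time comes from the log and the save times from the `ls` listing, which only has minute resolution, so the gaps are approximate. The function name `minutes_between_plots` is just an illustrative helper, not part of UPP.

```python
from datetime import datetime

# Start time taken from the "Start time" line in the log.
start = datetime(2024, 2, 9, 16, 17, 38)

# Modification times (HH:MM) copied from the directory listing.
saved = {
    "initial_pt.png": "19:16",
    "initial_ptlow.png": "20:32",
    "initial_abs_eta.png": "22:09",
    "initial_mass.png": "23:41",
    "train_pt.png": "23:41",
    "train_ptlow.png": "23:41",
    "train_abs_eta.png": "23:41",
    "train_mass.png": "23:41",
}


def minutes_between_plots(start, saved):
    """Return minutes elapsed between each plot and the previous event."""
    gaps = {}
    previous = start
    # Sort by save time; ties keep the order they were listed in.
    for name, hhmm in sorted(saved.items(), key=lambda item: item[1]):
        when = datetime.strptime(f"2024-02-09 {hhmm}", "%Y-%m-%d %H:%M")
        gaps[name] = round((when - previous).total_seconds() / 60)
        previous = when
    return gaps


gaps = minutes_between_plots(start, saved)
for name, minutes in gaps.items():
    print(f"{name:22s} {minutes:4d} min")
```

With these inputs, each initial plot accounts for well over an hour of the run, while the train plots all land in the same minute as the last initial plot.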

AndriusVaitkus97 commented 9 months ago

UPD: Actually, the initial plots also look really weird. Why are there so many categories? (image attached)