normalize loss, reparametrize network

normalize the classification loss by the number of non-padded elements
normalize the regression loss by the number of true target particles and by the stddev of the regression targets, for more stable loss values
reparametrize the attention network in terms of num_heads and head_dim
produce new version 1.7.1 of cms_pf_multi_particle_gun with more stats
remove unneeded pad_power_of_two option (FlashAttention seems to handle this internally; not sure why I thought it was needed)
make regression output type configurable
disable charge prediction for now (we haven't really studied its performance so far)
enable setting only certain layers as trainable
set minimum lr to 1e-5 for cosinedecay
add new standalone notebook for quick studies
remove TF workflows from pipeline
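
Sketch of the two loss normalizations described above, in plain Python for illustration only. The function names, the 0/1 mask convention and the use of a squared error are assumptions made for this example; the loss terms actually used in the repository may differ.

```python
def masked_classification_loss(per_elem_loss, mask):
    """Average a per-element classification loss over non-padded elements.

    per_elem_loss: list of floats, one loss value per element (padded ones included)
    mask: list of 0/1 flags, 1 for a real element, 0 for padding (assumed convention)
    """
    n_valid = sum(mask)
    total = sum(loss * m for loss, m in zip(per_elem_loss, mask))
    # guard against an all-padding batch
    return total / max(n_valid, 1)


def normalized_regression_loss(pred, target, is_particle, target_std):
    """Squared error on targets scaled by their stddev, averaged over true particles.

    is_particle: list of 0/1 flags, 1 where a true target particle exists
    target_std: stddev of the regression targets (assumed precomputed, > 0)
    """
    n_true = sum(is_particle)
    total = sum(
        ((p - t) / target_std) ** 2 * m
        for p, t, m in zip(pred, target, is_particle)
    )
    return total / max(n_true, 1)
```

With this normalization the loss value does not depend on how much padding a batch contains, and the regression term stays of order one regardless of the target scale.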
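
Sketch of a cosine decay schedule with a 1e-5 floor, as a closed-form function in plain Python. The name cosine_decay_lr and the base_lr default are hypothetical; this commit message only states the minimum lr. In PyTorch the same floor corresponds to the eta_min argument of torch.optim.lr_scheduler.CosineAnnealingLR, though whether the training code uses that scheduler is not stated here.

```python
import math


def cosine_decay_lr(step, total_steps, base_lr=1e-3, min_lr=1e-5):
    """Learning rate at `step`, decaying from base_lr to min_lr along a half cosine.

    base_lr is a placeholder value chosen for illustration; min_lr is the
    1e-5 floor mentioned above.
    """
    # clamp progress to [0, 1] so steps past the end stay at min_lr
    progress = min(max(step / total_steps, 0.0), 1.0)
    cosine = 0.5 * (1.0 + math.cos(math.pi * progress))
    return min_lr + (base_lr - min_lr) * cosine
```

The learning rate starts at base_lr, reaches the midpoint between base_lr and min_lr halfway through, and never drops below min_lr.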

jpata / particleflow