graphnet-team / graphnet

A deep learning library for neutrino telescopes
https://graphnet-team.github.io/graphnet/
Apache License 2.0

We have lost the ability to reconstruct vertex position #359

Closed RasmusOrsoe closed 1 year ago

RasmusOrsoe commented 1 year ago

Describe the bug

We cannot reconstruct vertex position. This is due to two issues.

Issue one _validate_and_set_transforms in Task only supports transforms that don't slice the input. If one uses a transform such as


def f(x):
    x[:, 3] = x[:, 3] / 100
    return x

the function fails because of the dimensions of the mock data used for validation.

Assuming that the target doesn't require slicing is an issue that I believe was introduced in #97; it effectively removed our ability to reconstruct the vertex position, as this requires scaling of the input data.
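To make the failure mode concrete, here is a minimal standalone sketch (the transform is the hypothetical f from above, not graphnet code): a column-slicing transform works on a 2-D batch but raises IndexError on a 1-D mock tensor.

```python
import torch

# Hypothetical target transform that slices by column
def f(x):
    x[:, 3] = x[:, 3] / 100
    return x

# On a 2-D batch the slice is well-defined ...
batch = torch.ones(5, 4)
f(batch)

# ... but on a 1-D mock tensor the same slice raises IndexError,
# matching the traceback from _validate_and_set_transforms
mock = torch.ones(4)
try:
    f(mock)
except IndexError as err:
    print(err)  # too many indices for tensor of dimension 1
```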

Issue two Because we chose not to pass the entire graph object to loss functions, we must pass a single "target" to the Task. This means that targets such as direction and interaction vertex must be stored in a single field, e.g. graph['direction'] = torch.cat([dir_x, dir_y, dir_z]). This becomes an issue in Task, because https://github.com/graphnet-team/graphnet/blob/main/src/graphnet/models/task/task.py#L134:L136 turns the [1, 3]-dimensional direction (or vertex) truth variable into a [batch_size, 1, 3]-dimensional tensor, which leads to slicing errors in transform functions like the f(x) above.

To Reproduce Steps to reproduce the behavior:

  1. Start a fresh branch, and open a regression example in the examples folder
  2. Define the following transform functions:
def scale_XYZ(x):
    x[:, 0] = x[:, 0] / 764.431509
    x[:, 1] = x[:, 1] / 785.041607
    x[:, 2] = x[:, 2] / 1083.249944
    return x

def unscale_XYZ(x):
    x[:, 0] = 764.431509 * x[:, 0]
    x[:, 1] = 785.041607 * x[:, 1]
    x[:, 2] = 1083.249944 * x[:, 2]
    return x
  3. Change the Task to PassOutput3 with the following settings: transform_target = scale_XYZ, transform_inference = unscale_XYZ
  4. Go to line 606 in dataset.py (https://github.com/graphnet-team/graphnet/blob/e619034ed36768e27426a11c0b41f28a97c5b1db/src/graphnet/data/dataset.py#L606) and add the following label:
graph["vertex"] = torch.tensor(
    [
        truth_dict["position_x"],
        truth_dict["position_y"],
        truth_dict["position_z"],
    ],
    dtype=torch.float,
).reshape(1, -1)
  5. Set target = 'vertex'
  6. Run the example.
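As a sanity check, the two transforms from step 2 invert each other when applied to a [batch_size, 3] tensor, which is the layout they assume (restated here so the snippet is self-contained):

```python
import torch

def scale_XYZ(x):
    x[:, 0] = x[:, 0] / 764.431509
    x[:, 1] = x[:, 1] / 785.041607
    x[:, 2] = x[:, 2] / 1083.249944
    return x

def unscale_XYZ(x):
    x[:, 0] = 764.431509 * x[:, 0]
    x[:, 1] = 785.041607 * x[:, 1]
    x[:, 2] = 1083.249944 * x[:, 2]
    return x

# Round trip on the [batch_size, 3] layout the transforms assume
x = torch.randn(8, 3)
y = unscale_XYZ(scale_XYZ(x.clone()))
assert torch.allclose(x, y, atol=1e-5)
```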

Expected behavior

  1. _validate_and_set_transforms should not assume that the truth variable is a single row and single column (which makes slicing fail)
  2. Task should not change the dimensions of the truth variable from [batch_size, d] to [batch_size, 1, d], as this complicates the transform functions (or at the very least we need a big, fat red warning somewhere, because people might make bad mistakes)
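One possible direction for point 1 (a sketch only; the helper name, signature, and n_columns parameter are hypothetical, not graphnet's actual API) is to validate transforms against a 2-D mock batch, so column-slicing transforms survive the check:

```python
import torch

def validate_transforms(transform_target, transform_inference, n_columns=3):
    # Use a [2, n_columns] mock batch instead of a 1-D tensor, so that
    # transforms which slice columns (e.g. x[:, 0] = ...) do not fail
    x_test = torch.randn(2, n_columns)
    t_test = transform_target(x_test.clone())
    x_back = transform_inference(t_test.clone())
    if not torch.allclose(x_test, x_back, atol=1e-4):
        raise ValueError("transform_inference does not invert transform_target")
```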

Full traceback

Traceback (most recent call last):
  File "/home/iwsatlas1/oersoe/phd/oscNext/run_jobs.py", line 259, in <module>
    main()
  File "/home/iwsatlas1/oersoe/phd/oscNext/run_jobs.py", line 255, in main
    train(config)
  File "/home/iwsatlas1/oersoe/phd/oscNext/run_jobs.py", line 143, in train
    task = PassOutput3(hidden_size=gnn.nb_outputs, target_labels=config['target'], loss_function=EuclideanDistanceLoss(), transform_target = scale_XYZ, transform_inference = unscale_XYZ)
  File "/home/iwsatlas1/oersoe/github/graphnet/src/graphnet/models/task/task.py", line 87, in __init__
    self._validate_and_set_transforms(
  File "/home/iwsatlas1/oersoe/github/graphnet/src/graphnet/models/task/task.py", line 182, in _validate_and_set_transforms
    t_test = torch.unsqueeze(transform_target(x_test), -1)
  File "/home/iwsatlas1/oersoe/phd/oscNext/run_jobs.py", line 38, in scale_XYZ
    x[:,0] = x[:,0]/764.431509
IndexError: too many indices for tensor of dimension 1

Additional context I think we should reconsider the decision not to pass the entire graph object to loss functions.
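To illustrate the trade-off (a hypothetical sketch using a plain dict in place of the graph object; neither class exists in graphnet): a loss that receives the full graph can combine several truth fields itself, instead of requiring them to be packed into one target beforehand.

```python
import torch
from torch import nn

class TargetLoss(nn.Module):
    # Current pattern: the Task hands the loss a single [batch_size, d] target
    def forward(self, pred, target):
        return ((pred - target) ** 2).sum(dim=-1).mean()

class GraphLoss(nn.Module):
    # Alternative: the loss receives the whole graph and picks out the truth
    # fields it needs, so "direction" and "vertex" can stay separate fields
    def forward(self, pred, graph):
        target = torch.cat([graph["direction"], graph["vertex"]], dim=-1)
        return ((pred - target) ** 2).sum(dim=-1).mean()
```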

I was very surprised to see the [batch_size, 1, d] dimensions of the direction and vertex variables. I was unsure whether this had any impact on the direction reconstructions that I made for northern tracks, so I've gone back and re-run those trials to check.

asogaard commented 1 year ago

I am not sure I follow this: If I train a model with the PositionReconstruction task, MSELoss, and target=["position_x", "position_y", "position_z"], the training runs without errors and the predictions look sensible at first glance. Is there a pressing need to use the custom scaling, PassOutput3 task, custom target label, etc.?