**Closed** by gordonwatts 10 months ago
The first question is: how is this done by the C++ code?
```cpp
for (auto *track : *recoTracks) {
  if (CalRatioUtils::DeltaR(jet->phi(), track->phi(), jet->eta(), track->eta()) < 0.2) {
    //addToVectorBranch(vars, "nn_track_pt", track->pt());
    if (track_counter < 20) track_pt_arr[track_counter] = track->pt();
```
Then in the code that does the NN prep:
```cpp
int indexArray[20] = {};
double eta_sum = 0;
for (int i = 0; i < 20; i++) {
  indexArray[i] = i;
  if (track_pt_arr[i] != 0) track_pt_arr[i] = (track_pt_arr[i]) / (500000);
}
```
And then it is sorted from there.
Next, the `nn_track_pt` comes from somewhere in the code. Has the same prep been done to it?
```cpp
for (auto *track : *recoTracks) {
  if (CalRatioUtils::DeltaR(jet->phi(), track->phi(), jet->eta(), track->eta()) < 0.2) {
    //addToVectorBranch(vars, "nn_track_pt", track->pt());
```
Ha - it is the same code as before, so it does not have the 1/500000 in it. Wait - what units is `track_pt` in for us, MeV or GeV? Here `track->pt()` is getting written out. How is it written out to the ROOT ntuple?
```cpp
if (isPVtrack || isJetTrack) {
  addToVectorBranch(vars, "track_pT", track->pt() * 0.001);
  addToVectorBranch(vars, "track_eta", track->eta());
  addToVectorBranch(vars, "track_phi", track->phi());
```
So, units:

- `track_pT` is in GeV
- `nn_track_pt` is in MeV

In our code, the `DataFrame` that is written out by `convert` has the units in GeV.

In short, this "works": the NN input is scaled by 500,000, but it is in MeV, so that is like dividing by 500 GeV.
The problem is `nn_track_pt` needs to be divided by 500,000 before it can be compared.
Also, that means training with this file isn't possible!! You won't round-trip this file. Holy shit - so this is evidence (I guess??) that this was not what we used?
Wait - how is `nn_track_pt` used in the Felix code? I don't see any evidence of it.
But that aside, according to the code that is in `DiVertAnalysis`, it will divide the MeV units by 500,000. So that is what we have to feed to the training file. These training files do not have that - therefore, they do not match!
HOLY!
So, either something went very wrong, or this is evidence that this is not a good training file!
This explains the spike at zero too in the new data.
The other odd thing is there are so many `nan`'s in Felix's files - so few tracks are grabbed for some reason. That could be due to a phi problem, so I'll leave that for another bug report.
Closing this as not going to fix the `nn_track_pt`'s. Instead we will do what the `DiVertAnalysis` code is expecting, which is to rescale `track_pt` by 500.0 when the units are GeV.
This plot looks funny:
Looking at the actual data in there, for the test training sample from Felix:
For the training sample that we build:
So - clearly something is going wrong here!