gordonwatts / CalRatioTrainer


Track pt doesn't look right in signal! #184

Closed gordonwatts closed 10 months ago

gordonwatts commented 10 months ago

This plot looks funny:

image

Looking at the actual data in there, for the test training sample from felix:

>>> max(f[f.label==1].nn_track_pt_0)
120.98439025878906
>>> max(f[f.label==1].nn_track_pt_1)
0.5955920815467834
>>> max(f[f.label==1].nn_track_pt_2)
nan
>>> max(f[f.label==1].nn_track_pt_3)

For the training sample that we build:

>>> max(f[f.label==1].track_pt_0)
4.165572647293354
>>> max(f[f.label==1].track_pt_1)
0.13566273102712084
>>> max(f[f.label==1].track_pt_2)
0.08003763706751886

So - clearly something is going wrong here!
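A caveat when reading the maxima above: Python's built-in `max` is order-dependent when NaNs are present, because every comparison against NaN is false. The sketch below uses made-up values (not taken from either file), and `nan_safe_max` is a hypothetical helper name:

```python
import math

def nan_safe_max(values):
    """Largest non-NaN value, or NaN if there is none."""
    finite = [v for v in values if not math.isnan(v)]
    return max(finite) if finite else math.nan

# Hypothetical values, purely for illustration.
nan_first = [math.nan, 0.5, 0.25]
nan_last = [0.5, 0.25, math.nan]

builtin_nan_first = max(nan_first)        # NaN: nothing compares greater than NaN
builtin_nan_last = max(nan_last)          # 0.5: NaN never replaces the running maximum
safe_nan_first = nan_safe_max(nan_first)  # 0.5 regardless of ordering
```

If `f` is a pandas DataFrame (it looks like one, but that is an assumption), `f[f.label==1].nn_track_pt_2.max()` skips NaN by default and would be the safer call.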

gordonwatts commented 10 months ago

The first question is: how is this done by the C++ code?

    for(auto *track : *recoTracks) {
      if ( CalRatioUtils::DeltaR(jet->phi(),track->phi(),jet->eta(),track->eta()) < 0.2 ) {
        //addToVectorBranch(vars, "nn_track_pt", track->pt());
        if (track_counter < 20) track_pt_arr[track_counter] = track->pt();

Then in the code that does the NN prep:

        int indexArray[20] = {};
        double eta_sum = 0;
        for (int i=0; i < 20; i++){
                indexArray[i] = i;

                if (track_pt_arr[i] != 0 ) track_pt_arr[i] = (track_pt_arr[i])/(500000);

And then it is sorted from there.
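To reproduce that prep when building a training sample, something like the following Python sketch could be used. It mirrors only what the excerpt shows (20 slots, non-zero entries divided by 500000). The descending sort order is an assumption, since the sorting code is not quoted here, and `prep_track_pt` is a hypothetical name:

```python
N_TRACKS = 20          # fixed array length, from the C++ excerpt
PT_SCALE = 500_000.0   # divisor from the C++ excerpt

def prep_track_pt(track_pts):
    """Pad/truncate to N_TRACKS slots, scale non-zero entries, then sort."""
    # Keep at most N_TRACKS tracks; unfilled slots stay at 0, as in the C++ array.
    slots = list(track_pts[:N_TRACKS])
    slots += [0.0] * (N_TRACKS - len(slots))
    # Mirror `if (track_pt_arr[i] != 0) track_pt_arr[i] = track_pt_arr[i]/500000`.
    scaled = [pt / PT_SCALE if pt != 0 else pt for pt in slots]
    # ASSUMPTION: descending pT; the real sort key and order are not shown above.
    return sorted(scaled, reverse=True)

# Hypothetical pT values, purely for illustration.
example = prep_track_pt([2_000.0, 50_000.0, 10_000.0])
```

With these inputs the first three slots hold the scaled values in descending order and the remaining 17 slots stay at zero.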

gordonwatts commented 10 months ago

Next, the nn_track_pt comes from somewhere in the code - has the same prep been done to it?

    for(auto *track : *recoTracks) {
      if ( CalRatioUtils::DeltaR(jet->phi(),track->phi(),jet->eta(),track->eta()) < 0.2 ) {
        //addToVectorBranch(vars, "nn_track_pt", track->pt());

Ha - it is the same code as before, so it does not have the 1/500000 applied. Wait - what units is track_pt in for us: MeV or GeV? Here track->pt() is written out directly. How is it written out to the ROOT ntuple?

    if (isPVtrack || isJetTrack){
      addToVectorBranch(vars, "track_pT", track->pt()*0.001);
      addToVectorBranch(vars, "track_eta", track->eta());
      addToVectorBranch(vars, "track_phi", track->phi());

gordonwatts commented 10 months ago

So: units:

gordonwatts commented 10 months ago

In our code:

In short - this "works": the NN input is divided by 500,000 while it is still in MeV, which is the same as dividing the pT in GeV by 500.

The problem is that nn_track_pt needs to be divided by 500,000 before it can be compared.
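The unit argument can be checked with a few lines of Python. This is only a sketch: it assumes `track->pt()` is in MeV (as concluded above), and the function names and the example pT value are hypothetical:

```python
MEV_PER_GEV = 1000.0
NN_SCALE_MEV = 500_000.0   # divisor applied to pT in MeV by the NN prep code

def nn_input_from_mev(pt_mev):
    """NN input as the prep code computes it: pT [MeV] / 500,000."""
    return pt_mev / NN_SCALE_MEV

def nn_input_from_gev(pt_gev):
    """Same quantity starting from GeV: pT [GeV] / 500."""
    return pt_gev / (NN_SCALE_MEV / MEV_PER_GEV)

pt_mev = 25_000.0                 # hypothetical track pT in MeV
pt_gev = pt_mev / MEV_PER_GEV     # 25 GeV, like the `track->pt()*0.001` branch
from_mev = nn_input_from_mev(pt_mev)
from_gev = nn_input_from_gev(pt_gev)
```

Both routes give the same NN input, so a column that is still in raw MeV has to go through `nn_input_from_mev` (or equivalent) before it is compared with an already-scaled one.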

gordonwatts commented 10 months ago

Also, that means training with this file isn't possible!! You won't round-trip this file. Holy shit - so this is evidence (I guess??) that this was not what we used?

Wait - how is nn_track_pt used in the felix code? I don't see any evidence of it.

But that aside, according to the code that is in DiVertAnalysis, it will divide the MeV units by 500,000. So that is what we have to feed to the training file. These training files do not have that - therefore, they do not match!

HOLY!

So, either something went very wrong, or this is evidence that this is not a good training file!

gordonwatts commented 10 months ago

This explains the spike at zero too in the new data.

The other odd thing is that there are so many NaNs in Felix's files - so few tracks are grabbed for some reason. That could be due to a phi problem, so I'll leave that for another bug report.
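Whether the missing tracks really come from phi handling is not established in this thread, and how `CalRatioUtils::DeltaR` is implemented is not shown. As one illustration of how a phi problem could drop tracks from a DeltaR < 0.2 match, here is a Python sketch comparing a wrapped and an unwrapped phi difference. The argument order follows the call quoted earlier, and the values are made up:

```python
import math

def delta_phi(phi1, phi2):
    """Signed phi difference wrapped into [-pi, pi]."""
    return math.remainder(phi1 - phi2, 2.0 * math.pi)

def delta_r(phi1, phi2, eta1, eta2):
    """Angular distance, with (phi1, phi2, eta1, eta2) argument order."""
    return math.hypot(delta_phi(phi1, phi2), eta1 - eta2)

# Hypothetical jet and track on either side of the phi = +/-pi boundary.
jet_phi, track_phi = 3.1, -3.1
wrapped = delta_r(jet_phi, track_phi, 0.0, 0.0)
unwrapped = math.hypot(jet_phi - track_phi, 0.0)  # naive difference, no wrapping
```

Here the wrapped distance is well inside 0.2 while the naive one is about 6.2, so a track this close to the jet would fail the cut if the phi difference were not wrapped.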

gordonwatts commented 10 months ago

Closing this as not going to fix: