**Closed** by gordonwatts 10 months ago
The first question is: how is this done by the C++ code?
```cpp
for (auto *track : *recoTracks) {
  if (CalRatioUtils::DeltaR(jet->phi(), track->phi(), jet->eta(), track->eta()) < 0.2) {
    //addToVectorBranch(vars, "nn_track_pt", track->pt());
    if (track_counter < 20) track_pt_arr[track_counter] = track->pt();
```
Then in the code that does the NN prep:
```cpp
int indexArray[20] = {};
double eta_sum = 0;
for (int i = 0; i < 20; i++) {
  indexArray[i] = i;
  if (track_pt_arr[i] != 0) track_pt_arr[i] = (track_pt_arr[i]) / (500000);
}
```
And then it is sorted from there.
Next, the `nn_track_pt` comes from somewhere in the code. Has the same prep been done to it?
```cpp
for (auto *track : *recoTracks) {
  if (CalRatioUtils::DeltaR(jet->phi(), track->phi(), jet->eta(), track->eta()) < 0.2) {
    //addToVectorBranch(vars, "nn_track_pt", track->pt());
```
Ha - it is the same code as before, so it does not have the 1/500000 in it. Wait - what units is `track_pt` in for us, MeV or GeV? Here `track->pt()` is getting written out. How is it written out to the ROOT ntuple?
```cpp
if (isPVtrack || isJetTrack) {
  addToVectorBranch(vars, "track_pT", track->pt() * 0.001);
  addToVectorBranch(vars, "track_eta", track->eta());
  addToVectorBranch(vars, "track_phi", track->phi());
```
So, units:

- `track_pT` is in GeV
- `nn_track_pt` is in MeV

In our code, the `DataFrame` that is written out by `convert` has the units in GeV.

In short, this "works": the NN input is scaled by 500,000, but it is in MeV, so that is like dividing by 500 GeV.
The problem is `nn_track_pt` needs to be divided by 500,000 before it can be compared.
Also, that means training with this file isn't possible!! You won't round-trip this file. Holy shit - so this is evidence (I guess??) that this was not what we used?
Wait - how is `nn_track_pt` used in the Felix code? I don't see any evidence of it.
But that aside, according to the code that is in `DiVertAnalysis`, it will divide the MeV units by 500,000. So that is what we have to feed to the training file. These training files do not have that - therefore, they do not match!
HOLY!
So, either something went very wrong, or this is evidence that this is not a good training file!
This explains the spike at zero too in the new data.
The other odd thing is there are so many `nan`'s in Felix's files - so few tracks are grabbed for some reason. That could be due to a phi problem, so I'll leave that for another bug report.
Closing this as not going to fix the `nn_track_pt`'s. Instead we will do what the `DiVertAnalysis` code is expecting, which is to rescale `track_pt` by 500.0 when the units are GeV.
This plot looks funny:
Looking at the actual data in there, for the test training sample from Felix:
For the training sample that we build:
So - clearly something is going wrong here!