star-bnl / star-sw

Core software for STAR experiment
26 stars 63 forks source link

let StBTofSimResParams.h to use existed timestamp in bfc chain, instead of re-define a new timestamp #145

Closed Morning-Ye closed 2 years ago

Morning-Ye commented 2 years ago

In the last update, we let the loadParms() function to pick up the BTOF resolution table according to the runnumber. From the runnumber we get the year number and then set a safe timestamp = year+1001 to pick up the BTOF resolution table from DB. However, this step will replace the predefined timestamp in the bfc.C, thus it could affect the later Makers when visitting the DB due to the change of timestamp. Now, the new update will pick up the BTOF resolution table based on the common and existed timestamp. PS, the StBTofSimResParams.h was originally not built for running with the bfc.C, so that it was designed to pick up table by a given timestamp in this code itself. However, now it is included in data production with bfc.C, it need to be updated accordingly.

genevb commented 2 years ago

Thanks for the details, Zaochen. I would like to make one point even clearer, if we can...

You wrote, "In the last update". Was that PR #65 ? If that is correct, then I guess that the problem did not exist in earlier libraries and productions, yes?

However, we are producing data in SL21c right now, with this bug in place as far as I understand it: one with BFC (Run 19 AuAu19.6), and one is a MuDst=>PicoDst conversion (Run 10 AuAu200). Are these productions impacted?

Also, please understand that even if we merge this fix into main and decide that those two productions are not flawed, we cannot update SL21c while jobs from any productions are running as it will cause all the running jobs to crash. We will need to either wait to update SL21c (further delaying the Run 17 pp510 st_mtd production that I know is waiting on this fix), or we freeze a new library SL21d, in which case I should try to poll for other interests in codes that people want to have in SL21d as this fix is currently the only file change I know of that is of interest and a single file change seems like a small thing to create a whole new library for.

-Gene

Morning-Ye commented 2 years ago

Thanks for the details, Zaochen. I would like to make one point even clearer, if we can...

You wrote, "In the last update". Was that PR #65 ? If that is correct, then I guess that the problem did not exist in earlier libraries and productions, yes?

The last update is PR #65 . The StBTofSimResParams.h was never used in any productions in previous years. It start being used in data production only from the BES-II data production as we decided to save the nSigmaToF (need the BTOF resolution table which is loaded by calling loadParms() in StBTofSimResParams.h).

However, we are producing data in SL21c right now, with this bug in place as far as I understand it: one with BFC (Run 19 AuAu19.6), and one is a MuDst=>PicoDst conversion (Run 10 AuAu200). Are these productions impacted?

For the AuAu 19.6 GeV data production, it depend on whether there are some makers are called after BTOF makers and whether their calibration tables can be loaded correctly with the timestamp=year+1001. if not&not, then no impact; if yes&yes, then have impact. For the Run10 MuDst->PicoDst, if I understand correctly, all the information are just copied to PicoDst from MuDst, it should has no impact.

Also, please understand that even if we merge this fix into main and decide that those two productions are not flawed, we cannot update SL21c while jobs from any productions are running as it will cause all the running jobs to crash. We will need to either wait to update SL21c (further delaying the Run 17 pp510 st_mtd production that I know is waiting on this fix), or we freeze a new library SL21d, in which case I should try to poll for other interests in codes that people want to have in SL21d as this fix is currently the only file change I know of that is of interest and a single file change seems like a small thing to create a whole new library for.

-Gene

genevb commented 2 years ago

Hi, Zaochen

However, we are producing data in SL21c right now, with this bug in place as far as I understand it: one with BFC (Run 19 AuAu19.6), and one is a MuDst=>PicoDst conversion (Run 10 AuAu200). Are these productions impacted?

For the AuAu 19.6 GeV data production, it depend on whether there are some makers are called after BTOF makers and whether their calibration tables can be loaded correctly with the timestamp=year+1001. if not&not, then no impact; if yes&yes, then have impact.

I suggest you try running the current production chain with and without your patch and look for differences, though some may be less obvious.

root4star -b -q -l 'bfc.C(100,"DbV20210827 P2019a StiCA -beamline3D btof mtd mtdCalib ImpBToFt0Mode BEmcChkStat CorrY -OPr13 EbyET0 PicoVtxDefault PicoCovMtxWrite -evout -hitfilt","/star/rcf/test/daq/2019/069/st_physics_20069002_raw_1500008.daq")'

Thanks, -Gene

Morning-Ye commented 2 years ago

root4star -b -q -l 'bfc.C(100,"DbV20210827 P2019a StiCA -beamline3D btof mtd mtdCalib ImpBToFt0Mode BEmcChkStat CorrY -OPr13 EbyET0 PicoVtxDefault PicoCovMtxWrite -evout -hitfilt","/star/rcf/test/daq/2019/069/st_physics_20069002_raw_1500008.daq")'

I tried with and without the new updates, and found the printed track level information in the two according picoDst files are exactly same. For the event level information, there is one difference "Event.mTriggerIds = (vector)0xf2f1ab0" vs. "Event.mTriggerIds = (vector)0x106ffb28", but I don't know how could it affect the triggerIds. Do you have any comments?

genevb commented 2 years ago

Hi, Zaochen

"Event.mTriggerIds = (vector)0xf2f1ab0" vs. "Event.mTriggerIds = (vector)0x106ffb28",

That looks like a pointer to an array, with no information on whether the content of the array changed.

Good to hear that your investigation has not shown other differences. I'm not sure what else needs to be examined, but I guess any database calls that occur after the first event, or during the first event but after BTOF, are the ones susceptible to impact from the bug. EMC? MTD?

-Gene

Morning-Ye commented 2 years ago

Hi Gene,

Yes, for the first event, there is no difference. The difference in "mTriggerIds" shows up since the second event. Actually I did not see the impacts on the MTD or EMC information. As Ting mentioned, when they run their private code, if they call the BEMC makers after BTOF Makers, they will load the old BEMC table. If they call BEMC makers before BTOF makers, then BEMC could load new table. It depends on two things: 1. whether the makers are put after BTOF 2. what timestamp used for the table (compared to year-10-01). For the ongoing 19.6 GeV, BEMC information is not saved, so it should have no impact, right?

On Thu, Sep 9, 2021 at 9:41 AM Gene Van Buren @.***> wrote:

Hi, Zaochen

"Event.mTriggerIds = (vector)0xf2f1ab0" vs. "Event.mTriggerIds = (vector)0x106ffb28",

That looks like a pointer to an array, with no information on whether the content of the array changed.

Good to hear that your investigation has not shown other differences. I'm not sure what else needs to be examined, but I guess any database calls that occur after the first event, or during the first event but after BTOF, are the ones susceptible to impact from the bug. EMC? MTD?

-Gene

— You are receiving this because you authored the thread. Reply to this email directly, view it on GitHub https://github.com/star-bnl/star-sw/pull/145#issuecomment-916163687, or unsubscribe https://github.com/notifications/unsubscribe-auth/ADSL4II4RGMSYAQ3KWTSFTTUBDBP5ANCNFSM5DVXMESQ . Triage notifications on the go with GitHub Mobile for iOS https://apps.apple.com/app/apple-store/id1477376905?ct=notification-email&mt=8&pt=524675 or Android https://play.google.com/store/apps/details?id=com.github.android&referrer=utm_campaign%3Dnotification-email%26utm_medium%3Demail%26utm_source%3Dgithub.

genevb commented 2 years ago

My understanding of the bug is that for data taken on, for example, 2019-03-10, the date used for database queries after StBTofCalibMaker's first event would be 2018-10-01. That would exclude using any database entries with beginTime between those two dates that should otherwise get used.

Using the example job I provided, here are the list of makers that run after StBTofCalibMaker:

... QA :INFO - StBTofCalibMaker::btofCalib QA :INFO - StETofHitMaker::etofHit QA :INFO - StETofMatchMaker::etofMatch QA :INFO - StMtdMatchMaker::MtdMatch QA :INFO - StMtdCalibMaker::mtdCalib QA :INFO - StEventCompendiumMaker::StEventCompendiumMaker QA :INFO - StHeavyTagMaker::HeavyTag QA :INFO - StHighPtTagsMaker::HighPtTags QA :INFO - StTagsMaker::tags QA :INFO - StStrangeMuDstMaker::strangeMuDst QA :INFO - StMuDstMaker::MuDst QA :INFO - StPicoDstMaker::PicoDst QA :INFO - StEventQAMaker::EventQA QA :INFO - StTreeMaker::outputStream QA :INFO - StAnalysisMaker::analysis

I would then immediately be concerned about ETOF and MTD.

Looking at the database I can see that there are MTD entries for 2018-12-10 and 2018-12-20 to initialize for Run 19. It is possible that the content of the 2018-12-20 entries in several tables are identical to the entries made for 2017-12-20, which would cause identical results for normal reconstruction and embedding, but it would cause non-ideal calibrations to be used for pure simulations (so 2017-12-20 would be picked up instead of 2018-12-10). If you can confirm that the MTD database entries are identical for the two years, then the MTD should be fine in this production.

The situation looks a bit worse for ETOF. For example, the etofDetResolution and etofSimEfficiency tables have 2018-12-10 for their entry with the earliest timestamp. Perhaps these tables aren't used in reconstruction but are used in embedding. So embedding would fail. Other ETOF tables have entries that may be the same for 2018-12-10 and 2018-01-01. However, I can clearly see that the etofPulserToPeak, etofSignalVelocity, and etofTimingWindow tables have different content for those two timestamped entries. Did they affect the ETOF results in production?

What other makers might make database calls after the first event is an open question. It occurred to me that I may have made a mistake to use 100 events in the suggested job as the event timestamp may not change over those 100 events and that might allow the database not to be queried so that cached tables are used. I found it to be true, that the first 153 events in that file all have the same event timestamp. So it would be wise to use a number of events larger than that and look for differences in those later events.

-Gene

genevb commented 2 years ago

Looking more carefully at the code change for this PR, I see that a separate instance of St_db_Maker was being instantiated within loadParams(). Given that an instance of St_db_Maker was constructed earlier in the chain, it isn't obvious to me that anyone's database queries used this second instance. If not, then indeed, the productions would be unimpacted.

-Gene

Morning-Ye commented 2 years ago

Could reviewers comment on the changes?

genevb commented 2 years ago

I saw Zaochen sent a note to the starsoft-l mailing list about this, so I will summarize here what I still feel to be open questions about impacts on the Run 19 AuAu19.6 production:

Was ETOF impacted as it came later in the chain? (Zaochen said he looked at MTD, but MTD may have bene un-impacted because DB table contents were identical.)

Were events later than event 154 in the provided example file impacted as they are the first events to have a different event timestamp?

-Gene

genevb commented 2 years ago

Also, I will note that these open questions do not constitute a reason to wait for merging into main, but rather are whether there were issues with the production that we conducted with SL21c before this fix. If there was no impact to that production, then I expect we will want to tag SL21c_3 with this patch included to ensure no other productions could get impacted, and then rebuild the SL21c library yet again. Now is actually a good time to do that because we are at a moment between productions using that library.

genevb commented 2 years ago

Zaochen said he checked ETOF for impacts and found none, closing my first open question from yesterday.

genevb commented 2 years ago

My understanding from yesterday's discussion at the S&C meeting was that we should go ahead and update SL21c with this code patch while the effort to understand whether the Run 19 AuAu19.6 production was impacted in any way. @plexoos , can you please provide an SL21c_3 tag with this included?

plexoos commented 2 years ago

You can go ahead and merge it if you think it is ready and a review from @dmarkh is not needed. I will add it to the SL21 branch then

genevb commented 2 years ago

Ah.....I'm certainly interested in getting the library updated as soon as we can (the farm is sitting idle until this update happens), but I won't interfere with the review process in this case.

dmarkh commented 2 years ago

Okay, I've checked this PR, and it looks like timestamp tinkering is removed from the mBTofRes->loadParams. LGTM.

genevb commented 2 years ago

Not sure why it is up to me to squash and merge, but will do so that this will move forward.