ivmfnal / metacat

Metadata Catalog
BSD 3-Clause "New" or "Revised" License

SAM fields missing from the conversion #5

Closed ivmfnal closed 1 year ago

ivmfnal commented 1 year ago

SAM file np04_raw_run010425_0090_dl12_reco1_28404162_0_20220121T051020Z_reco2_52848725_0_20220217T110748Z.root:

File Name: np04_raw_run010425_0090_dl12_reco1_28404162_0_20220121T051020Z_reco2_52848725_0_20220217T110748Z.root
             File Id: 65034554
           Create Date: 2022-02-17T11:10:18+00:00
              User: dunepro
           Update Date: 2022-02-17T14:05:27+00:00
           Update User: dunepro
            File Size: 1176065089
            Checksum: enstore:1353901514
                 adler32:c0a6e5cb
                 md5:82c9636a1a5089ee1ce632e17e50e64a
         Content Status: good
            File Type: detector
           File Format: artroot
            Data Tier: reco-recalibrated
           Application: art reco v09_30_00
           Process Id: 18755920
           Event Count: 31
           First Event: 39289
           Last Event: 39726
           Start Time: 2022-02-17T11:08:24+00:00
            End Time: 2022-02-17T11:09:20+00:00
           Data Stream: cosmics
       art.file_format_era: ART_2011a
     art.file_format_version: 13.0
         art.first_event: 39289.0
         art.last_event: 39726.0
        art.process_name: Reco2
          art.run_type: protodune-sp
        detector.hv_value: 180
          DUNE.campaign: RITM1312299
       DUNE_data.acCouple: 0
    DUNE_data.calibpulsemode: 0
     DUNE_data.DAQConfigName: np04_CRT_noprescale_tune_00008
    DUNE_data.detector_config: cob2_rce03:cob2_rce04:cob2_rce05:cob2_rce06:cob2_rce07:cob2_rce08:cob3_rce01:cob3_rce02:cob3_rce03:cob3_rce04:cob3_rce05:cob3_rce06:cob3_rce07:cob3_rce08:cob4_rce01:cob4_rce02:cob4_rce03:cob4_rce04:cob4_rce05:cob4_rce06:ssp101:ssp102:ssp103:ssp104:ssp201:ssp202:ssp203:ssp204:ssp301:ssp302:ssp303:ssp304:ssp401:ssp402:ssp403:ssp404:ssp501:ssp502:ssp503:ssp504:ssp601:ssp602:ssp603:ssp604:trigger_0:wib101:wib102:wib103:wib104:wib105:wib201:wib202:wib203:wib204:wib205:wib301:wib302:wib303:wib304:wib305:wib401:wib402:wib403:wib404:wib405:wib501:wib502:wib503:wib504:wib505:wib601:wib602:wib603:wib604:wib605
    DUNE_data.febaselineHigh: 1
        DUNE_data.fegain: 2
       DUNE_data.feleak10x: 0
      DUNE_data.feleakHigh: 1
     DUNE_data.feshapingtime: 2
DUNE_data.inconsistent_hw_config: 0
     DUNE_data.is_fake_data: 0
              Runs: 10425.0001 (protodune-sp)
             Parents: np04_raw_run010425_0090_dl12_reco1_28404162_0_20220121T051020Z.root

Converted to MetaCat: https://metacat.fnal.gov:9443/dune_meta_demo/app/gui/show_file?show_form=yes&namespace=&name=&did=default%3Anp04_raw_run010425_0090_dl12_reco1_28404162_0_20220121T051020Z_reco2_52848725_0_20220217T110748Z.root&fid=

What is missing:

ivmfnal commented 1 year ago

What metadata category.name should be used for the process id? Possibilities:

hschellman commented 1 year ago

Hi, if this was originally a SAM dimension, not a parameter, I suggest it goes in core. I'm leaning towards:

Dimensions all go to core. Parameters (with dots) stay the same.

ivmfnal commented 1 year ago

Dimensions do not go to core. They go to the same category as they are in dimensions.

In SAM, process_id is not in dimensions. It is a column in the data_files table. So it is logical to put it in core.

We have a chance to rename fields to better reflect their meaning and/or to organize them into categories. That is why I suggested some options for process_id:

ivmfnal commented 1 year ago

I added process_id as core.process_id for now. We can change that if needed.
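The naming rule discussed above can be sketched roughly as follows. This is only an illustration, not the actual conversion code: the function name `metacat_name` and the sample record are hypothetical, and the rule assumed is that SAM names which already contain a dot keep their category, while plain data_files columns such as process_id are placed under core.

```python
def metacat_name(sam_field: str, default_category: str = "core") -> str:
    """Map a SAM field name to a MetaCat 'category.name' attribute name.

    Hypothetical helper: names that already carry a category (a dot)
    are kept unchanged; bare column names go to the default category.
    """
    if "." in sam_field:
        return sam_field
    return f"{default_category}.{sam_field}"


# A few values taken from the SAM record quoted at the top of this issue.
sam_record = {
    "process_id": 18755920,
    "art.process_name": "Reco2",
    "DUNE.campaign": "RITM1312299",
}

converted = {metacat_name(name): value for name, value in sam_record.items()}
```

If a different category is chosen for process_id later, only the `default_category` argument (or a per-field override) would need to change.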

hschellman commented 1 year ago

Thanks. Interestingly, for DD it is a hash, not a number, as far as I can tell, so the format may be an issue. I’m putting the worker_ID there, or maybe I should put the batch job ID?

On Jan 18, 2023, at 10:48 AM, Igor Mandrichenko wrote:

I added process_id as core.process_id for now. We can change that if needed

ivmfnal commented 1 year ago

DD worker ID is a string. It can be auto-generated by DD client (and currently it is a shortened UUID), or it can be assigned by the user.

Currently, process_id is not imported from SAM to MetaCat, but when it is, it can either be converted to a string or stored as an integer. It is up to DUNE to decide how they want this SAM attribute to be represented in MetaCat.
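To make the two options concrete, here is a minimal sketch of a converter that can produce either representation. The function `convert_process_id` and its `as_string` flag are hypothetical and only illustrate the choice DUNE would have to make; they are not part of MetaCat.

```python
from typing import Optional, Union


def convert_process_id(sam_value, as_string: bool = False) -> Optional[Union[int, str]]:
    """Normalise a SAM process id to one agreed representation (hypothetical)."""
    if sam_value is None:
        return None  # file has no recorded process id
    number = int(sam_value)  # raises ValueError if the input is not numeric
    return str(number) if as_string else number


as_int = convert_process_id("18755920")
as_str = convert_process_id(18755920, as_string=True)
```

Storing the value as a string would let one attribute hold both SAM process ids and DD worker IDs, at the cost of losing numeric comparison in queries.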

hschellman commented 1 year ago

Is there a reason for the DD worker ID to be a string? There was something nice about seeing the integers increase in SAM as they were assigned. Not a big deal but worth a thought.

ivmfnal commented 1 year ago

We can add generation of a numeric, monotonically increasing worker ID as a new DD function. This would be in addition to the existing UUID-based generation. The worker_id will still be a string, but this way it can be a string which looks like an integer.

Is this what you want?
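A rough sketch of what the two generation modes could look like side by side. The class `WorkerIdGenerator`, its method names, and the 8-character UUID length are all assumptions made for illustration; the real DD client and server code may differ, and a real server would need to persist the counter (for example in a database sequence) rather than keep it in memory.

```python
import itertools
import threading
import uuid


class WorkerIdGenerator:
    """Hypothetical generator offering both worker ID styles as strings."""

    def __init__(self, start: int = 1):
        self._counter = itertools.count(start)
        self._lock = threading.Lock()  # keep increments safe across threads

    def numeric_id(self) -> str:
        """Return the next monotonically increasing ID, as a string."""
        with self._lock:
            return str(next(self._counter))

    @staticmethod
    def uuid_id(length: int = 8) -> str:
        """Return a shortened random UUID (the length is an assumption)."""
        return uuid.uuid4().hex[:length]


gen = WorkerIdGenerator()
first, second = gen.numeric_id(), gen.numeric_id()
short_uuid = gen.uuid_id()
```

Both methods return strings, so the worker_id type stays the same regardless of which mode is used.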

ivmfnal commented 1 year ago

Here is a caveat though. If we use multiple instances of DD, then we may need to come up with a way for the DD instances to generate non-overlapping sets of numeric worker ids.
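One possible way to keep the sets disjoint is strided allocation, where instance i out of n hands out i, i+n, i+2n, and so on. This is only a sketch of one option, not something DD implements: `strided_ids` and `instance_of` are hypothetical names, and the scheme assumes the number of instances is fixed and known to every instance.

```python
import itertools
from typing import Iterator


def strided_ids(instance_index: int, instance_count: int) -> Iterator[str]:
    """Yield numeric worker IDs (as strings) owned by one DD instance."""
    if not 0 <= instance_index < instance_count:
        raise ValueError("instance_index must be in [0, instance_count)")
    # itertools.count(start, step) produces start, start+step, start+2*step, ...
    for number in itertools.count(instance_index, instance_count):
        yield str(number)


def instance_of(worker_id: str, instance_count: int) -> int:
    """Recover which instance issued a given numeric worker ID."""
    return int(worker_id) % instance_count


ids_a = list(itertools.islice(strided_ids(0, 3), 3))
ids_b = list(itertools.islice(strided_ids(1, 3), 3))
```

A side effect of this scheme is that the issuing instance can be recovered from the ID itself, although IDs are then increasing only within each instance, not globally.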

hschellman commented 1 year ago

That’s a good point. But then we need to record the instance we were talking to somehow.

hschellman commented 1 year ago

Lemme think about it. Seems a bit bad to have 2 different things.
