ivmfnal / data_dispatcher

BSD 3-Clause "New" or "Revised" License

2 similar datasets that either run or fail on not found #7

Closed hschellman closed 1 year ago

hschellman commented 1 year ago

schellma:run5141Prod2Reco shows up as not-found, but it still creates a project and then sits in the next_file step, retrying until it hits the retry limit.

schellma:run5141recentReco runs just fine.

Both have files in MetaCat, and Steve checked that at least one file in the Prod2 dataset is in Rucio.

It would be good to have a better diagnostic and failure mode: for example, if the files come back as not-found, terminate instead of retrying.
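The fail-fast behavior requested here can be sketched as below. This is only an illustration of the idea, not Data Dispatcher's actual client code: `fetch_next_file`, `TransientError`, and `ReplicaNotFound` are hypothetical names introduced for the example.

```python
import time


class TransientError(Exception):
    """Hypothetical: a temporary condition that is worth retrying."""


class ReplicaNotFound(Exception):
    """Hypothetical: no usable replica exists, so retrying cannot help."""


def next_file_with_fail_fast(fetch_next_file, max_retries=3, delay=0.0):
    """Call fetch_next_file(), retrying only on TransientError.

    ReplicaNotFound is deliberately not caught, so it propagates on the
    first attempt instead of consuming the whole retry budget.
    """
    if max_retries < 1:
        raise ValueError("max_retries must be at least 1")
    for attempt in range(1, max_retries + 1):
        try:
            return fetch_next_file()
        except TransientError:
            if attempt == max_retries:
                raise  # retry budget exhausted
            time.sleep(delay)
```

With this split, a dataset whose files have no usable replica would fail on the first `next_file` call with a clear error, while a temporarily busy service would still be retried.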

ivmfnal commented 1 year ago

What does "schellma:run5141Prod2Reco shows up as not-found" mean?

ivmfnal commented 1 year ago

If you mean project 407, where are the replicas for that dataset?

ivmfnal commented 1 year ago

Here are the RSEs known to DD: https://metacat.fnal.gov:9443/dune/dd/gui/R/index Do we need to add more RSEs?

hschellman commented 1 year ago

Enabling STFC might be good, but those files should also be in cache at FNAL. So does it fail when a file has a location that is not on the list of good RSEs?

Here is where SAM thinks the files are:

samweb get-file-access-url np04_raw_run005141_0009_dl13_reco_24031905_0_20190915T212309.root --schema=root
root://xrootd.echo.stfc.ac.uk/dune:/protodune/RSE/np04_pdspprod2_reco/0b/fd/np04_raw_run005141_0009_dl13_reco_24031905_0_20190915T212309.root
root://fndca1.fnal.gov:1094/pnfs/fnal.gov/usr/dune/tape_backed/dunepro/protodune-sp/full-reconstructed/2019/detector/physics/PDSPProd2/00/00/51/41/np04_raw_run005141_0009_dl13_reco_24031905_0_20190915T212309.root
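To see at a glance which storage endpoints serve the file, the host names can be pulled out of the two access URLs in the samweb output. A minimal sketch using only the Python standard library; the helper name `storage_hosts` is made up for this example.

```python
from urllib.parse import urlsplit

# The two access URLs reported by samweb for this file.
ACCESS_URLS = [
    "root://xrootd.echo.stfc.ac.uk/dune:/protodune/RSE/np04_pdspprod2_reco/0b/fd/"
    "np04_raw_run005141_0009_dl13_reco_24031905_0_20190915T212309.root",
    "root://fndca1.fnal.gov:1094/pnfs/fnal.gov/usr/dune/tape_backed/dunepro/"
    "protodune-sp/full-reconstructed/2019/detector/physics/PDSPProd2/00/00/51/41/"
    "np04_raw_run005141_0009_dl13_reco_24031905_0_20190915T212309.root",
]


def storage_hosts(urls):
    """Return the distinct host names in the given URLs, in first-seen order."""
    hosts = []
    for url in urls:
        host = urlsplit(url).hostname  # drops the port, lowercases the name
        if host and host not in hosts:
            hosts.append(host)
    return hosts


print(storage_hosts(ACCESS_URLS))
# prints ['xrootd.echo.stfc.ac.uk', 'fndca1.fnal.gov']
```

So SAM reports one copy at STFC and one at FNAL, consistent with the expectation that the file is reachable at FNAL even without STFC enabled.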

I can’t get rucio to work to check.

But before we turn on STFC, this looks like a good check for a failure mode, where a bad RSE seems to confuse the system.
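One way to phrase the check is: a file is usable if at least one of its replicas sits on an enabled RSE, and replicas on unknown RSEs are simply ignored. The sketch below illustrates that rule only; it does not describe how Data Dispatcher actually handles replicas, and the function name, file names, and RSE names (`RSE_A`, `RSE_B`) are placeholders invented for the example.

```python
def split_by_availability(replicas_by_file, enabled_rses):
    """Partition files by whether any replica is on an enabled RSE.

    replicas_by_file maps a file name to the list of RSE names holding it.
    Returns (available, unavailable) lists of file names.
    """
    enabled = set(enabled_rses)
    available, unavailable = [], []
    for name, rses in replicas_by_file.items():
        if enabled.intersection(rses):
            available.append(name)
        else:
            unavailable.append(name)
    return available, unavailable


# Placeholder data: RSE_A is enabled, RSE_B is not known to the dispatcher.
replicas = {
    "a.root": ["RSE_A", "RSE_B"],  # one good location, one unknown
    "b.root": ["RSE_B"],           # only an unknown location
    "c.root": [],                  # no replicas at all
}
ok, missing = split_by_availability(replicas, {"RSE_A"})
print(ok)       # ['a.root']
print(missing)  # ['b.root', 'c.root']
```

Under this rule a file with an extra replica on a disabled RSE would still be served from the good one, and a project whose files all land in the second list could be rejected up front instead of retrying.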

On Dec 31, 2022, at 2:10 PM, Igor Mandrichenko wrote:

Here are RSEs known to DD: https://metacat.fnal.gov:9443/dune/dd/gui/R/index Do we need to add more RSEs ?


ivmfnal commented 1 year ago

We can add all or some of the RSEs from Rucio to DD. It's an admin function; someone just needs to say so.