ivmfnal / data_dispatcher

BSD 3-Clause "New" or "Revised" License

Batch jobs don't seem to tell dd the right job site: the FermiGrid/FNAL_DCACHE preference is 100. #18

Closed hschellman closed 1 year ago

hschellman commented 1 year ago

OK, so when I run interactively the preferences look good, but when I run on batch I get:

looking for files near US_FermiGrid
next_file {'project_id': 680, 'namespace': 'pdsp_det_reco', 'name': 'np04_raw_run005284_0001_dl3_reco1_14390693_0_20201121T072752Z.root', 'state': 'reserved', 'worker_id': '964d1aac', 'attempts': 2, 'attributes': {}, 'reserved_since': 1675107397.764603, 'replicas': {'LANCASTER': {'name': 'np04_raw_run005284_0001_dl3_reco1_14390693_0_20201121T072752Z.root', 'namespace': 'pdsp_det_reco', 'path': '/dpm/lancs.ac.uk/home/dune/pdsp_det_reco/ce/1d/np04_raw_run005284_0001_dl3_reco1_14390693_0_20201121T072752Z.root', 'url': 'root://fal-pygrid-30.lancs.ac.uk//dpm/lancs.ac.uk/home/dune/pdsp_det_reco/ce/1d/np04_raw_run005284_0001_dl3_reco1_14390693_0_20201121T072752Z.root', 'rse': 'LANCASTER', 'preference': 100, 'available': True, 'rse_available': True}, 'FNAL_DCACHE': {'name': 'np04_raw_run005284_0001_dl3_reco1_14390693_0_20201121T072752Z.root', 'namespace': 'pdsp_det_reco', 'path': '/pnfs/fnal.gov/usr/dune/tape_backed/dunepro/protodune-sp/full-reconstructed/2020/detector/physics/PDSPProd4/00/00/52/84/np04_raw_run005284_0001_dl3_reco1_14390693_0_20201121T072752Z.root', 'url': 'root://fndca1.fnal.gov:1094//pnfs/fnal.gov/usr/dune//tape_backed/dunepro/protodune-sp/full-reconstructed/2020/detector/physics/PDSPProd4/00/00/52/84/np04_raw_run005284_0001_dl3_reco1_14390693_0_20201121T072752Z.root', 'rse': 'FNAL_DCACHE', 'preference': 100, 'available': True, 'rse_available': True}}, 'project_attributes': {}}
2023-01-30 19:36:37.797460

Note that both preferences are 100. This indicates that dd is not recognizing the FermiGrid node as a close-by location. How should I test/debug this? What does dd use to determine where the job site is?

ivmfnal commented 1 year ago

Are you setting the CPU site ID correctly when you issue "dd worker next"?

ivmfnal commented 1 year ago

DD uses a CPU/RSE proximity map: https://metacat.fnal.gov:9443/dune/dd/gui/R/proximity_map

According to the map, the default proximity is 100, which means that if you enter a non-existent CPU site, you will get a proximity of 100 for all RSEs.
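The fallback described above can be sketched as a nested dictionary lookup. This is only an illustration of the behavior, not DD's actual code: the function names and the map entries below are hypothetical, and only the default value of 100 and the site/RSE names come from this thread.

```python
# Default proximity reported in this thread.
DEFAULT_PROXIMITY = 100

# Hypothetical map contents, for illustration only:
# CPU site -> {RSE -> proximity}. The numeric values are made up.
PROXIMITY_MAP = {
    "US_FNAL": {"FNAL_DCACHE": 0, "LANCASTER": 50},
}


def proximity(cpu_site, rse, proximity_map=PROXIMITY_MAP, default=DEFAULT_PROXIMITY):
    """Return the proximity for (cpu_site, rse), falling back to the default."""
    return proximity_map.get(cpu_site, {}).get(rse, default)


def replica_preferences(cpu_site, rses):
    """Compute the preference each RSE would get for a given CPU site."""
    return {rse: proximity(cpu_site, rse) for rse in rses}


rses = ["LANCASTER", "FNAL_DCACHE"]

# A site present in the map gets distinct values per RSE.
known = replica_preferences("US_FNAL", rses)

# A site missing from the map gets the default for every RSE,
# which matches the symptom in the batch log above.
unknown = replica_preferences("US_FermiGrid", rses)

print(known)
print(unknown)
```

If every replica comes back with the same preference of 100, a likely first check is whether the CPU site name passed by the job matches a key in the map exactly.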

hschellman commented 1 year ago

The problem is that FNAL seems to have two sites: US_FermiGrid for grid jobs and US_FNAL for interactive use. It would be best to list both in the map.


ivmfnal commented 1 year ago

The map is maintained by Andrew. DD downloads it automatically from GitHub.