Open colsond opened 7 years ago
After looking at these json dumps I noticed that they have some listed proxies for a given squid. @rptaylor do you know what these are? ex:
"squid1.ppgrid1.rhul.ac.uk": {
  ...
  ...
  "proxies": [
    {
      "default": [
        "squid1.ppgrid1.rhul.ac.uk:3128",
        "lt2cache00.grid.hep.ph.ic.ac.uk:3128",
        "cms-squid.gridpp.rl.ac.uk:3128",
        "lcgsquid01.gridpp.rl.ac.uk:3128",
        "lcgsquid02.gridpp.rl.ac.uk:3128"
      ]
    }
  ...
  ...
},
Does this mean this squid has a second layer of proxies pointing to it?
It appears that worker-proxies.json maps a site to a list of squids, some of which may be at other sites. It is odd that the key for a site is itself a squid. We'll have to put some thought and discussion into how we approach this.
Yeah, I'm not really sure how much useful information can be extracted from the worker-proxies dump. The grid squids, on the other hand, would be fairly trivial to add, but they would need a separate designation so shoal doesn't drop them for not sending heartbeat messages via AMQP.
We'd also have to discuss a default access level for these since the json dump has very little of the configuration data the server is accustomed to. We could even make an access level specific to this case.
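To make the "separate designation" idea concrete, here is a minimal sketch of a cleanup pass that exempts imported squids from the heartbeat check. The `SquidRecord` class, the `agentless` flag, `prune_inactive`, and the 180 second timeout are all hypothetical names and values for illustration, not shoal's actual data model or settings.

```python
import time
from dataclasses import dataclass, field

# Assumed timeout for illustration only; not shoal's real configuration value.
HEARTBEAT_TIMEOUT = 180  # seconds


@dataclass
class SquidRecord:
    """Hypothetical record for a squid known to the server."""
    hostname: str
    last_heartbeat: float = field(default_factory=time.time)
    agentless: bool = False  # True for squids imported from a JSON dump


def prune_inactive(squids, now=None, timeout=HEARTBEAT_TIMEOUT):
    """Return the squids that should be kept.

    Agent-backed squids are dropped once their last heartbeat is older
    than `timeout`; agentless squids never send heartbeats, so they are
    always kept.
    """
    now = time.time() if now is None else now
    return {
        name: squid
        for name, squid in squids.items()
        if squid.agentless or now - squid.last_heartbeat <= timeout
    }
```

A default access level for these squids could live on the same record as an extra field, which would keep the special case in one place.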
Thoughts?
After some more thought I've made a bit of a list of issues that would need to be resolved to get this rolling:
The json branch contains the required changes to import these squids to shoal. On my development server there are a combined 337 squids including the ones from the production server. While I think importing these squids with no agents has value, I think we need to have a discussion regarding the goal of these "agentless" squids and what type of services we want shoal to be able to provide.
A previous CHEP talk discusses where these 2 json files come from and how the squids are brought into the list. Slides 5/6 contain a summary. https://indico.cern.ch/event/505613/contributions/2230709/attachments/1346624/2030637/Oral-180.pdf
To be honest, it sounds like their WPAD feature is almost identical to shoal, except that they also factor the project (CMS, ATLAS, CVMFS, Frontier, etc.) into their serving logic. The worker-proxies.json file records this project information for each entry under the "source" key.
Ex.
"lcg-admin4.scinet.utoronto.ca": {
  "ip": "142.150.19.7",
  "proxies": [
    {
      "default": [
        "lcg-admin4.scinet.utoronto.ca:3128",
        "lcgvo.westgrid.ca:3128",
        "kraken01.westgrid.ca:3128"
      ]
    }
  ],
  "names": [
    "CA-SCINET-T2"
  ],
  "source": "ATLAS"
},
I think we may want to consider integrating this sort of feature into shoal. We could give squids a classification such as "all", "CVMFS", "ATLAS", etc., or any combination thereof. This way we could add a set of new REST interfaces, such as /nearestATLAS or /nearestCVMFS, that are specific to a given project. The CHEP slides also mention that some of these proxies lack the resources to serve applications like Frontier, so this could provide a dynamic configuration solution without the threat of overloading a proxy.
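As a rough sketch of the classification idea, the snippet below groups squid hostnames by the "source" value of each worker-proxies.json entry, using the entry shown above as sample data. The function name `squids_by_project` and the fallback label "all" for entries without a "source" key are assumptions for illustration; the real file may contain other shapes that this does not handle.

```python
import json

# Sample entry copied from the worker-proxies.json excerpt above.
SAMPLE = """
{
  "lcg-admin4.scinet.utoronto.ca": {
    "ip": "142.150.19.7",
    "proxies": [
      {"default": ["lcg-admin4.scinet.utoronto.ca:3128",
                   "lcgvo.westgrid.ca:3128",
                   "kraken01.westgrid.ca:3128"]}
    ],
    "names": ["CA-SCINET-T2"],
    "source": "ATLAS"
  }
}
"""


def squids_by_project(worker_proxies):
    """Map each project (the "source" value) to the squid hostnames listed for it."""
    by_project = {}
    for entry in worker_proxies.values():
        # "all" is an assumed fallback label, not something the dump defines.
        project = entry.get("source", "all")
        for group in entry.get("proxies", []):
            for proxy_list in group.values():
                for proxy in proxy_list:
                    # Entries are assumed to look like "host:port".
                    host = proxy.rsplit(":", 1)[0]
                    by_project.setdefault(project, set()).add(host)
    return by_project


classified = squids_by_project(json.loads(SAMPLE))
print(sorted(classified["ATLAS"]))
```

A project-specific endpoint such as /nearestATLAS could then run the existing nearest-squid lookup over only the hostnames in `classified["ATLAS"]`.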
At Ryan's suggestion:
We should look into a feature for Shoal that allows it to import squids from JSON files, e.g. http://wlcg-squid-monitor.cern.ch/worker-proxies.json and http://wlcg-squid-monitor.cern.ch/grid-squids.json
Shoal should combine the squids from these JSON files (representing information from CMS sitedb/siteconf, AGIS, GOCDB and OIM) with the ones advertised by Shoal agents.
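One possible shape for the combining step is sketched below. It assumes the agent-advertised squids are held in a dict keyed by hostname and that the imported data has the worker-proxies.json layout shown earlier in this thread; the layout of grid-squids.json is not shown here, so it is not handled. The names `merge_squids` and `agentless` are hypothetical, and downloading the files (for instance with `urllib.request` and `json.load`) is left out.

```python
def merge_squids(agent_squids, worker_proxies):
    """Combine agent-advertised squids with squids listed in a JSON dump.

    agent_squids:   {hostname: {...}} as advertised by Shoal agents (assumed shape).
    worker_proxies: parsed worker-proxies.json content.

    Agent entries take precedence: a squid found in both sources keeps its
    agent data and is not marked agentless.
    """
    merged = {
        host: dict(info, agentless=False)
        for host, info in agent_squids.items()
    }
    for site in worker_proxies.values():
        for group in site.get("proxies", []):
            for proxy_list in group.values():
                for proxy in proxy_list:
                    # Entries are assumed to be "host:port" strings.
                    host, _, port = proxy.rpartition(":")
                    merged.setdefault(
                        host, {"port": int(port), "agentless": True}
                    )
    return merged
```

Letting agent data win on conflicts means a site that later installs an agent is automatically treated as a normal, heartbeat-monitored squid.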