Report storage usage in admin-get-clusters command

YevheniiSemendiak commented 3 years ago

Is your feature request related to a problem? Please describe.

Currently, there is no way to check how much storage is used in a specific cluster. The only workaround is to go to your cluster provider's (AWS/GCP/whatever) console and check the underlying backend NFS usage. I don't even know what we could do in on-premise installations.

Describe the solution you'd like

It would be nice to see the storage utilization in the neuro admin get-clusters command output.

Describe alternatives you've considered

Querying each cluster (provider?) might take some time (especially if the user has access to multiple clusters). Therefore, as an alternative, we could introduce a different command like neuro admin get-cluster-details <cluster> that shows overall info about the cluster, such as:

storage utilization
registry utilization
services versions
number of users
avg job number
etc.

Additional context

The request came from the Synthesis team.

romasku commented 3 years ago

Thanks for the issue, it is a good direction for improvement!

I think we should first add some separate commands because not all reports can be collected easily. Also it will allow us to display more detailed info.

Storage utilization

My idea here is to add neuro storage stats (or neuro storage du) that will print total storage usage plus per-user stats:

User      Usage
romasku   2G
admin     50M

Total: 2G Used, 3G Free, 5G Total

This can be easily (and efficiently) implemented on the server-side.
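To make the idea concrete, here is a minimal sketch of how such a report could be computed. It assumes the storage backend is mounted as a plain directory with one top-level folder per user, which is an assumption for illustration and not necessarily how the storage service lays things out. The names dir_size, storage_stats, format_size and render_report are all hypothetical.

```python
import os
import shutil


def dir_size(path):
    """Sum the sizes of all regular files under path, in bytes."""
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            full = os.path.join(root, name)
            if not os.path.islink(full):  # do not count symlink targets
                total += os.path.getsize(full)
    return total


def storage_stats(storage_root):
    """Return ({user: bytes}, disk_usage).

    Assumption: every top-level directory under storage_root belongs
    to exactly one user and is named after that user.
    """
    per_user = {}
    for entry in sorted(os.scandir(storage_root), key=lambda e: e.name):
        if entry.is_dir(follow_symlinks=False):
            per_user[entry.name] = dir_size(entry.path)
    # disk_usage reports the whole filesystem that holds storage_root,
    # so "used" may include data outside the per-user folders.
    return per_user, shutil.disk_usage(storage_root)


def format_size(num_bytes):
    """Render a byte count with a binary unit suffix, e.g. 2048 -> '2K'."""
    size = float(num_bytes)
    for unit in ("B", "K", "M", "G"):
        if size < 1024:
            return f"{size:.0f}{unit}"
        size /= 1024
    return f"{size:.0f}T"


def render_report(per_user, disk):
    """Build a text report in the layout proposed above."""
    width = max([len("User")] + [len(user) for user in per_user])
    lines = [f"{'User':<{width}}  Usage"]
    for user, used in per_user.items():
        lines.append(f"{user:<{width}}  {format_size(used)}")
    lines.append("")
    lines.append(
        f"Total: {format_size(disk.used)} Used, "
        f"{format_size(disk.free)} Free, {format_size(disk.total)} Total"
    )
    return "\n".join(lines)
```

Calling render_report(*storage_stats(path)) on the mounted storage root would produce the table. Walking the whole tree on every request could be slow on large volumes, so a real implementation would likely cache or precompute these numbers.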

Registry utilization

The official registry API has no support for retrieving total disk utilization, so we would have to implement it for each cloud provider separately. This would require a lot of effort and probably would not work for on-prem. As a second option, we can scan all images periodically (on the server) and generate a cached stats view. This will work for most installations and will allow us to report usage by user, but the info will lag behind the real state (~1 hour, maybe even more). I personally prefer the second option as it is more robust and can generate more user-friendly data, but we need to think about it.
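A rough sketch of the second option, just to show the shape of it. Everything here is hypothetical: the (owner, image, size) tuples stand in for whatever the periodic scan would actually return, and CachedRegistryStats is not an existing class.

```python
import time


def usage_by_user(images):
    """Aggregate image sizes per owner.

    images: iterable of (owner, image_name, size_bytes) tuples.
    Caveat: layers shared between images would be counted once per
    image here, so real disk usage may be lower than this sum.
    """
    totals = {}
    for owner, _image, size_bytes in images:
        totals[owner] = totals.get(owner, 0) + size_bytes
    return totals


class CachedRegistryStats:
    """Serve stats from a cache and rescan only when they are stale."""

    def __init__(self, scan, max_age_s=3600.0, clock=time.monotonic):
        self._scan = scan            # callable returning {owner: bytes}
        self._max_age_s = max_age_s  # how stale the cache may get
        self._clock = clock          # injectable for testing
        self._stats = None
        self._scanned_at = None

    def refresh(self):
        """Force a rescan now and store the result."""
        self._stats = dict(self._scan())
        self._scanned_at = self._clock()
        return dict(self._stats)

    def get(self):
        """Return (stats, age_seconds), rescanning if the cache expired."""
        expired = (
            self._scanned_at is None
            or self._clock() - self._scanned_at >= self._max_age_s
        )
        if expired:
            self.refresh()
        return dict(self._stats), self._clock() - self._scanned_at
```

Returning the age together with the stats lets the CLI tell the user how old the numbers are, which matters if the delay is an hour or more. In production the scan would more likely run in a background task than lazily inside get().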

Service versions

I see this as a command like neuro admin get-cluster-service-versions <cluster-name> that prints a table with versions (similar to how the Slack bot prints it), plus another command to show info about all clusters the user is an admin of (something like neuro admin get-service-versions).

Summary for cluster command

After we implement all of the above commands, it will be easy to add an additional command to show a summary, though I'm not sure that we will really need it.

YevheniiSemendiak commented 3 years ago

Storage

As for me, the command neuro storage du should take a storage path as an argument and compute disk usage for that subpath (even storage:// - the storage root for the current cluster). But what about RBAC here, is it OK? neuro storage stats seems to be more general, just to view the utilization of all the storage the user has access to, right? What I'm trying to say is that the du command seems to be much more flexible and useful, but might be tricky to implement, while stats is much easier and could be a good place to start (and move towards du later if needed).

Registry

I also like the 2nd option more 👍 In any case, we could later just expose a call to that "calculator" to refresh the cached stats, so it simply gives us the needed result.

Service versions

Cool idea!

Summary for cluster command

Indeed, right now we don't need a summary command for all mentioned aspects, but if we have each of them separately - implementing some sort of alias to call all of them in a row is straightforward.

Minor remark: I would like it if we could group all those stats commands under the same group. Something like neuro admin get-stats <aspect>, where aspect is one of storage, registry, service-versions, whatever.
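A minimal sketch of what such grouping could look like. It uses the standard-library argparse only to stay self-contained, not because the CLI is built that way. The handler functions are hypothetical placeholders that return a string instead of querying a cluster, and the extra "all" aspect is my assumption for the summary alias mentioned above.

```python
import argparse


# Hypothetical placeholder handlers; real ones would call the platform API.
def report_storage(cluster):
    return f"storage stats for {cluster}"


def report_registry(cluster):
    return f"registry stats for {cluster}"


def report_service_versions(cluster):
    return f"service versions for {cluster}"


# Maps each aspect name accepted on the command line to its handler.
ASPECTS = {
    "storage": report_storage,
    "registry": report_registry,
    "service-versions": report_service_versions,
}


def main(argv):
    """Parse 'get-stats <aspect> <cluster>' and run the matching handler(s)."""
    parser = argparse.ArgumentParser(prog="neuro admin get-stats")
    parser.add_argument("aspect", choices=[*ASPECTS, "all"])
    parser.add_argument("cluster")
    args = parser.parse_args(argv)
    if args.aspect == "all":
        selected = list(ASPECTS.values())  # run every report in a row
    else:
        selected = [ASPECTS[args.aspect]]
    return "\n".join(handler(args.cluster) for handler in selected)
```

With this layout, adding a new aspect is one new entry in ASPECTS, and the summary comes for free as the "all" choice.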