Closed Ofir-Shechtman closed 2 years ago
Hello!
We'll need access to the Prometheus server
What sort of access are you looking for?
Hi, Our task is to implement a predictive component for Prometheus metrics. This component will incorporate a variety of well-known time series forecasting (TSF) algorithms, based on statistical methods, deep neural networks, or their combination. It will receive as input a stream of Prometheus updates (i.e., files containing the last recorded values for the monitored metrics) and generate a stream of predicted future values for all involved metrics.
In order to complete our project, we will probably need access to all statistical data and metrics of all the applications monitored by Prometheus in operate-first.
Thanks for helping us
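To make the kind of component described above concrete, here is a minimal sketch of one classical forecasting method, simple exponential smoothing, applied to a list of recent values for a single metric. The function name and its parameters are hypothetical and chosen only for illustration; the project itself may use entirely different algorithms.

```python
from typing import Iterable, List


def exponential_smoothing_forecast(
    values: Iterable[float], alpha: float = 0.5, horizon: int = 3
) -> List[float]:
    """Forecast `horizon` future points with simple exponential smoothing.

    Hypothetical helper: `alpha` weights the newest observation against
    the running level. The method has no trend or seasonality term, so
    the forecast is flat at the final smoothed level.
    """
    if not 0.0 < alpha <= 1.0:
        raise ValueError("alpha must be in (0, 1]")
    series = list(values)
    if not series:
        raise ValueError("need at least one observation")

    # Start from the first observation and fold in the rest one by one.
    level = series[0]
    for observed in series[1:]:
        level = alpha * observed + (1.0 - alpha) * level

    return [level] * horizon
```

For example, `exponential_smoothing_forecast([1.0, 2.0, 3.0], alpha=0.5, horizon=2)` smooths the level to 1.5 and then 2.25, and returns `[2.25, 2.25]`.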
+1 from me. @4n4nd @Shreyanand do we have notebooks that access the Prometheus data in our environment? I don't think we need namespaces just yet :)
@Ofir-Shechtman if I recall correctly, @Shreyanand and our team have worked on a repo just for time-series metric prediction (here).
For access to cluster metrics, we will need to onboard you as a group/team and then give cluster metrics access to this team.
Could you please try to use the opfcli tool that we have here to create a group/team?
@4n4nd can you also point to some hitchhiker guide on how to use the toolbox to get opfcli installed?
@Ofir-Shechtman The time series repository and book have learning content on applying analysis and forecasting to cloud metrics. Your group may also be interested in the Operate First Jupyterhub Analysis project, which collects and analyzes metrics on the infrastructure usage of Jupyterhub on Operate First. This notebook will be a starter for accessing such data from Prometheus (logs, metrics, and events). There are also notebooks for a resource allocation problem we defined based on CPU and memory usage time series. Feel free to play around with the notebooks, and I'll be happy to answer any questions you may have.
@Shreyanand are those notebooks available as a Notebook Image in our operate first jupyter hub?
@durandom The time series project has it and the instructions can be found here. The Opf Jupyterhub analysis project is relatively new so we don't have an image yet, I'll add an issue in the repository.
Hey, we've installed 'opfcli' on my local Linux machine and created a group named PrometheusAI. Then we updated the 'group.yaml' file. I couldn't find any further explanation in the 'opfcli' repository. Is that enough?
@4n4nd can point you to clearer docs or create them ;)
@Ofir-Shechtman while I work on the docs for this, you can make the following changes and create a PR for us to review it.
- All GitHub usernames should be in lowercase.
- You will need to add this group in this kustomization so that this group is available in all of our clusters.
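The lowercase requirement above can be checked before opening the PR. The following is a small sketch, not part of opfcli: `normalize_usernames` is a hypothetical helper that lowercases a list of GitHub usernames, trims whitespace, and drops duplicates while keeping the original order.

```python
from typing import Iterable, List


def normalize_usernames(usernames: Iterable[str]) -> List[str]:
    """Return lowercase, de-duplicated usernames in their original order.

    Hypothetical helper for preparing the user list of a group file;
    it does not read or write any file itself.
    """
    seen = set()
    result = []
    for name in usernames:
        cleaned = name.strip().lower()
        # Skip blanks and names already emitted in lowercase form.
        if cleaned and cleaned not in seen:
            seen.add(cleaned)
            result.append(cleaned)
    return result
```

Applied to the users listed in this issue, `normalize_usernames(["moradnir", "Ofir-Shechtman", "guyelf", "benlugasi"])` returns `["moradnir", "ofir-shechtman", "guyelf", "benlugasi"]`.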
Hey Anand, I've cloned the 'apps' repo and made the changes you described on a new branch, but I can't push it. I tried both with and without an SSH key, and was denied both times. Can you give me (benlugasi) permissions for pushing into 'apps'?
Thanks, Ben
@benlugasi we don't push directly to the apps repo. You will need to fork the https://github.com/operate-first/apps repo first.
- Click on the Fork button on the top right of this page.
- Once this creates a copy of the repo in your account, you can make changes to this new repo and push them.
- After pushing changes to your fork, you should be able to create a Pull Request to get your changes merged in the operate-first/apps repo (more instructions on how to do this).
Hey, I've created a PR for your review, you can find it here. Please tell us if there is anything else that needs to be done.
Thanks, Ben
Hey, can you please provide us a guide of:
Hey, any update?
@benlugasi what exactly are you missing? I see you closed your PR.
@moradnir
- How to import data from Prometheus?
We have some docs available here for API access.
- How to run programs on Red Hat servers?
Can you explain what your workload is and how you are planning to deploy it?
If you just need a namespace on one of our clusters, you can follow this guide. The smaug cluster should be the right one for you.
Hey @4n4nd, your docs were very helpful and really helped us make progress. However, we couldn't access the servers. I saw that our group didn't have any project, so I opened one and assigned it to the smaug cluster. Here is a PR for this project; I hope it'll solve our access problems. Thanks, Ben
Now that we have an active project and group assigned to smaug, I'm trying to connect to Thanos using the 'operate-first' button and I'm getting 403 Permission Denied.
Can you tell me what I am doing wrong? Or maybe give us some examples to start with. Our first main task is to deploy a recurring job that collects data from Prometheus into a file. We already have a project and group on operate-first/apps.
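One possible shape for a single run of such a collection job is sketched below, using only the Python standard library and the standard Prometheus HTTP API endpoint `/api/v1/query`. The function names, the CSV layout, and the idea of re-running `collect_once` from a scheduler are assumptions made for illustration; authentication is left out, and only instant-vector results are handled.

```python
import csv
import json
import urllib.parse
import urllib.request
from typing import Dict, List, Tuple


def build_query_url(base_url: str, promql: str) -> str:
    """Build an instant-query URL for the Prometheus HTTP API."""
    query = urllib.parse.urlencode({"query": promql})
    return base_url.rstrip("/") + "/api/v1/query?" + query


def parse_instant_vector(payload: Dict) -> List[Tuple[str, float, float]]:
    """Turn an instant-vector response into (labels, timestamp, value) rows."""
    if payload.get("status") != "success":
        raise ValueError("query failed: %s" % payload.get("error", "unknown"))
    rows = []
    for sample in payload["data"]["result"]:
        # Labels are serialized as sorted JSON so each series has a stable key.
        labels = json.dumps(sample["metric"], sort_keys=True)
        timestamp, value = sample["value"]
        rows.append((labels, float(timestamp), float(value)))
    return rows


def append_rows(path: str, rows: List[Tuple[str, float, float]]) -> None:
    """Append rows to a CSV file, creating it if needed."""
    with open(path, "a", newline="") as handle:
        csv.writer(handle).writerows(rows)


def collect_once(base_url: str, promql: str, path: str) -> int:
    """Run one query and append the samples to `path`; returns the row count."""
    with urllib.request.urlopen(build_query_url(base_url, promql), timeout=30) as resp:
        payload = json.load(resp)
    rows = parse_instant_vector(payload)
    append_rows(path, rows)
    return len(rows)
```

Calling `collect_once` at a fixed interval from whatever scheduler is available would give the recurring behaviour; the scheduling itself is outside this sketch.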
@4n4nd I saw this PR was merged. Should it work now? I still see this error on Thanos:
@benlugasi the changes in that PR weren't applied yet, can you please try again?
@4n4nd Now it works, thanks! We'll try to follow the docs and get some data :)
Hi Team,
We're using the templates provided in this repo, but the examples use the Prometheus demo server and we're trying to access the Thanos server (provided here). Our querying attempts via the provided Python syntax are blocked by a 403 error caused by the authentication requirements of the Thanos server, which is protected by the OpenShift oauth-proxy (version 2.3.0).
None of our attempts for generating the correct authentication token were successful. Do you have any guides on how to access this server via python or simply the http API over the authentication request?
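For reference, a common pattern for querying an HTTP API behind an authenticating proxy is to send a bearer token in the `Authorization` header. The sketch below shows only that pattern with the Python standard library. It assumes the proxy in front of Thanos is configured to accept bearer tokens, which this thread does not confirm; the host name is a placeholder, and `build_authenticated_request` is a hypothetical helper.

```python
import urllib.parse
import urllib.request


def build_authenticated_request(
    base_url: str, promql: str, token: str
) -> urllib.request.Request:
    """Build an instant-query request that carries a bearer token.

    Assumption: the proxy accepts `Authorization: Bearer <token>`.
    The token itself must come from the cluster (for example from an
    `oc` login session or a service account) and is not generated here.
    """
    query = urllib.parse.urlencode({"query": promql})
    url = base_url.rstrip("/") + "/api/v1/query?" + query
    return urllib.request.Request(url, headers={"Authorization": "Bearer " + token})


# Usage sketch (placeholder host, not a real endpoint):
# request = build_authenticated_request("https://thanos.example.invalid", "up", token)
# with urllib.request.urlopen(request, timeout=30) as response:
#     body = response.read()
```

A 403 with a token attached would then point at the token's permissions rather than at a missing credential.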
@guyelf I added some instructions for programmatic access to thanos metrics here. Lmk if these instructions don't work for you.
We've managed to access Thanos from our code using your instructions, thanks! Unfortunately, all our queries return empty results, although last week we could see the data in the server UI.
Has something changed?
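When debugging this kind of symptom, it can help to separate "the query failed" from "the query succeeded but matched no series". In the Prometheus HTTP API response format, the former has `"status": "error"` and the latter has `"status": "success"` with an empty `result` list. A minimal sketch, with `classify_response` as a hypothetical helper name:

```python
from typing import Dict


def classify_response(payload: Dict) -> str:
    """Classify a decoded Prometheus API response as 'error', 'empty', or 'ok'."""
    if payload.get("status") != "success":
        return "error"
    # A successful query that matched no series returns an empty result list.
    result = payload.get("data", {}).get("result", [])
    return "ok" if result else "empty"
```

An `"empty"` outcome suggests the request and authentication worked and the cause lies in the data or the selected time, not in the client code.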
Hey @4n4nd, it has been a week and we still don't see anything on the Thanos server. Any idea?
Hi Team,
Just to add more context here, it appears we now have more issues also with the authentication server.
There's a 503 error message being thrown by the authentication server which prevents us from accessing the server. This also prevents us from creating tokens or interacting with the Thanos server in general.
Error message as follows:
hey the auth issue should be resolved now. Can you please check again if you can query for data?
Hi Team,
Me again. We're working with the auth server again, and once more it looks like it's down:
Same issue as before. Can anyone help us recover the faulty server so we can continue our project? Thanks in advance,
Additionally, if you have any documentation on how to run jobs on the Smaug cluster it will be very helpful.
Best Regards,
@guyelf can you open a new issue for this problem with a description on how to replicate the problem? I just see a keycloak URL.
Also, is this issue here (#454) good to be closed?
Apologies, just got back from the holidays. Auth seems up, is this issue resolved?
Hey, auth seems to be OK now, and the issue is good to be closed. Are there any other communication channels where we can ask more questions in the future? Also, we still don't know how to run jobs on the smaug cluster. Can you share the relevant documentation with us?
closing, since the namespace is up. Any further problems should go into new issues
Target cluster
No response
Team name
Technion_Library
Desired project names
PrometheusAI
Project description
We are a group of Technion students working in collaboration with Ilya Kolchinsky, PhD, from Red Hat (ikolchin@redhat.com) on enhancing the capabilities of Prometheus by adding AI prediction to this monitoring system.
We'll need access to the Prometheus server on the Operate First clusters, as well as access to build our project on the Operate First system. We haven't chosen a cluster yet because we're not sure which one will fit our needs.
Users needing access
moradnir, Ofir-Shechtman, guyelf, benlugasi
Namespace Quota
Small
Custom quota
No response
Your GPG key or contact
No response