Closed atsalolikhin-spokeo closed 4 years ago
Funny thing: I found that https://airflow.apache.org/docs/1.10.11/metrics.html does not have dag.loading-duration.* but https://airflow.apache.org/docs/1.10.4/metrics.html does! The description for it reads "Seconds taken to load the given DAG". Do you have any insight into why the documentation for dag.loading-duration.* was removed? Or why this is the only metric I'm seeing in NewRelic?
If I trigger a DAG, I get a task_instance_created metric in NewRelic. However, I'm not seeing dag.*.duration, which is what I'm interested in (how long the DAG took).
I tried adding the env vars to the systemd unit files for airflow worker and airflow flower and restarted both services. I re-triggered the DAG, which went through several tasks and failed; however, in NewRelic I still only see dag.loading-duration.* and task_instance_created-* metrics.
How do I get the rest of the metrics? :)
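For anyone comparing what shows up in New Relic against what they expect, here is a minimal sketch of that check. It only uses the metric families named in this thread; the observed names below are hypothetical examples, not real output, and the helper `missing_patterns` is made up for illustration.

```python
from fnmatch import fnmatchcase

# Metric families mentioned in this thread, written as glob patterns.
EXPECTED_PATTERNS = [
    "dag.loading-duration.*",
    "task_instance_created-*",
    "dag.*.duration",
]


def missing_patterns(observed, patterns=EXPECTED_PATTERNS):
    """Return the patterns that match none of the observed metric names."""
    return [
        pattern
        for pattern in patterns
        if not any(fnmatchcase(name, pattern) for name in observed)
    ]


# Hypothetical metric names, as they might be copied out of New Relic.
observed = ["dag.loading-duration.mydag", "task_instance_created-BashOperator"]
print(missing_patterns(observed))
```

Any pattern that comes back is a family that no process is reporting yet, which can hint at which Airflow component is missing its configuration.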
Hi @atsalolikhin-spokeo, thank you for reaching out with your questions! :)
According to the Airflow UPDATING.md, it does look like the dag.loading-duration.* metric has been deprecated and will not be emitted in Airflow 2.0, so that may explain why it is not included in the documentation you linked above.
To diagnose why you aren’t seeing all the metrics you are expecting in New Relic, would you be able to provide us with what version of Airflow you are using as well as the following?
An example of how you are:
Additionally, when you print a list of tasks for a specific DAG, do you see expected output based on your configurations?
Hi @umaannamalai, thank you for engaging with me on this!
We're on Airflow 1.10.11; let me know if you recommend we upgrade?
We have separate systemd unit files for the various parts of Airflow. I am including the ExecStart commands from these files; let me know if you need to see the whole unit files.
ExecStart=/var/lib/airflow/venv/bin/airflow webserver
ExecStart=/var/lib/airflow/venv/bin/airflow scheduler
ExecStart=/var/lib/airflow/venv/bin/airflow flower
ExecStart=/var/lib/airflow/venv/bin/airflow worker
While reviewing this, I realized I didn't add the environment variables with the NewRelic service name and Insert API Key to the systemd unit file for the Scheduler. I added it, restarted the scheduler, and I see we have more metrics now in NewRelic for Airflow.
In fact, comparing the metrics I see in NewRelic with the list at https://airflow.apache.org/docs/1.10.11/metrics.html (just eyeballing it), I think we have everything now!
Do I need to have the NewRelic Airflow plugin env vars in all four systemd files?
We have some automation for putting the DAGs in Airflow, and that part is working fine (I see all the DAGs in the Airflow UI).
I trigger the DAG by selecting "Trigger DAG" in the Airflow UI (when I'm looking at the DAG).
You asked how I'm generating tasks. My understanding is that those are in the DAG. So Airflow generates tasks when it runs the DAG. (I'm new to Airflow, please bear with me!)
Additionally, when you print a list of tasks for a specific DAG, do you see expected output based on your configurations?
I go to the Graph View in the Web UI to see the tasks for a specific DAG, and yes, they are all there.
So, which systemd files need the newrelic airflow plugin environment variables? :)
P.S. Thank you very much for explaining why the dag.loading-duration.* metric disappeared from the Airflow documentation!
Hi @atsalolikhin-spokeo, it is great to hear that you’re seeing all the Airflow metrics in New Relic now :)
The New Relic Airflow plugin supports version 1.8 and higher of Airflow, so you should be completely fine using 1.10.11!
Regarding your question about which systemd files require the New Relic environment variables to be configured, that is going to depend on what metrics you are interested in. Essentially, you will need to include the environment variables in all unit files that correspond to metrics that you would like to see in New Relic.
In your case, you will definitely want the environment variables set for the scheduler to see metrics like dag.*.duration, and you will likely want to have the variables set in the worker as well. If there is any information coming from the webserver or flower that you are interested in seeing in New Relic, then you can go ahead and add the environment variables to those unit files as well.
Hope this helps!
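As a rough illustration of the advice above, the sketch below renders a systemd drop-in that sets the plugin's environment variables for the scheduler and worker. The variable names `NEW_RELIC_SERVICE_NAME` and `NEW_RELIC_INSERT_KEY` are an assumption based on the plugin's README, so please verify them there. The unit names, the drop-in file name, and the helper `render_dropin` are hypothetical, since the thread only shows the ExecStart commands.

```python
def render_dropin(env):
    """Render a systemd drop-in that sets the given environment variables."""
    lines = ["[Service]"]
    for key, value in env.items():
        lines.append(f'Environment="{key}={value}"')
    return "\n".join(lines) + "\n"


# Hypothetical unit names; adjust to match your own unit files.
UNITS = ["airflow-scheduler", "airflow-worker"]

# Assumed variable names; the values here are placeholders.
env = {
    "NEW_RELIC_SERVICE_NAME": "airflow",
    "NEW_RELIC_INSERT_KEY": "<insert-api-key>",
}

for unit in UNITS:
    # Print the path the drop-in would live at, followed by its contents.
    print(f"# /etc/systemd/system/{unit}.service.d/newrelic.conf")
    print(render_dropin(env))
```

After writing files like these, `systemctl daemon-reload` followed by a restart of each affected service would be needed for the variables to take effect.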
Oh, that's very helpful, thank you, @umaannamalai ! :) I appreciate it greatly.
Hi @umaannamalai @atsalolikhin-spokeo
I have a similar Airflow cluster managed with systemctl. I also added the env variables to the systemd unit files for the webserver, scheduler, and workers, but no metrics are going to New Relic. Could you help? Here is issue #31.
Hello,
I just installed newrelic-airflow-plugin, and restarted the Airflow web service to load the environment variables added to the service systemd unit file (I added the service name and New Relic Insert API Key).
I immediately started seeing Airflow metrics in NewRelic.
They have names like dag.loading-duration.mydag, where mydag is the DAG name.
I am confused because: a) dag.loading-duration is not listed at https://airflow.apache.org/docs/stable/metrics.html; b) all the other metrics that are listed are not present in NewRelic.
Questions:
dag.loading-duration.* metrics?