newrelic / newrelic-airflow-plugin

Send airflow metrics to New Relic!
Apache License 2.0

Metrics missing after installing plugin #13

Status: Closed (atsalolikhin-spokeo closed 4 years ago)

atsalolikhin-spokeo commented 4 years ago

Hello,

I just installed newrelic-airflow-plugin, and restarted the Airflow web service to load the environment variables added to the service systemd unit file (I added the service name and New Relic Insert API Key).

I immediately started seeing Airflow metrics in NewRelic.

They have names like dag.loading-duration.mydag where mydag is the DAG name.

I am confused because: a) dag.loading-duration is not listed at https://airflow.apache.org/docs/stable/metrics.html b) all the other metrics that are listed are not present in NewRelic.

Questions:

  1. Do I need to update the environment for the Airflow Executor and restart it also, to get the rest of the metrics?
  2. What are the dag.loading-duration.* metrics?

atsalolikhin-spokeo commented 4 years ago

Funny thing: I found https://airflow.apache.org/docs/1.10.11/metrics.html does not have dag.loading-duration.* but https://airflow.apache.org/docs/1.10.4/metrics.html does! The description for it reads "Seconds taken to load the given DAG". Do you have any insight into why the documentation for dag.loading-duration.* was removed? Or why this is the only metric I'm seeing in NewRelic?

atsalolikhin-spokeo commented 4 years ago

If I trigger a DAG, I get a task_instance_created metric in NewRelic. However, I'm not seeing dag.*.duration which is what I'm interested in (how long the DAG took).

I tried adding the env vars to the systemd unit files for airflow worker and airflow flower and restarted both services. I re-triggered the DAG which went through several tasks and failed; however in NewRelic I still only see dag.loading-duration.* and task_instance_created-* metrics.

How do I get the rest of the metrics? :)
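One way to make the "which metrics are missing?" comparison less of an eyeball exercise is to match the metric names seen in New Relic against the patterns you expect. The following is only a minimal sketch: the function name `missing_patterns` and the sample name `task_instance_created-BashOperator` are hypothetical, and the patterns are simply the ones mentioned in this thread, not an authoritative list of Airflow metrics.

```python
from fnmatch import fnmatchcase


def missing_patterns(observed_names, expected_patterns):
    """Return the expected patterns that no observed metric name matches.

    Patterns use shell-style wildcards, e.g. "dag.*.duration".
    """
    return [
        pattern
        for pattern in expected_patterns
        if not any(fnmatchcase(name, pattern) for name in observed_names)
    ]


# Hypothetical sample of metric names copied from the New Relic UI.
observed = [
    "dag.loading-duration.mydag",
    "task_instance_created-BashOperator",
]

# Patterns taken from this thread.
expected = [
    "dag.loading-duration.*",
    "task_instance_created-*",
    "dag.*.duration",
]

print(missing_patterns(observed, expected))
```

With the sample above, only `dag.*.duration` is reported as missing, which matches the symptom described in this comment.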

umaannamalai commented 4 years ago

Hi @atsalolikhin-spokeo, thank you for reaching out with your questions! :)

According to the Airflow UPDATING.md, it does look like the dag.loading-duration.* metric has been deprecated and will not be emitted in Airflow 2.0, so that may explain why it is not included in the documentation you linked above.

To diagnose why you aren’t seeing all the metrics you are expecting in New Relic, would you be able to provide us with what version of Airflow you are using as well as the following?

An example of how you are:

Additionally, when you print a list of tasks for a specific DAG, do you see expected output based on your configurations?

atsalolikhin-spokeo commented 4 years ago

Hi @umaannamalai, thank you for engaging with me on this!

We're on Airflow 1.10.11; let me know if you recommend we upgrade?

We have separate systemd unit files for the various parts of Airflow. I am including the ExecStart commands from these files; let me know if you need to see the whole unit files.

ExecStart=/var/lib/airflow/venv/bin/airflow webserver
ExecStart=/var/lib/airflow/venv/bin/airflow scheduler
ExecStart=/var/lib/airflow/venv/bin/airflow flower
ExecStart=/var/lib/airflow/venv/bin/airflow worker

While reviewing this, I realized I didn't add the environment variables with the NewRelic service name and Insert API Key to the systemd unit file for the Scheduler. I added it, restarted the scheduler, and I see we have more metrics now in NewRelic for Airflow.

In fact, comparing the metrics I see in NewRelic with the list at https://airflow.apache.org/docs/1.10.11/metrics.html (just eyeballing it), I think we have everything now!

Do I need to have the NewRelic Airflow plugin env vars in all four systemd files?

We have some automation for putting the DAGs in Airflow, and that part is working fine (I see all the DAGs in the Airflow UI).

I trigger the DAG by selecting "Trigger DAG" in the Airflow UI (when I'm looking at the DAG).

You asked how I'm generating tasks. My understanding is that those are in the DAG. So Airflow generates tasks when it runs the DAG. (I'm new to Airflow, please bear with me!)

> Additionally, when you print a list of tasks for a specific DAG, do you see expected output based on your configurations?

I go to the Graph View in the Web UI to see the tasks for a specific DAG, and yes, they are all there.

So, which systemd files need the newrelic airflow plugin environment variables? :)

atsalolikhin-spokeo commented 4 years ago

P.S. Thank you very much for explaining why the dag.loading-duration.* metric disappeared from the Airflow documentation!

umaannamalai commented 4 years ago

Hi @atsalolikhin-spokeo, it is great to hear that you’re seeing all the Airflow metrics in New Relic now :)

The New Relic Airflow plugin supports version 1.8 and higher of Airflow, so you should be completely fine using 1.10.11!

Regarding your question about which systemd files require the New Relic environment variables to be configured, that is going to depend on what metrics you are interested in. Essentially, you will need to include the environment variables in all unit files that correspond to metrics that you would like to see in New Relic.

In your case, you will definitely want the environment variables set for the scheduler to see metrics like dag.*.duration and you will likely want to have the variables set in the worker as well. If there is any information coming from the webserver or flower that you are interested in seeing in New Relic, then you can go ahead and add the environment variables to those unit files as well.
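As a rough way to check which unit files already carry the variables, the sketch below parses the text of a systemd unit file and reports any required variable that has no `Environment=` assignment. It is an illustration only: the variable names `NEW_RELIC_INSERT_KEY` and `NEW_RELIC_SERVICE_NAME` are assumptions here and should be confirmed against the plugin README, the helper names are hypothetical, and variables supplied through `EnvironmentFile=` are not considered.

```python
import shlex

# Assumed variable names; confirm against the plugin README for your version.
REQUIRED_VARS = ("NEW_RELIC_INSERT_KEY", "NEW_RELIC_SERVICE_NAME")


def env_vars_in_unit(unit_text):
    """Return the names of variables set via Environment= lines."""
    names = set()
    for raw_line in unit_text.splitlines():
        line = raw_line.strip()
        if not line.startswith("Environment="):
            continue
        # One Environment= line may hold several space-separated,
        # optionally quoted KEY=value assignments.
        for assignment in shlex.split(line[len("Environment="):]):
            name, sep, _value = assignment.partition("=")
            if sep:
                names.add(name)
    return names


def missing_vars(unit_text, required=REQUIRED_VARS):
    """Return the required variable names that the unit file does not set."""
    found = env_vars_in_unit(unit_text)
    return [name for name in required if name not in found]


# Example: a scheduler unit with no New Relic variables configured.
scheduler_unit = """\
[Service]
ExecStart=/var/lib/airflow/venv/bin/airflow scheduler
"""
print(missing_vars(scheduler_unit))
```

Running the same check over the scheduler, worker, webserver, and flower unit files shows at a glance which services would start without the plugin configuration.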

Hope this helps!

atsalolikhin-spokeo commented 4 years ago

Oh, that's very helpful, thank you, @umaannamalai ! :) I appreciate it greatly.

vinitsriwastava commented 2 years ago

Hi @umaannamalai @atsalolikhin-spokeo

I have a similar Airflow cluster managed with systemctl. I have added the environment variables to the unit files for the webserver, scheduler, and workers, but no metrics are reaching New Relic. Could you help? Here is issue #31.