Open engmtcdrm opened 5 months ago
Polling the job status in the operator is suboptimal. Implementing defer operator is a better choice. Because it frees up the precious CPU task slot and let the Triggerer takes care of polling.
We have experienced a similar log collection delay in a customized operator. A second check or a delay collection are two possible solution.
If you have a piece of code that takes less time to run than to download the log, the SASStudioOperator will not finish downloading the log. This is due to how the method
_run_job_and_wait
works. It will continue to loop until the status of the job changes out ofrunning
orpending
(let's ignore the unknown logic for a moment). While it is looping it will call the methodstream_log
which will download the log for Airflow. However, if the job completes before the log can be downloaded, the log will be incomplete.The SAS code to run to replicate this is rather simple, it spits out a ton of NOTEs to the log intentionally trying to see how far the log can be pushed before breaking. If the uncommented 500000 does not work to see the issue, just change it to a higher number and try again.
The solution to this is to have a second check after the while loop on line 347 to check if the
num_log_lines < job['logStatistics']['lineCount']
. This logic should be a while loop replace the logic on line 379 because that will only grab the last 99999 lines of the log, not ALL the rest.Lastly, here is a screen shot of the log from the code I ran above in Airflow UI. You can see how the log abruptly stops.