kbase / execution_engine2

KBase Execution Engine
http://www.kbase.us/services/ee2
MIT License
0 stars 7 forks source link

job logs generated for queued jobs with confusing log entries #135

Open eapearson opened 4 years ago

eapearson commented 4 years ago

The call to get_job_logs now returns log entries for jobs which are queued and have not yet run.

I note this as an issue because it is a change from previous behavior.

I discovered this when the data in CI changed, my tests broke, and I set about to generate a test case for a job with no log entries.

Here is a case using job 5dd540027ec5a1da6ebb0f09

here is the job info for job which shows the job was in queued state, and then canceled before it could be run

request

{
  "version": "1.1",
  "id": "123",
  "method": "execution_engine2.check_job",
  "params": [
    {
      "job_id": "5dd540027ec5a1da6ebb0f09"
    }
  ]
}

result:

{
    "version": "1.1",
    "result": [
        {
            "user": "eaptest30",
            "authstrat": "kbaseworkspace",
            "wsid": 42900,
            "status": "terminated",
            "updated": 1574256658553,
            "queued": 1574256643.672457,
            "finished": 1574256658553,
            "terminated_code": 0,
            "scheduler_type": "condor",
            "scheduler_id": "12900",
            "job_input": {
                "wsid": 42900,
                "method": "kb_SPAdes.run_SPAdes",
                "params": [
                    {
                        "workspace_name": "eaptest30:narrative_1559318949611",
                        "read_libraries": [
                            "42900/3/4"
                        ],
                        "output_contigset_name": "SPAdes.contigs",
                        "dna_source": "standard",
                        "min_contig_length": 500,
                        "kmer_sizes": [],
                        "skip_error_correction": 0
                    }
                ],
                "service_ver": "e0a0f9df50bb6765db40f6949b18c7a016610801",
                "app_id": "kb_SPAdes/run_SPAdes",
                "source_ws_objects": [
                    "42900/3/4"
                ],
                "parent_job_id": "None",
                "narrative_cell_info": {
                    "run_id": "8d5ad754-d29d-4043-9439-29f5f8b536d3",
                    "token_id": "e5f189ce-3c4a-4073-8cf9-e1e37db5a926",
                    "tag": "release",
                    "cell_id": "9565150f-868e-4627-89c2-65de4f8b0b37"
                }
            },
            "job_id": "5dd540027ec5a1da6ebb0f09",
            "created": 1574256642000
        }
    ],
    "id": "123"
}

we can see that this job was terminated while queued

And here is the job log request:

{
  "version": "1.1",
  "id": "123",
  "method": "execution_engine2.get_job_logs",
  "params": [
    {
      "job_id": "5dd540027ec5a1da6ebb0f09"
    }
  ]
}
{
    "version": "1.1",
    "result": [
        {
            "lines": [
                {
                    "line": "Job runner recieved Callback URL http://10.42.214.9:33893/",
                    "linepos": 1,
                    "error": false,
                    "ts": 1574115014695
                },
                {
                    "line": "Running on ci-core-condorworker-mixed-privileged-partionable-1 (10.42.214.9) in /cdr/ci/dir_15956/JobRunner",
                    "linepos": 2,
                    "error": false,
                    "ts": 1574115014695
                },
                {
                    "line": "Client group: njs",
                    "linepos": 3,
                    "error": false,
                    "ts": 1574115014695
                },
                {
                    "line": "Job already run or canceled",
                    "linepos": 4,
                    "error": false,
                    "ts": 1574115014695
                }
            ],
            "last_line_number": 4
        }
    ],
    "id": "123"
}

This is confusing for at least two other reasons:

For comparison, here is the result from njsw for job 5d4460ebaa5a4d298c5dc887:

{
    "id": "123",
    "version": "1.1",
    "method": "NarrativeJobService.get_job_logs",
    "params": [
        {
          "job_id": "5d4460ebaa5a4d298c5dc887"
        }
    ]
}

results:

{
    "version": "1.1",
    "result": [
        {
            "lines": [],
            "last_line_number": 0
        }
    ]
}

as can be seen in the screenshot below (for some reason I can't get the job info from njsw right now), this job was canceled while queued.

Screen Shot 2019-11-20 at 6 18 44 AM
mergeable[bot] commented 4 years ago

This is old. Is it still relevant?

mergeable[bot] commented 4 years ago

This is old. Is it still relevant?

eapearson commented 4 years ago

@bio-boris I think we discussed this elsewhere, not captured here. The behavior of the job log browser is to show a message when the job is created or queued, and show the logs if it is running, completed, or errored or terminated after having been run. I believe the main purpose of this issue was to clarify whether the job log is created when the job is created or queued, or whether those log entries are dropped in after the job has been terminated. In other words, if there is a job log to display, we want to display it, but right now there are assumptions about when a job log is even available. The job browser could simply attempt to show the log at all times. The reason it shows a message when a job is in created or queued state is to avoid the confusion of showing an empty log, when we really know the log doesn't exist yet.