nlpsandbox / nlpsandbox-controller

NLP Sandbox CWL workflow and tools
https://nlpsandbox.io
Apache License 2.0

Figure out how to return service logs to participants #16

Closed: thomasyu888 closed this issue 3 years ago

thomasyu888 commented 3 years ago

Currently, Docker container logs are captured on the ELK stack, but participants do not have access to them. Logs still have to be returned to participants somehow.

tschaffter commented 3 years ago

That's a good point. We still don't want to return logs when running on evaluation data. What I'll do is create a second dataset in the data node that we can use for validation. Logs generated during validation must be returned to participants (#17).

But that doesn't answer your question. :) We probably need to export logs from ELK to somewhere participants can retrieve them. It would be a serious security concern if we were to give participants access to the ELK instance.

Currently, the only secure way we have to return logs is probably to push them to Synapse, as we do in most challenges. The controller would need a step that retrieves a submission's log from ELK. Does ELK provide an API to access logs?
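Elasticsearch, the storage layer of ELK, does expose a REST search API (`POST /<index>/_search`), so such a controller step is feasible. Below is a minimal sketch, assuming each log line is indexed with a `message` field and that a submission can be matched through a field such as `container.name`. Both field names, the index name, and the Elasticsearch URL are assumptions that depend on how the log shipper is configured.

```python
import json
import urllib.request

# Hypothetical field identifying which submission produced a log line; the
# real field depends on how the log shipper labels Docker containers.
SUBMISSION_FIELD = "container.name"


def build_submission_query(submission_id: str, size: int = 1000) -> dict:
    """Build an Elasticsearch query body matching one submission's log lines."""
    return {
        "size": size,
        # Oldest first, so the returned log reads top to bottom.
        "sort": [{"@timestamp": {"order": "asc"}}],
        "query": {"match_phrase": {SUBMISSION_FIELD: submission_id}},
        # Only fetch the fields needed to rebuild the log.
        "_source": ["@timestamp", "message"],
    }


def search_logs(es_url: str, index: str, submission_id: str) -> dict:
    """POST the query to the _search endpoint and return the parsed JSON response."""
    body = json.dumps(build_submission_query(submission_id)).encode("utf-8")
    request = urllib.request.Request(
        f"{es_url}/{index}/_search",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)
```

A single search returns at most `index.max_result_window` hits (10,000 by default), so longer logs would need paging with `search_after` before the result is written to a file and stored in Synapse.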

Thoughts?

thomasyu888 commented 3 years ago

I'm looking into pulling a submission's logs from ELK. That said, streaming log files will be something of an issue. Let's say there are a thousand notes to annotate, and for each note the service writes 5 lines to stdout; that means we are returning at least 5,000 lines of logs to the user.
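The volume scales linearly with the dataset: 1,000 notes × 5 lines per note gives 5,000 lines. One hypothetical way to keep the returned file bounded is to keep only the beginning and end of the log, where startup errors and the final failure often appear. The helper below and its default limits are illustrative assumptions, not part of the controller.

```python
from typing import List


def truncate_log(lines: List[str], head: int = 100, tail: int = 100) -> List[str]:
    """Keep the first `head` and last `tail` lines and replace the middle with a marker."""
    if len(lines) <= head + tail:
        return list(lines)
    omitted = len(lines) - head - tail
    marker = f"... {omitted} lines omitted ..."
    # Slice from an explicit index so that tail=0 keeps nothing from the end
    # (lines[-0:] would return the whole list).
    return lines[:head] + [marker] + lines[len(lines) - tail:]
```

With the numbers above, the 5,000 lines shrink to 201: the first 100, one marker line reporting 4,800 omitted lines, and the last 100.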

The ELK extension we are using simply interacts with the Docker API to populate the dashboard with logs. Ideally, we would have a way to give certain users permission to view certain logs on the ELK stack, so that participants would have a dynamic way of viewing their logs. For now, I would most likely just retrieve logs from the annotation service after each clinical note has been annotated.
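Retrieving logs after each note can be done through the same Docker API. A minimal sketch, assuming `container` behaves like a Docker SDK for Python `Container` (whose `logs()` method accepts `stdout`, `stderr`, `since`, and `until` and returns bytes) and that `annotate` is a hypothetical callable that sends one note to the annotation service:

```python
import time
from typing import Callable, Dict, List


def annotate_with_logs(
    container, notes: List[str], annotate: Callable[[str], object]
) -> Dict[int, str]:
    """Annotate each note and capture the container output produced meanwhile."""
    logs_per_note = {}
    for index, note in enumerate(notes):
        start = int(time.time())
        annotate(note)  # hypothetical call to the annotation service
        end = int(time.time()) + 1
        # since/until are whole-second Unix timestamps here, so lines from
        # neighbouring notes can overlap when several notes finish in one second.
        raw = container.logs(stdout=True, stderr=True, since=start, until=end)
        logs_per_note[index] = raw.decode("utf-8", errors="replace")
    return logs_per_note
```

Against a real service, `container` could come from `docker.from_env().containers.get("annotator")`, where `"annotator"` is a placeholder name; the `until` filter requires Docker Engine API 1.35 or later.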

tschaffter commented 3 years ago

For advanced security, we could forward logs with low security risk to a second ELK instance that users can log in to. This ensures that users cannot log in to an ELK stack that may contain sensitive logs, such as the ones generated when using a "private" dataset.
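This split can be expressed as a default-deny routing rule applied before logs are forwarded. In this sketch the `dataset` labels, the `LogRecord` type, and the two destination names are hypothetical; the point is that anything not explicitly marked low risk stays on the internal instance.

```python
from dataclasses import dataclass

# Hypothetical dataset labels whose logs are considered low risk
# (e.g. the validation dataset from #17); everything else is sensitive.
LOW_RISK_DATASETS = {"validation"}


@dataclass
class LogRecord:
    submission_id: str
    dataset: str
    message: str


def destination(record: LogRecord) -> str:
    """Choose which ELK instance should receive a log record."""
    if record.dataset in LOW_RISK_DATASETS:
        return "participant-elk"  # instance participants may log in to
    return "internal-elk"  # default: never exposed to participants
```

Defaulting to the internal instance means a mislabeled or new dataset leaks nothing; it only costs participants some visibility until it is explicitly allowed.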

thomasyu888 commented 3 years ago

That's what I was thinking, but we don't want users to be able to access other users' logs. Because of this, I think this would be nice to have, but it needs some development.
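Keeping participants away from each other's logs amounts to filtering every query by submission ownership. A minimal sketch, assuming each log record carries a `submission_id` and that a hypothetical `owners` mapping from submission ID to submitter ID can be built from the challenge's submission records:

```python
from typing import Dict, Iterable, List


def visible_logs(
    user_id: str, records: Iterable[dict], owners: Dict[str, str]
) -> List[dict]:
    """Return only the log records whose submission belongs to `user_id`."""
    # Records from unknown submissions map to None and are therefore hidden.
    return [r for r in records if owners.get(r["submission_id"]) == user_id]
```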

I do think this is something we have in our new challenge platform: a dashboard or page of some sort where users can easily access their logs per submission, AND with the logs being streamed instead of uploaded into a file every minute.

thomasyu888 commented 3 years ago

@tschaffter, here is an ugly way we can currently return logs to participants:

[Screenshot from 2020-12-22]
  1. Go to the search you want.
  2. Query for a specific submission.
  3. Click Inspect.
  4. Click Response.
  5. Click the copy command and parse the logs with Python (the response is just JSON).

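Step 5 can be scripted. The sketch below assumes the copied text is a standard Elasticsearch `_search` response (hits under `hits.hits`, each document under `_source`) and that each log line is stored in a field named `message`; that field name is an assumption and may differ in this deployment.

```python
import json
from typing import List


def messages_from_response(response_text: str) -> List[str]:
    """Extract log lines from a copied Elasticsearch _search response."""
    response = json.loads(response_text)
    # Missing "message" fields become empty strings rather than errors.
    return [hit["_source"].get("message", "") for hit in response["hits"]["hits"]]
```

For example, after saving the copied response to `response.json`, `messages_from_response(open("response.json").read())` returns the log lines in the order the search returned them.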
tschaffter commented 3 years ago

Tracked in https://github.com/nlpsandbox/nlpsandbox-website-synapse/issues/3

github-actions[bot] commented 2 years ago

This thread has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.