nlpsandbox / nlpsandbox-controller

NLP Sandbox CWL workflow and tools
https://nlpsandbox.io
Apache License 2.0

Store NLP Tools results to the Data Node #2

Closed tschaffter closed 3 years ago

tschaffter commented 4 years ago

A Data Hosting Site with a private dataset won't allow the infrastructure to return the predictions generated by an NLP Tool to the Sandbox backend / Synapse. Currently, we are only considering computing performance metrics like the F1 score as part of the processing workflow. However, additional metrics would be useful, such as confidence intervals computed by bootstrapping, along with any post-hoc analysis we may think of. Bootstrapping can be very time consuming, since it requires evaluating 10,000+ bootstrapped predictions, which is why it has so far never been included in the evaluation process.
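For reference, a percentile-bootstrap confidence interval for the F1 score could look like the minimal sketch below (Python with numpy/scikit-learn; the labels and resample count are illustrative only, not part of the workflow):

```python
import numpy as np
from sklearn.metrics import f1_score

def bootstrap_f1_ci(y_true, y_pred, n_resamples=10_000, alpha=0.05, seed=0):
    """Percentile-bootstrap confidence interval for the F1 score."""
    rng = np.random.default_rng(seed)
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    n = len(y_true)
    scores = np.empty(n_resamples)
    for i in range(n_resamples):
        idx = rng.integers(0, n, size=n)  # resample annotations with replacement
        scores[i] = f1_score(y_true[idx], y_pred[idx], zero_division=0)
    lower, upper = np.percentile(scores, [100 * alpha / 2, 100 * (1 - alpha / 2)])
    return f1_score(y_true, y_pred), (lower, upper)

# Toy example with made-up binary labels (1 = annotation matched the gold standard)
point, (lo, hi) = bootstrap_f1_ci([1, 0, 1, 1, 0, 1], [1, 0, 0, 1, 1, 1])
print(f"F1 = {point:.3f}, 95% CI = [{lo:.3f}, {hi:.3f}]")
```

The inner loop is what makes 10,000+ resamples expensive: each iteration re-scores the full set of predictions.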

Here are different options:

A. We don't save the predictions to persistent storage anywhere, and we forget about doing additional analysis.

B. We save the predictions to persistent storage INSIDE the private network of a Data Hosting Site. Staff members of the Data Hosting Site could then help perform the analysis and return to the organizers or community only the results that they are comfortable sharing. This non-automatic process means that the returned data would likely not be included in a live leaderboard.

thomasyu888 commented 4 years ago

I think we should definitely store the results on persistent storage somewhere, just like we do for the EHR challenges. That way, if we do plan to do more analysis, we have the results. An alternative could be to only re-run selected submissions and run the analysis on their results (assuming that not all submissions need to be further analyzed).

tschaffter commented 3 years ago

@thomasyu888 In addition to saving the predictions to Synapse as done in other challenges, let's save them to the data node (see the sketch after the steps below):

  1. Create an `AnnotationStore` named `submission-SUBMISSION_ID`, linked to the `Dataset` used for evaluation
  2. Push the predicted annotations to the newly created `AnnotationStore`
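A minimal sketch of those two steps against the Data Node REST API; the base URL, dataset and submission IDs, endpoint paths, and annotation payload below are assumptions to be checked against the Data Node's OpenAPI specification:

```python
import requests

# Assumed deployment details -- replace with the actual Data Node instance.
DATA_NODE = "http://localhost:8080/api/v1"
DATASET_ID = "2014-i2b2-deid"   # dataset used for evaluation (assumption)
SUBMISSION_ID = "9712345"       # Synapse submission ID (assumption)
STORE_ID = f"submission-{SUBMISSION_ID}"

# 1. Create the AnnotationStore linked to the evaluation Dataset.
resp = requests.post(
    f"{DATA_NODE}/datasets/{DATASET_ID}/annotationStores",
    params={"annotationStoreId": STORE_ID},
)
resp.raise_for_status()

# 2. Push the predicted annotations to the new AnnotationStore. The payload
# shape is illustrative; the real one must follow the annotation schema of
# the task (date, person name, etc.).
predictions = [
    {"annotationSource": {"resourceSource": {"name": "note-1"}},
     "textDateAnnotations": [
         {"start": 42, "length": 10, "text": "2020-01-01",
          "dateFormat": "YYYY-MM-DD"}]},
]
for i, annotation in enumerate(predictions):
    resp = requests.post(
        f"{DATA_NODE}/datasets/{DATASET_ID}/annotationStores/{STORE_ID}/annotations",
        params={"annotationId": str(i)},
        json={"annotation": annotation},
    )
    resp.raise_for_status()
```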

This would enable us to then use the Data Node client to easily pull the gold standard and the predicted annotations to do additional analysis. Another application I have in mind is to apply a "wisdom of the crowd" algorithm to aggregate the predictions generated by different tools!
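To illustrate that follow-up analysis, pulling several stores and applying a naive span-level majority vote might look like the sketch below; the endpoints, response shape, store IDs, and the voting rule are all assumptions (pagination is ignored), not the actual aggregation algorithm:

```python
from collections import Counter
import requests

DATA_NODE = "http://localhost:8080/api/v1"   # assumed instance
DATASET_ID = "2014-i2b2-deid"                # assumed dataset ID

def pull_annotations(store_id):
    """Fetch the annotations of one AnnotationStore (assumed endpoint/shape)."""
    url = f"{DATA_NODE}/datasets/{DATASET_ID}/annotationStores/{store_id}/annotations"
    resp = requests.get(url)
    resp.raise_for_status()
    return resp.json().get("annotations", [])

gold = pull_annotations("goldstandard")  # assumed gold standard store ID
tool_stores = ["submission-9712345", "submission-9712346", "submission-9712347"]

# Naive span-level majority vote: keep spans predicted by >= half of the tools.
votes = Counter()
for store_id in tool_stores:
    for ann in pull_annotations(store_id):
        for date_ann in ann.get("annotation", {}).get("textDateAnnotations", []):
            votes[(date_ann["start"], date_ann["length"])] += 1

consensus = {span for span, n in votes.items() if n >= len(tool_stores) / 2}
print(f"{len(consensus)} consensus spans (vs {len(gold)} gold annotations)")
```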

thomasyu888 commented 3 years ago

Thanks @tschaffter! This was my reason for adding the `store_annotations` function.

thomasyu888 commented 3 years ago

I thought this was to store the annotations, but this ticket is to store results: AUPR, etc.

tschaffter commented 3 years ago

So you were correct: this ticket is about storing the predictions (not the prediction evaluation).

github-actions[bot] commented 2 years ago

This thread has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.