Open cmungall opened 9 years ago
A few thoughts:
"Currently the annotator service marks up unstructured text." => add "and provides links to content within Monarch".
Also, we'd probably want some guidelines on how long a text block can be, and specifics on any issues to avoid (e.g. if there are particular symbols, fonts, or formatting that would screw anything up).
Love the idea of being able to say: hey, take this spreadsheet and tell me what kinds of genes or diseases I have in a column, which may be a block of free text or a short string.
Great! This semi-structured use case is very similar to my prior work with Zoomage. We should chat about this when I have Internet working. Related to the curation dashboard vision as well. Best, Julie
@mellybelly to edit this description
Currently the annotator service marks up unstructured text.
We want support for semistructured text, e.g. a TSV or Excel file (converted to TSV). The user may specify something broad such as categories for a subset of columns (e.g. col1 may contain gene symbols and would be categorized 'gene', col2 may have disease labels and would be categorized 'disease').
The TSV could be fed to the annotator in bulk, one cell at a time, or one row at a time. If fed in bulk, the structure could be reconstituted by splitting on tabs and newlines.
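A rough sketch of the bulk option, assuming the annotator reports each match as character offsets into the submitted text (that output shape is an assumption, not something confirmed here). The helper names `cell_offsets` and `locate` are hypothetical. The idea is to record where each cell starts and ends in the bulk text, then map a match back to its row and column:

```python
from typing import List, Optional, Tuple

Cell = Tuple[int, int, int, int]  # (row, col, start, end), end exclusive


def cell_offsets(tsv_text: str) -> List[Cell]:
    """Record the character range of every cell in the bulk TSV text."""
    cells: List[Cell] = []
    pos = 0
    for row_idx, line in enumerate(tsv_text.split("\n")):
        for col_idx, cell in enumerate(line.split("\t")):
            cells.append((row_idx, col_idx, pos, pos + len(cell)))
            pos += len(cell) + 1  # step over the tab or newline delimiter
    return cells


def locate(span_start: int, span_end: int,
           cells: List[Cell]) -> Optional[Tuple[int, int, bool]]:
    """Map a match (offsets into the bulk text) to (row, col, is_full_span).

    Returns None if the match crosses a cell boundary.
    """
    for row, col, start, end in cells:
        if start <= span_start and span_end <= end:
            return row, col, (span_start == start and span_end == end)
    return None


text = "BRCA1\tbreast cancer\nTP53\tLi-Fraumeni syndrome"
cells = cell_offsets(text)
hit = locate(6, 12, cells)  # a match on "breast" inside row 0, col 1
```

A match that straddles a tab or newline comes back as `None`, which is one argument for feeding cells individually instead: the annotator can never match across a cell boundary in that mode.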
The first thing the user would see is the rows for which one or more columns contained labels that could not be found (particularly where no match spans the full cell).
An additional operation could be to compare this with what is in golr, e.g. to show which gene-disease associations are new.
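That first view could be computed roughly as below. This is only a sketch: `annotate` stands in for whatever call the annotator service ends up exposing, and it is assumed to return `(start, end, category)` matches for a single cell. `rows_needing_review`, `toy_annotate`, and `LEXICON` are hypothetical names used for illustration.

```python
from typing import Callable, Dict, List, Tuple

Match = Tuple[int, int, str]  # (start, end, category), end exclusive


def rows_needing_review(
    rows: List[List[str]],
    column_categories: Dict[int, str],
    annotate: Callable[[str], List[Match]],
) -> List[Tuple[int, List[int]]]:
    """Return (row index, offending column indexes) for each row in which a
    categorized cell has no full-span match of the expected category."""
    flagged = []
    for row_idx, row in enumerate(rows):
        bad_cols = []
        for col_idx, category in column_categories.items():
            cell = row[col_idx] if col_idx < len(row) else ""
            full_span = any(
                start == 0 and end == len(cell) and cat == category
                for start, end, cat in annotate(cell)
            )
            if not full_span:
                bad_cols.append(col_idx)
        if bad_cols:
            flagged.append((row_idx, bad_cols))
    return flagged


# Toy stand-in for the annotator: exact substring lookup in a tiny lexicon.
LEXICON = {"BRCA1": "gene", "TP53": "gene", "breast cancer": "disease"}


def toy_annotate(cell: str) -> List[Match]:
    matches = []
    for label, category in LEXICON.items():
        i = cell.find(label)
        if i >= 0:
            matches.append((i, i + len(label), category))
    return matches


rows = [
    ["BRCA1", "breast cancer"],
    ["TP53", "breast cancer, early onset"],  # only a partial match in col 1
    ["XYZ9", "breast cancer"],               # unknown symbol in col 0
]
flagged = rows_needing_review(rows, {0: "gene", 1: "disease"}, toy_annotate)
```

Here the second row is flagged because the disease match covers only part of the cell, and the third because the gene symbol is not found at all.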
Note: depends somewhat on https://github.com/SciGraph/SciGraph/issues/137