Open mmaiers-nmdp opened 5 years ago
For 1. EBMT provides a PDF that lists a lot of IONs, but is there a more accessible (and potentially more comprehensive) list (text-formatted, maybe?) or a repository that could be queried for IONs? Or is EBMT the place to go (or is it the www.iccbba.org/ document above)?
We would want to validate the ION against something before sending it to the database.
For 2. above, I suggest we use the NCBI's PMCID - PMID - Manuscript ID - DOI converter to convert all provided PubMed IDs, PMC IDs, NIHMS IDs, UK IDs, etc. into DOIs. This can also be used to validate a provided ID, so that the client can reject the label if the ID is invalid.
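Here is a minimal Python sketch of that convert-and-validate step. The endpoint URL and the JSON response shape (a `records` list whose entries carry a `doi` field, or `"status": "error"` for an unrecognized ID) are assumptions about the NCBI ID Converter API and should be checked against the current NCBI documentation. `build_query_url`, `extract_doi`, and the default `tool`/`email` values are hypothetical names used only for illustration.

```python
import json
from urllib.parse import urlencode

# Assumed endpoint for the NCBI ID Converter API; verify before relying on it.
IDCONV_URL = "https://www.ncbi.nlm.nih.gov/pmc/utils/idconv/v1.0/"


def build_query_url(article_id, tool="phycus", email="user@example.org"):
    """Build a converter query for one PMID, PMCID, manuscript ID, or DOI."""
    params = {"ids": article_id, "format": "json", "tool": tool, "email": email}
    return IDCONV_URL + "?" + urlencode(params)


def extract_doi(response_text):
    """Return the DOI found in a converter response.

    Returns None when the response has no records, when the record is
    flagged as an error (assumed to mean the ID is invalid), or when the
    record has no DOI, so the caller can reject the label.
    """
    payload = json.loads(response_text)
    records = payload.get("records", [])
    if not records:
        return None
    record = records[0]
    if record.get("status") == "error":
        return None
    return record.get("doi")
```

In use, the client would fetch `build_query_url(some_id)` (for example with `urllib.request.urlopen`), pass the decoded body to `extract_doi`, and reject the label whenever the result is `None`.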
However, what is to be done in cases where the haplotyping generator group does not have an ION and does not have a published reference for the data? This is not an uncommon issue with AFND, where unpublished data are loaded into AFND with no external citation.
The ION database exists only as either an XML document or an Excel document (that could be exported to CSV). The difference between the two is primarily that the Excel version includes inactive facilities. There are a couple of issues here:
A. They don't have any sort of notification that flags when they've updated the database. They keep a PDF log of the changes (they've added one facility a year since 2017), but there's no real way that I know of to ping and see if the files have changed short of redownloading them. Problems:
This second one brings up:
B. I can either download the XML and have the program parse it, or, if we want the inactive facility IONs as well, download the Excel file and convert it into a CSV file that gets physically saved with the program. (There's a resource folder that allows you to add non-Java files and access them within a compiled JAR. It's how I added the help documentation.) In either case there are issues:
The Phycus GUI currently does neither, but it does screen the ION to make sure it is valid per the ICCBBA naming conventions: a four-digit number that cannot start with 0, so 1000 - 9999.
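For reference, the format screen described above can be sketched in a few lines of Python. This is only an illustration of the rule (four digits, no leading 0), not the Phycus GUI's actual code. `is_valid_ion_format` and `is_known_ion` are hypothetical names, and `known_ions` stands in for whatever set we would eventually load from the XML or CSV list.

```python
import re

# Four ASCII digits, the first of which is 1-9, i.e. 1000 through 9999.
ION_PATTERN = re.compile(r"[1-9][0-9]{3}")


def is_valid_ion_format(value):
    """Check only the naming convention, not whether the facility exists."""
    return ION_PATTERN.fullmatch(value.strip()) is not None


def is_known_ion(value, known_ions):
    """Check the format first, then membership in a set of known ION strings."""
    return is_valid_ion_format(value) and value.strip() in known_ions
```

The format check is cheap and needs no data file, so it can stay as a first pass even if we later add the lookup against the downloaded list.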
Regarding the other labels: right now we have haplotyping entity (this used to include the ION, now they're separate labels), genotyping entity, and ION. I was planning on adding DOIs next using the generator Steve found for converting assorted other IDs into DOIs.
Haplotyping and genotyping entities are inherited fields and can be changed or dropped. (The Java CLI included them by default but didn't actually include a way to specify them; they were hard-coded in the function.)
Martin, when you say you want an explicit way to create new label types, were you thinking of something like populations where they have to manually add them before using them? What about the values of these labels?
Also, we still need a place to put some sort of attribution data. Would putting that in as a label be a valid option? And if yes, how do we want to do that? Name? Phone/email/address? If all of the above, separate labels for each? According to curation-swagger-spec.yaml there's a way to pull this information back out of the database, so having a tab with all of this in it is an option. Maybe a dropdown with the available labels in it that, when one is selected, shows the values found in the database associated with that particular label?
1. We want to constrain label types.
   - For example, a labelType of "ICCBBA ION" could refer to the data here.
   - We don't want label types of "ICCCBBAAA ION", etc.
2. We want an explicit way to create new label types (and show what label types are in the database).
   - Have a REST endpoint GET/POST LabelType.
   - Other label types:
     - DOI: reference to a manuscript
     - PMID: PubMed ID
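To make the constraint concrete, here is a minimal in-memory Python sketch of the behaviour a GET/POST LabelType endpoint could expose. It is not the service's actual API. `LabelTypeRegistry` and its methods are hypothetical, and the rule that a new type may not differ from an existing one only by case or spacing is an assumption added for illustration.

```python
class LabelTypeRegistry:
    """Holds the allowed label types; labels with any other type are rejected."""

    def __init__(self, initial=()):
        self._types = {}  # normalized key -> canonical name
        for name in initial:
            self.create(name)

    @staticmethod
    def _canonical(name):
        # Collapse runs of whitespace and trim the ends.
        return " ".join(name.split())

    def list_types(self):
        """What a GET on LabelType might return: all registered type names."""
        return sorted(self._types.values())

    def create(self, name):
        """What a POST on LabelType might do: register a new type explicitly."""
        canonical = self._canonical(name)
        key = canonical.casefold()
        if not key:
            raise ValueError("label type must not be empty")
        if key in self._types:
            raise ValueError("label type already exists: " + self._types[key])
        self._types[key] = canonical
        return canonical

    def is_registered(self, label_type):
        """Exact match against canonical names, so misspellings are rejected."""
        return label_type in self._types.values()
```

For example, a registry seeded with "ICCBBA ION", "DOI", and "PMID" would accept a label of type "ICCBBA ION" and reject "ICCCBBAAA ION" until someone deliberately creates that type.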