scdodev / scdo-ontology

Sickle Cell Disease Ontology

Create GitHub action workflow to automate SCDO release #19

Open scdodev opened 6 months ago

scdodev commented 6 months ago

Details to be discussed in a meeting this Thursday, but for now these are some points to address/consider:

We need to create a new release where the workflow used for compiling the files for release outputs:

Input could be either, or a combination of, the following file(s) (depending on what works best/easiest):

Additional points to be addressed/considered for the next release:

1) ontology metadata:

2) the replacement of SCDO urls with those of external ontology terms causes a self-referencing loop...

3) dynamic imports

Note: this workflow will precede the development of a new workflow (which is the next step), which will likely use Babelon.tsv files and output additional files for various purposes.

JadeHotchkiss commented 6 months ago

@matentzn Another point to remember: when I tried to compile the scdo.owl file locally (master checked out on a Windows PC) using the updated scdo-edit.owl file and updated template files, the compiled scdo.owl file did not have URLs replaced for the relevant terms. It is not clear what prevented that from happening, but if we do want to continue replacing URLs like this (we must discuss point 2 above), then it is important that this works as it should with the updated workflow.
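A check like the following could make this kind of failure visible during compilation rather than after the fact: it lists SCDO IRIs that should have been replaced but still appear in the compiled file. This is a minimal sketch, assuming the replacements are driven by an SSSOM mapping table whose `subject_id` column holds the SCDO CURIEs that get swapped out, and that SCDO IRIs use the OBO PURL form `http://purl.obolibrary.org/obo/SCDO_...`. Both are assumptions about the workflow, the function names are hypothetical, and it scans the raw file text rather than parsing the OWL.

```python
import csv
import io
import re

# Assumption: SCDO term IRIs use the OBO PURL pattern.
OBO_PREFIX = "http://purl.obolibrary.org/obo/"

def ids_to_replace(sssom_tsv: str) -> set[str]:
    """Return the SCDO CURIEs listed as subjects in an SSSOM TSV (hypothetical helper)."""
    # SSSOM TSV files may start with '#'-prefixed YAML metadata; skip those lines.
    body = "\n".join(ln for ln in sssom_tsv.splitlines() if not ln.startswith("#"))
    reader = csv.DictReader(io.StringIO(body), delimiter="\t")
    return {row["subject_id"] for row in reader if row["subject_id"].startswith("SCDO:")}

def unreplaced(compiled_owl_text: str, curies: set[str]) -> list[str]:
    """List CURIEs whose full IRI still occurs in the compiled OWL file's text."""
    leftovers = []
    for curie in sorted(curies):
        iri = OBO_PREFIX + curie.replace(":", "_")
        # The lookahead stops SCDO_0000001 from matching inside SCDO_00000012.
        if re.search(re.escape(iri) + r"(?!\d)", compiled_owl_text):
            leftovers.append(curie)
    return leftovers

# Toy inputs; EXT:... stands in for an external ontology term.
mappings = (
    "#curie_map:\n"
    "#  SCDO: http://purl.obolibrary.org/obo/SCDO_\n"
    "subject_id\tpredicate_id\tobject_id\n"
    "SCDO:0000001\tskos:exactMatch\tEXT:0000001\n"
    "SCDO:0000002\tskos:exactMatch\tEXT:0000002\n"
)
compiled = '<owl:Class rdf:about="http://purl.obolibrary.org/obo/SCDO_0000001"/>'
print(unreplaced(compiled, ids_to_replace(mappings)))  # ['SCDO:0000001']
```

In a GitHub Actions job, a non-empty result could simply fail the step (for example by exiting with a non-zero status), so the problem cannot slip into a release unnoticed.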

matentzn commented 6 months ago

We need to create a new release where the workflow used for compiling the files for release outputs:

Don't worry, I will fix all of the above issues for you; I have done all of this before. No need for two edit files.

All I need are the tables with the translations. Can you share a TSV with, say, the Portuguese translation? Then I will tell you how to format it so we can do all of the above.

the replacement of SCDO urls with those of external ontology terms causes a self-referencing loop...

I wouldn't worry about this right now.

perhaps we should add an annotation pointing to the term's page on the SCDO website? And other places it can be viewed?

This is a good idea! Yes, we can do that easily enough. Just make a separate issue for this.

problem: we can't just have labels and/or definitions automatically updated, though, as we may not approve of the changes, and because our translations are affected by any changes.

not replacing urls with external ones (see 2 above) could solve this problem?

I am undecided about this. On the one hand, SCDO is an application ontology, so it makes sense not to worry too much about term re-use. On the other hand, every new identifier with the same label makes our world a worse and more complex space...

Can we punt this part of the discussion until we have dealt with the international edition and made a new release?

JadeHotchkiss commented 6 months ago

We don't want to implement Babelon.tsv files in this next release, though, as we want to make use of the file produced by our Ontology Translation Tool (OTT) GUI, which itself takes the translations from the reviewers' Excel file and adds them to the scdo-edit file. This is a major aspect of the manuscript we have written.

Add to the mix the layperson SCDO OWL file, which is created separately, using a similar but different workflow, and now we have so many different products/moving parts!

Yes, this can all be streamlined and improved with an updated workflow, which will involve changing and removing some of what the OTT does, but we're not ready for that yet.

That's why I suggested just manually updating the metadata in the various SCDO OWL files we currently have in GitHub and creating a new release with those for now. We will indicate in our manuscript that this manual part will be improved in planned updates to the workflow. Then we work on improving the workflow, which will affect quite a lot on my side (I will need time to update things as necessary), and use that for the next release.

I think we should wait to discuss this tomorrow and then decide on next moves.

matentzn commented 6 months ago

No, I know - no need to worry about babelon at all - any tabular representation of the translations will do! In your own format!

JadeHotchkiss commented 6 months ago

The point is, our current solution, which we are publishing now, already takes the translations in tabular format and adds them to the scdo-edit file, so we don't want to confuse things now by including that step in the compilation process for this release, which is the one we will be pointing to.

The issue is now to be able to create a release (with no updates to English content as yet -> still same content from 2021!) that includes the files for translations. So the content of the compiled files we already have in GitHub for translations would be the same, just with metadata updated as necessary.

The problem with URLs not being replaced came up when I was testing the compilation workflow (in preparation for updating the French translations in line with updates to the English, which we are not actually including in this release yet). It needs fixing if we are to compile files for this next release, but it can be left for now if we just manually update the metadata in the existing compiled files, which is all we really need to fix at the moment.

If this doesn't make sense, best we chat tomorrow and I explain better.

matentzn commented 6 months ago

OK, in that case, scdo-edit contains all the translations, and we just add the code to create the translated ontology! No problem.

JadeHotchkiss commented 6 months ago

@matentzn I think I got a bit distracted in our discussion yesterday... We ended up deciding that you would create a workflow that takes our scdo-edit file, the translations in tables, and a config file, and incorporates some of the code from our ontology translation tool (OTT) into the compilation workflow to generate the necessary output files. Although that would be part of the updated workflow going forward, it still defeats our aim of having a release now in which the OWL files containing translations were produced using our translation tool. I'm not sure whether you proposed something to solve this that I missed?

Maybe, to avoid producing a release now that involves deconstructing our OTT, I should make the compiled French SCDO OWL file we already have available only on our website rather than on GitHub?

Regarding updates to the workflow going forward: I remembered after our meeting that the reason I was adding annotation translations to the scdo-edit file, instead of the compiled OWL file, is that the Excel files sent to reviewers are produced from the curator's/editor's version, which only contains SCDO IDs. Because the IDs are switched out for quite a few terms in the compiled file (we implemented this in the ROBOT workflow after the code for incorporating translations was done), adding the translations to the compiled file would require an additional step that uses the SSSOM mappings.

So, the point is this: for the updated workflow, if we continue to use the scdo-edit file to create the files for auto-translations to be reviewed, then adding translations straight into the compiled OWL file later on would require the automatic release workflow to produce the SSSOM mappings before producing the OWL file(s) containing translations, because the mappings are needed to map terms in the translation tables to terms in the compiled OWL file. If the various translations are instead first added to the scdo-edit file, and that file is then used to produce the various output files, there is no need to map term IDs from the curator's version to those in the compiled file. Do you have thoughts on the pros and cons of these two approaches?
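The extra mapping step in the first approach could look roughly like this: the term IDs in a translation table are rewritten using the SSSOM mappings before the translations are added to the compiled file. This is a minimal sketch, assuming the translation table has an `id` column holding curator-version SCDO CURIEs, and that each SSSOM row's `subject_id` is the SCDO term and `object_id` the external term that replaces it in the compiled file (only `skos:exactMatch` rows are used). These column layouts are assumptions, not the current OTT format, and the function names are hypothetical.

```python
import csv
import io

def load_replacements(sssom_tsv: str, predicate: str = "skos:exactMatch") -> dict[str, str]:
    """Map SCDO CURIE -> external CURIE that replaces it, using only rows with the given predicate."""
    # SSSOM TSV files may start with '#'-prefixed YAML metadata; skip those lines.
    body = "\n".join(ln for ln in sssom_tsv.splitlines() if not ln.startswith("#"))
    reader = csv.DictReader(io.StringIO(body), delimiter="\t")
    return {
        row["subject_id"]: row["object_id"]
        for row in reader
        if row["predicate_id"] == predicate and row["subject_id"].startswith("SCDO:")
    }

def remap_translations(translation_tsv: str, replacements: dict[str, str]) -> list[dict[str, str]]:
    """Swap curator-version IDs for compiled-file IDs; IDs without a mapping are left unchanged."""
    reader = csv.DictReader(io.StringIO(translation_tsv), delimiter="\t")
    rows = []
    for row in reader:
        row["id"] = replacements.get(row["id"], row["id"])
        rows.append(row)
    return rows

# Toy inputs; EXT:... stands in for an external ontology term.
sssom = (
    "subject_id\tpredicate_id\tobject_id\n"
    "SCDO:0000001\tskos:exactMatch\tEXT:0000001\n"
    "SCDO:0000003\tskos:broadMatch\tEXT:0000003\n"
)
translations = "id\tlanguage\tlabel\nSCDO:0000001\tfr\tlibellé 1\nSCDO:0000002\tfr\tlibellé 2\n"
for row in remap_translations(translations, load_replacements(sssom)):
    print(row["id"], row["label"])
# EXT:0000001 libellé 1
# SCDO:0000002 libellé 2
```

The sketch also makes the dependency visible: the SSSOM file has to exist before any translation can be placed in the compiled file, whereas adding translations to scdo-edit first avoids this step entirely.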

And thinking more about this...

1) It might be a good idea to have an initial "test compile" step before we produce files for translation reviewers. This would act as a QC on labels and definitions, in case anything is flagged that needs fixing first, e.g. missing or duplicate definitions, or definitions containing unrecognised symbols (not sure whether ROBOT checks for this?). It could also let us fix the following before they can halt the process when the automatic release workflow is run later down the line: any broken definition source URLs (assuming ROBOT would throw errors for those when generating the SSSOM files?), and errors in association/template files (mismatching IDs or labels?).

2) As I mentioned in our meeting yesterday, the information provided by reviewers/curators for layperson terms also includes additional layperson synonyms, so incorporating the layperson annotations into the OWL file also involves updating the list of synonyms for that term. ----> So we might want these synonym changes to be made in the scdo-edit file first, with the resulting OWL file then used to generate the final compiled English and translated versions?

3) I said it would be easy enough to provide the files with annotation translations in .tsv format, but I somehow overlooked the fact that our code that takes them as input would need to be updated. So either we keep them in Excel format and continue with the same code, or we update the code. However, considering my thoughts on updating the process used for reviewing translations (see point 4 below), the code might need to be updated going forward anyway.

4) Looking at our workflow for obtaining translations from reviewers and how that affects release turn-around times... Even though we can shorten turn-around times by creating interim releases that use translation statuses to indicate where translations are not yet official (auto-generated, under review, etc.), the review process is still likely to cause significant delays when the ontology is being translated into a new language (which can take quite a few months to complete). ---> This can cause problems, because any urgent updates made to the English curator's version during that stage of the cycle would have to wait for a release containing the reviewed/completed translations before they could be released. ---> The best way of solving this that I can see is to make reviewing translations an online, ongoing process, where "screenshots" of the work on translations are used in releases and the content for reviewers is updated as necessary. I looked into this yesterday and I think it would be possible to use shared Google spreadsheets for something like that. ---> This would not affect the ROBOT workflow, other than that it will take time for me to set up such a system and adapt the relevant code as necessary.
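For the "test compile" QC in point 1 above: as far as I know, ROBOT's `report` command already includes checks for missing and duplicate definitions, so the extra value of a custom step would mainly be checks such as odd characters. The sketch below only illustrates that kind of check on a simplified input; the `(id, label, definition)` row format and the function name are hypothetical, and "suspicious" is taken here to mean Unicode control/format/unassigned characters or the replacement character U+FFFD, which usually point to an encoding problem.

```python
import unicodedata
from collections import defaultdict

def qc_definitions(terms: list[tuple[str, str, str]]) -> list[str]:
    """Return human-readable QC problems for (id, label, definition) rows (hypothetical format)."""
    problems = []
    by_definition: dict[str, list[str]] = defaultdict(list)
    for term_id, label, definition in terms:
        text = (definition or "").strip()
        if not text:
            problems.append(f"{term_id} ({label}): missing definition")
            continue
        by_definition[text].append(term_id)
        for ch in text:
            # Unicode categories starting with "C" are control, format, private-use,
            # surrogate or unassigned characters; U+FFFD marks a failed decode.
            if ch == "\ufffd" or unicodedata.category(ch).startswith("C"):
                problems.append(f"{term_id} ({label}): suspicious character U+{ord(ch):04X}")
                break
    for ids in by_definition.values():
        if len(ids) > 1:
            problems.append(f"duplicate definition shared by {', '.join(sorted(ids))}")
    return problems

terms = [
    ("SCDO:0000001", "term A", "A definition."),
    ("SCDO:0000002", "term B", ""),
    ("SCDO:0000003", "term C", "A definition."),
    ("SCDO:0000004", "term D", "Text with a zero\u200bwidth space."),
]
for problem in qc_definitions(terms):
    print(problem)
# SCDO:0000002 (term B): missing definition
# SCDO:0000004 (term D): suspicious character U+200B
# duplicate definition shared by SCDO:0000001, SCDO:0000003
```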

matentzn commented 6 months ago

@JadeHotchkiss Let's just do it like this:

  1. You add all languages to scdo-edit.owl.
  2. I will deal with splitting them back into separate files

I am off for the next 10 days, but if you upload a new edit file with all languages, I can do the rest.
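Splitting a multilingual edit file back into per-language files essentially means grouping the language-tagged annotations. Below is a minimal sketch of that grouping on a simplified in-memory list of annotations rather than an actual OWL file (a real workflow would more likely do this with ROBOT or an OWL library). The `Annotation` type, the function name, and the choice to keep English and untagged annotations in every translated file are assumptions for illustration only.

```python
from collections import defaultdict
from typing import NamedTuple, Optional

class Annotation(NamedTuple):
    subject: str          # term CURIE, e.g. "SCDO:0000001"
    prop: str             # annotation property, e.g. "rdfs:label"
    value: str
    lang: Optional[str]   # language tag; None for untagged literals

def split_by_language(annotations: list[Annotation], base_lang: str = "en") -> dict[str, list[Annotation]]:
    """Return one annotation list per language.

    Untagged and base-language annotations are copied into every file, so each
    translated file stays usable on its own; other language tags are kept only
    in their own file.
    """
    shared = [a for a in annotations if a.lang in (None, base_lang)]
    extra: dict[str, list[Annotation]] = defaultdict(list)
    for a in annotations:
        if a.lang not in (None, base_lang):
            extra[a.lang].append(a)
    files = {base_lang: list(shared)}
    for lang, anns in extra.items():
        files[lang] = shared + anns
    return files

edit_file = [
    Annotation("SCDO:0000001", "rdfs:label", "label (en)", "en"),
    Annotation("SCDO:0000001", "rdfs:label", "label (fr)", "fr"),
    Annotation("SCDO:0000001", "rdfs:label", "label (pt)", "pt"),
    Annotation("SCDO:0000001", "oboInOwl:id", "SCDO:0000001", None),
]
for lang, anns in split_by_language(edit_file).items():
    print(lang, [a.value for a in anns])
# en ['label (en)', 'SCDO:0000001']
# fr ['label (en)', 'SCDO:0000001', 'label (fr)']
# pt ['label (en)', 'SCDO:0000001', 'label (pt)']
```

Whether translated releases should keep the English labels alongside the translations, or replace them, is a design decision for the workflow; the sketch keeps them.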

JadeHotchkiss commented 6 months ago

@matentzn Can we add a step in the ROBOT workflow that adds a link to each SCDO term's web page on the SCDO website? This is an example link: https://scdontology.h3abionet.org/ontology/SCDO_0009293
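One way to implement such a step would be to generate a ROBOT template that adds one homepage link per term. This is a minimal sketch, assuming every SCDO term's page follows the pattern of the example link above, and using `foaf:homepage` (the property discussed below) as the annotation property; the function names are hypothetical. In a ROBOT template, the second row holds template strings, where `ID` marks the term column and `AI` marks an annotation whose value is an IRI.

```python
import csv
import io

# Base URL taken from the example link; assumed to hold for every SCDO term.
SCDO_SITE = "https://scdontology.h3abionet.org/ontology/"

def homepage_url(curie: str) -> str:
    """SCDO:0009293 -> https://scdontology.h3abionet.org/ontology/SCDO_0009293"""
    prefix, local_id = curie.split(":", 1)
    return f"{SCDO_SITE}{prefix}_{local_id}"

def homepage_template(curies: list[str]) -> str:
    """Build a ROBOT-template-style TSV: header row, template-string row, then one row per SCDO term."""
    out = io.StringIO()
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    writer.writerow(["ID", "homepage"])
    writer.writerow(["ID", "AI foaf:homepage"])  # AI = annotation whose value is an IRI
    for curie in curies:
        if curie.startswith("SCDO:"):  # only SCDO terms have a page on the SCDO website
            writer.writerow([curie, homepage_url(curie)])
    return out.getvalue()

print(homepage_template(["SCDO:0009293", "EXT:0000001"]), end="")
```

The generated TSV could then be passed to ROBOT's `template` command, with the `foaf:` prefix declared (for example via ROBOT's `--prefix` option), and the result merged into the release; how exactly this fits into the existing release pipeline is left open here.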

Would RO's "homepage" (http://xmlns.com/foaf/spec/#term_homepage) annotation property be fine for this or would you suggest a different one?

scdodev commented 4 months ago

@matentzn I have now added a "homepage" (http://xmlns.com/foaf/0.1/homepage) annotation property to the curator's OWL file.

matentzn commented 4 months ago

That's great, yes, you can use that one. We typically use this ontology as a source for annotation properties:

https://ontobee.org/ontology/catalog/OMO?iri=http://www.w3.org/2002/07/owl%23AnnotationProperty

But I don't think it has a "homepage" property.

You could request one here: https://github.com/information-artifact-ontology/ontology-metadata/issues