openminted / Open-Call-Discussions

A central place for participants in the open calls to ask questions

AgroPortal Hackathon #36

Open twktheainur opened 6 years ago

twktheainur commented 6 years ago

Dear OpenMinted Team,

Our implementation is concluded and we were already given feedback on the metadata we produce for the Agroportal, SIFR Bioportal, NCBO Bioportal and Biblioportal ontologies by Penny. We integrated the suggested changes. Please note that the submissions made on the test platform were prior to our integration of the changes resulting from the feedback on the metadata.

I am creating this issue for a final check. Everything is working, including future milestones (Biblioportal and NCBO Bioportal support, although for NCBO Bioportal the API call to retrieve all ontology metadata at once does not work due to a server timeout outside of our control).

For the final deliverable, we have a GitHub project (https://github.com/agroportal/ncboproxy) with a detailed README explaining the general architecture, some details about the OMTD-Share adaptation specifically, deployment instructions (although the API is based on the production web services and will not require any deployment on the part of the OpenMinted team) and Javadoc documentation.

Do you think the level of detail is sufficient for the content of the GitHub project to be used in full as the last deliverable? Otherwise, we can produce a standalone PDF document reprising this information. In the latter case, should we include a full printout of the code (as the instructions appear to suggest), or would a link to the GitHub project suffice?

I am also including some metadata outputs for Agroportal, SIFR Bioportal, NCBO Bioportal and Biblioportal for your reference:

AgroportalSample.zip SIFRBioportalSample.zip BiblioportalSample.zip NCBOBioportalSamples.zip

Best Regards, On behalf of the AgroPortal team, Andon Tchechmedjiev

pennyl67 commented 6 years ago

Hi Andon, I have downloaded the sample XML files and run them through a validator, and they are mostly OK apart from the following remarks:

@antleb Could you also please check the technical details and see if the XML files can be imported to the registry as required?
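Checking records against the schema can be scripted so that every violation in a file is reported in a single pass. Below is a minimal Java sketch using the JDK's `javax.xml.validation` API; the class and method names are hypothetical, it assumes the OMTD-SHARE XSD is available locally as a string, and it is not the validator used in this thread.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import javax.xml.validation.Validator;
import org.xml.sax.ErrorHandler;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;

// Hypothetical helper: validates one metadata record against an XSD.
class XsdCheck {

    /** Returns all problems found; an empty list means the record is valid. */
    static List<String> validate(String xsd, String xml) {
        List<String> errors = new ArrayList<>();
        try {
            SchemaFactory factory = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
            Schema schema = factory.newSchema(new StreamSource(new StringReader(xsd)));
            Validator validator = schema.newValidator();
            // Record recoverable errors and keep going, so one run lists every violation.
            validator.setErrorHandler(new ErrorHandler() {
                @Override public void warning(SAXParseException e) { }
                @Override public void error(SAXParseException e) {
                    errors.add("line " + e.getLineNumber() + ": " + e.getMessage());
                }
                @Override public void fatalError(SAXParseException e) throws SAXException {
                    throw e; // malformed XML: stop here
                }
            });
            validator.validate(new StreamSource(new StringReader(xml)));
        } catch (SAXException | IOException e) {
            // Malformed XML or an unreadable schema ends up here.
            errors.add("fatal: " + e.getMessage());
        }
        return errors;
    }
}
```

Collecting errors in a handler, instead of letting the first one abort validation, gives a complete list of remarks per file.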

greenwoodma commented 6 years ago

@pennyl67 looks as if some XML elements have disappeared from at least the first bullet point?

twktheainur commented 6 years ago

@pennyl67

  1. I have now fixed the value as "text". From your initial feedback, I had thought that the value should be a descriptive text rather than the literal value "text".
  2. The invalid email is caused by incorrectly input metadata for that ontology on Biblioportal, unfortunately not something I have control over. If that could be a significant issue for the platform, I can put a filter that only allows valid emails in this field, although it means that some ontologies will not have contact emails at all (which may render the XML invalid as per the XSD specs.)
  3. That can be done quite easily, I will make the change
pennyl67 commented 6 years ago

@greenwoodma you are right! Thanks for noticing! @twktheainur

  1. sorry if I had misinformed you; it's definitely a misunderstanding; anyway, the values you have used could very well be used for "keyword" - no need to lose them
  2. ok, this could be a problem. If we leave an invalid email in this field, the file won't be uploaded because it won't validate. If there's no contact information, the file is again considered invalid. But in the sample I saw, the website URL was used as the email for the contactPerson, while you had correctly mapped the URL to the landingPage element as well. So, you could simply not use the communicationInfo template at all for the person. Would that help? Or are there other cases we should also consider?
  3. Perfect!
twktheainur commented 6 years ago

@pennyl67 For number 2, I think the best option would be to remove the contactPerson entry altogether when the value supplied for the email is invalid.

I will make the changes, deploy the updated adapter code and notify you here when it's done.
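The decision above (drop the contactPerson entry when the portal's email is invalid, and fall back on the landing page so that contact information is never empty) could look roughly like the Java sketch below. The class, the method names and the deliberately loose email pattern are assumptions for illustration, not code from the ncboproxy adapter.

```java
import java.util.Optional;
import java.util.regex.Pattern;

// Hypothetical helper: decides what contact data an OMTD-SHARE record gets.
class ContactInfoFilter {

    // Deliberately loose check (something@something.tld); not full RFC 5322 validation.
    private static final Pattern EMAIL = Pattern.compile("^[^@\\s]+@[^@\\s]+\\.[^@\\s]+$");

    static boolean isValidEmail(String value) {
        return value != null && EMAIL.matcher(value.trim()).matches();
    }

    /** Email for the contactPerson entry, or empty when the entry should be omitted. */
    static Optional<String> contactPersonEmail(String portalEmail) {
        return isValidEmail(portalEmail) ? Optional.of(portalEmail.trim()) : Optional.empty();
    }

    /** Record-level contact: the valid email if any, otherwise the landing page URL. */
    static String recordContact(String portalEmail, String landingPage) {
        return contactPersonEmail(portalEmail).orElse(landingPage);
    }
}
```

A URL typed into the email field (the Biblioportal case) fails the check, so the person entry is skipped and the landing page still provides a contact point.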

jonquet commented 6 years ago

Hello all, sorry for being late, thanks @twktheainur for reporting on our project.

To come back to point 2: whatever we decide, the key aspect is to reach out to the ontology contact person and let them know they should correct the metadata. I think it is fine to tell someone uploading an ontology to one of the 4 repositories: if you fill in this and this more carefully, your ontology will also be available on the OMTD platform. I will contact the owner of the invalid ontology in BiblioPortal.

As for producing the final deliverables (T3, D4 and T4), we will do it offline as official PDF documents, referring to the GitHub project (https://github.com/agroportal/ncboproxy) in the case of T3.

On our side, we still need to:

  a. Fix the timeout when producing the Zip file for the NCBO BioPortal.
  b. Implement, with the NCBO team, the rerouting from the bioontology.org and ontoportal.org domains.
  c. Produce the final deliverables.

Points a and b will be discussed soon with @graybeal and @alexskr.

twktheainur commented 6 years ago

@pennyl67 I have made the corrections. I am attaching a full metadata export for all ontologies on SIFR BioPortal (29 ontologies), AgroPortal (98 ontologies), BiblioPortal (26 ontologies) and NCBO BioPortal (779 ontologies)

Agroportal_ontologies_omtd-share_metadata-16_04_2018-20_20.zip SifrBioportal_ontologies_omtd-share_metadata-16_04_2018-20_20.zip Biblioportal_ontologies_omtd-share_metadata-16_04_2018-20_16.zip NCBOBioportal_ontologies_omtd-share_metadata-16_04_2018-20_46.zip
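The archive names above follow a `<Portal>_ontologies_omtd-share_metadata-dd_MM_yyyy-HH_mm.zip` pattern. A minimal Java sketch of producing such an export, with one XML entry per ontology acronym, is shown below; the naming pattern is inferred from the attachment names, and the class and method names are hypothetical.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;
import java.util.Map;
import java.util.zip.ZipEntry;
import java.util.zip.ZipOutputStream;

// Hypothetical helper: bundles per-ontology OMTD-SHARE records into one ZIP.
class MetadataExport {

    // Day_month_year-hour_minute, as in the attachment names of this thread.
    private static final DateTimeFormatter STAMP = DateTimeFormatter.ofPattern("dd_MM_yyyy-HH_mm");

    static String archiveName(String portal, LocalDateTime when) {
        return portal + "_ontologies_omtd-share_metadata-" + STAMP.format(when) + ".zip";
    }

    /** One "<acronym>.xml" entry per record, built in memory. */
    static byte[] zip(Map<String, String> xmlByAcronym) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ZipOutputStream zip = new ZipOutputStream(bytes)) {
            for (Map.Entry<String, String> record : xmlByAcronym.entrySet()) {
                zip.putNextEntry(new ZipEntry(record.getKey() + ".xml"));
                zip.write(record.getValue().getBytes(StandardCharsets.UTF_8));
                zip.closeEntry();
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }
}
```

Building the archive in memory keeps the sketch small; for a portal the size of NCBO BioPortal, writing entries straight to an output stream would avoid holding every record at once.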

pennyl67 commented 6 years ago

@twktheainur Thanks once more! So, I have validated the files and did some sample checking - there's no way I can check each and every file - and the only remaining things I found are:

twktheainur commented 6 years ago

Thank you for the feedback @pennyl67. I will add the ontology URI in the portal as a fallback landingPage. For the nonStandardLicenceTermsURL, the regular expression that checked for URLs contained an error. I have replaced it with an exhaustive regex that matches valid URLs, so the problem should now be solved.
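As an alternative to a hand-written URL regex, here is a sketch of the same two checks (URL validity for nonStandardLicenceTermsURL, and the landingPage fallback to the ontology URI in the portal) using `java.net.URI` parsing. This swaps in a different technique from the regex described above; the accepted schemes and the method names are assumptions.

```java
import java.net.URI;
import java.net.URISyntaxException;
import java.util.Set;

// Hypothetical helper, not the adapter's actual code.
class UrlChecks {

    // Assumed set of schemes acceptable for licence/landing-page links.
    private static final Set<String> SCHEMES = Set.of("http", "https", "ftp");

    /** True when the value parses as an absolute URI with an accepted scheme and a host. */
    static boolean isValidUrl(String value) {
        if (value == null || value.isBlank()) {
            return false;
        }
        try {
            URI uri = new URI(value.trim());
            return uri.getScheme() != null
                    && SCHEMES.contains(uri.getScheme().toLowerCase())
                    && uri.getHost() != null;
        } catch (URISyntaxException e) {
            return false; // e.g. free text with spaces
        }
    }

    /** The portal homepage if it is a valid URL, otherwise the ontology URI in the portal. */
    static String landingPage(String homepage, String ontologyUriInPortal) {
        return isValidUrl(homepage) ? homepage.trim() : ontologyUriInPortal;
    }
}
```

Delegating parsing to `URI` rejects free text and scheme-less values like `www.example.org` without maintaining a long pattern.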

Yesterday I noticed that the openminted organization on GitHub has an omtd-model Maven project, available on Maven Central, that also creates a JAXB binding of the XSD specification in exactly the same manner as our implementation. Consequently, I have added omtd-model as a dependency of our implementation and replaced our own JAXB bindings. This should improve maintainability when future versions of the specification are released.

pennyl67 commented 6 years ago

Thanks @twktheainur!

pennyl67 commented 6 years ago

@twktheainur When you have the updated metadata records for the ontologies, could you upload them again for the final check? Thanks!

twktheainur commented 6 years ago

@pennyl67 Here are the updated metadata records for the 4 portals.

Agroportal_ontologies_omtd-share_metadata-19_04_2018-13_34.zip SIFRBioPortal_ontologies_omtd-share_metadata-19_04_2018-13_23.zip BiblioPortal_ontologies_omtd-share_metadata-19_04_2018-13_26.zip NCBOBioPortal_ontologies_omtd-share_metadata-19_04_2018-13_45.zip

I tried submitting a few ontologies on the test platform directly from the URL of our API call, rather than by copy-pasting the XML (which is the intended role of that API), and it seems to work fine.

pennyl67 commented 6 years ago

I've run the validation test again and I get the following errors:

twktheainur commented 6 years ago

Apologies, it appears my initial reply did not go through and remained unposted, which I have only just realised.

@pennyl67 Some issues were introduced when I switched to using omtd-model as a dependency after the previous round of fixes. I have debugged the issues, the output should now be ok.

Concerning the TEST23 ontology, I believe it is a test ontology that someone submitted to the portal publicly. There is also a TEST ontology. Given that anyone can submit content to BiblioPortal and that most users are not technologically savvy, such errors are more likely to happen on BiblioPortal, although the same could happen on the other portals too. All I can do in this case is notify the people at NCBO so that they can address the issue.

Agroportal_ontologies_omtd-share_metadata-19_04_2018-16_58.zip SifrBioPortal_ontologies_omtd-share_metadata-19_04_2018-16_54.zip Biblioportal_ontologies_omtd-share_metadata-19_04_2018-17_00.zip NCBOBioPortal_ontologies_omtd-share_metadata-19_04_2018-17_29.zip

jonquet commented 6 years ago

I have just requested that the TEST ontology be deleted from BiblioPortal and that correct contact information (name + email) be entered.

pennyl67 commented 6 years ago

Thanks @jonquet and @twktheainur I'll get back to you with any news on the validation - I didn't have the time to check today

pennyl67 commented 6 years ago

Hi @twktheainur, I only found three invalid records (DOCC, NCC and NCCO in NCBO BioPortal). Again, it is the empty contactInfo problem: maybe some files were not processed with the suggested solution (i.e. using the resourceIdentifier)? Can you check again and let me know? Thanks

twktheainur commented 6 years ago

@pennyl67 Thank you for the feedback, a corner case wasn't handled properly. I have pushed a fix and am attaching the corrected versions of the three affected records.

DOCC_NCC_NCCO.zip

pennyl67 commented 6 years ago

@twktheainur Thanks! Then all the files are now valid.