r-geoflow / geoflow

Tools to Orchestrate Geospatial (Meta)Data Management Workflows and Manage FAIR Services
https://github.com/r-geoflow/geoflow/wiki

gmd:report element not created (Inspire validation) #110

Closed juliepierson closed 3 years ago

juliepierson commented 3 years ago

The gmd:report element stating that the metadata is INSPIRE compliant does not get created. This is an example:

`règlement (ue) n° 1089/2010 de la commission du 23 novembre 2010 portant modalités d'application de la directive 2007/2/ce du parlement européen et du conseil en ce qui concerne l'interopérabilité des séries et des services de données géographiques`
`2013-10-21`
`L’article 7, paragraphe 1, de la directive 2007/2/CE correspond aux modalités techniques de l’interopérabilité : il s’agit du règlement relatif à l’interopérabilité : règlement n°1253/2013 du 21 octobre 2013 modifiant et complétant le règlement n°1089/2010 du 23 novembre 2010`
`true`

eblondel commented 3 years ago

@juliepierson @mrouan we may need to discuss your requirements (@wheintz might be interested as well). As of today, the geometa action of geoflow adds an INSPIRE report only if the inspire option of that action is enabled in the configuration of your workflow:

{
  "id": "geometa-create-iso-19115",
  "run": true,
  "options": {
      "inspire": true
  }
}

By default this is FALSE, as INSPIRE is specific to European users. The other point to mention is that the report is added in English by default.
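As a rough sketch of switching this option on programmatically rather than by hand, the snippet below sets `inspire` on the matching action of a parsed configuration. The helper `enable_inspire` is hypothetical (not part of geoflow), and it assumes the configuration holds its actions in a top-level `actions` list.

```r
# Hypothetical helper (not part of geoflow): set options$inspire to TRUE
# on the action whose id matches 'action_id'.
# Assumes 'config' is a parsed workflow configuration with a top-level
# 'actions' list, each action being a list with 'id', 'run' and 'options'.
enable_inspire <- function(config, action_id = "geometa-create-iso-19115") {
  for (i in seq_along(config$actions)) {
    if (identical(config$actions[[i]]$id, action_id)) {
      # Creates the 'options' list if the action does not have one yet.
      config$actions[[i]]$options$inspire <- TRUE
    }
  }
  config
}

# In-memory configuration mirroring the JSON fragment above.
config <- list(actions = list(
  list(id = "geometa-create-iso-19115", run = TRUE)
))
config <- enable_inspire(config)
print(isTRUE(config$actions[[1]]$options$inspire))
```

With a configuration file on disk, the same helper could be applied to the result of `jsonlite::fromJSON(path, simplifyVector = FALSE)` and written back with `jsonlite::write_json(config, path, auto_unbox = TRUE, pretty = TRUE)`, where `path` is whatever file your workflow uses.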

The geometa R package lets us specify multi-language descriptions. Either we could hardcode the translations for this part, since the descriptive part of INSPIRE reports never changes (that would make sense), or we would need to make it vary according to the language specified for the entity.

Here as well @juliepierson @mrouan @juldebar @wheintz this is an open discussion, please give me your suggestions/vote!

Please note also that when the inspire option is enabled, geometa will attempt a check against the online INSPIRE metadata validation service. I haven't tested it in a while, but the last time I did, the service was not available. You may need to check this and see whether you get a report. If the report is generated, geometa adds it as XML comments in the footer of the generated ISO 19139 XML metadata.

mrouan commented 3 years ago

We (@juliepierson and I) tested with the "inspire": true option, but we ran into a curl error (TLS connection...). In geometa (https://github.com/eblondel/geometa/blob/95dce726882cb1cd743db53bfdfa721875bc56c0/R/INSPIREMetadataValidator.R) the address is https://inspire.ec.europa.eu/validator/v2 but the correct address seems to be https://inspire.ec.europa.eu/validator/

eblondel commented 3 years ago

I get the same issue in the Travis-CI build tests, and I haven't found a way to fix it yet. On Windows I don't get this issue. The URL is correct, I tested it yesterday. The issue is indeed related to curl. Can you paste your config here: the output of R sessionInfo() and the result of curl::curl_version()?

If you are on Unix, try installing the OpenSSL flavour of libcurl, i.e. the libcurl4-openssl-dev package.

juliepierson commented 3 years ago

Thanks! Tried installing the libcurl package suggested, but I get the same error.

sessionInfo() :
R version 4.0.3 (2020-10-10)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 20.04.2 LTS

Matrix products: default
BLAS: /usr/lib/x86_64-linux-gnu/blas/libblas.so.3.9.0
LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.9.0

locale:
[1] LC_CTYPE=fr_FR.UTF-8 LC_NUMERIC=C LC_TIME=fr_FR.UTF-8 LC_COLLATE=fr_FR.UTF-8
[5] LC_MONETARY=fr_FR.UTF-8 LC_MESSAGES=fr_FR.UTF-8 LC_PAPER=fr_FR.UTF-8 LC_NAME=C
[9] LC_ADDRESS=C LC_TELEPHONE=C LC_MEASUREMENT=fr_FR.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] tools stats graphics grDevices utils datasets methods base

other attached packages:
[1] geosapi_0.5 geonapi_0.4 keyring_1.1.0 RCurl_1.98-1.2 sf_0.9-7
[6] urltools_1.7.3 raster_3.3-13 rgdal_1.5-18 sp_1.4-2 tryCatchLog_1.1.6
[11] futile.logger_1.4.3 rlist_0.4.6.1 readr_1.4.0 comprehenr_0.6.8 xml2_1.3.2
[16] httr_1.4.2 stringr_1.4.0 uuid_0.1-4 jsonlite_1.7.2 ows4R_0.1-5
[21] geoflow_0.0.20201203 geometa_0.6-4

loaded via a namespace (and not attached):
[1] tinytex_0.26 tidyselect_1.1.0 xfun_0.18 remotes_2.2.0 purrr_0.3.4
[6] lattice_0.20-41 vctrs_0.3.6 generics_0.0.2 XML_3.99-0.5 rlang_0.4.10
[11] e1071_1.7-4 pillar_1.4.7 glue_1.4.2 DBI_1.1.1 lambda.r_1.2.4
[16] lifecycle_0.2.0 plyr_1.8.6 zip_2.1.1 codetools_0.2-18 curl_4.3
[21] class_7.3-18 triebeard_0.3.0 Rcpp_1.0.6 KernSmooth_2.23-18 openssl_1.4.3
[26] classInt_0.4-3 formatR_1.7 mime_0.9 hms_1.0.0 askpass_1.1
[31] stringi_1.5.3 dplyr_1.0.2 grid_4.0.3 bitops_1.0-6 magrittr_2.0.1
[36] tibble_3.0.6 futile.options_1.0.1 crayon_1.4.0 pkgconfig_2.0.3 ellipsis_0.3.1
[41] data.table_1.13.2 assertthat_0.2.1 rstudioapi_0.11 R6_2.5.0 units_0.6-7
[46] compiler_4.0.3

curl::curl_version() :

$version [1] "7.68.0"

$ssl_version [1] "GnuTLS/3.6.13"

$libz_version [1] "1.2.11"

$libssh_version [1] "libssh/0.9.3/openssl/zlib"

$libidn_version [1] "2.2.0"

$host [1] "x86_64-pc-linux-gnu"

$protocols [1] "dict" "file" "ftp" "ftps" "gopher" "http" "https" "imap" "imaps" "ldap" "ldaps" "pop3"
[13] "pop3s" "rtmp" "rtsp" "scp" "sftp" "smb" "smbs" "smtp" "smtps" "telnet" "tftp"

$ipv6 [1] TRUE

$http2 [1] TRUE

$idn [1] TRUE

eblondel commented 3 years ago

What is the exact curl error message you get?

juliepierson commented 3 years ago

Error in curl::curl_fetch_memory(url, handle = handle) : GnuTLS recv error (-110): The TLS connection was non-properly terminated.

eblondel commented 3 years ago

This is what I call a "nasty" issue. It occurs on Linux when libcurl is built against GnuTLS, and that build makes API calls to the INSPIRE metadata validator fail. Installing libcurl4-openssl-dev doesn't seem to be enough without recompiling libcurl. This needs more investigation to find a fix.
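A minimal sketch for checking which TLS backend the R curl package is actually linked against, based on the `ssl_version` field shown in the `curl::curl_version()` output earlier in this thread. The helper name `is_gnutls_backend` is hypothetical and only does a string match.

```r
# Hypothetical helper: TRUE when an ssl_version string (as returned by
# curl::curl_version()$ssl_version) reports a GnuTLS backend.
is_gnutls_backend <- function(ssl_version) {
  grepl("GnuTLS", ssl_version, fixed = TRUE)
}

# Inspect the backend of the curl package installed in the current R
# library, if the package is available.
if (requireNamespace("curl", quietly = TRUE)) {
  backend <- curl::curl_version()$ssl_version
  message("libcurl TLS backend seen by R: ", backend)
  if (is_gnutls_backend(backend)) {
    message("GnuTLS backend detected: calls to the INSPIRE validator may fail with the TLS error reported above.")
  }
}
```

Note that this reports the library R's curl package was built with, which can differ from what the `curl` command-line binary uses on the same machine.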

juliepierson commented 3 years ago

Ok, thanks. I'll look into it, but if I understand correctly it is not related to geoflow?

eblondel commented 3 years ago

Only indirectly: there is nothing we can do in geoflow. It's related to your OS environment, where the SSL backend your curl uses is not compatible with this particular web service.

wheintz commented 3 years ago

Hi Julie, Do you use a proxy? It seems to be a hosting-related issue...

eblondel commented 3 years ago

It seems that by default the GnuTLS version of the curl library is installed, while the curl binary is linked to OpenSSL. You need to remove the GnuTLS version of the curl library and recompile the R package. It will then work. @wheintz can you test this?

wheintz commented 3 years ago

Hi all, Problem fixed with the following steps:

Remove libcurl3-gnutls

Cheers, wilfried

juliepierson commented 3 years ago

Thank you for this fix. It is a bit dangerous though, since quite a lot of software seems to rely on libcurl3-gnutls, and removing it removes that software as well (on my computer: QGIS, postgis, libreoffice, blender, dbeaver and maybe some others?). I reinstalled them and libcurl3-gnutls was reinstalled. So now I have both libcurl3-gnutls and libcurl3-nss; I don't know whether that can cause problems. Anyway, it seems to have done something, since I no longer get an error when I set inspire to true for the geometa-create-iso-19115 action. There's no report element in the XML, but there's a link to an INSPIRE HTML report at the end of the XML: <!--INSPIRE HTML Report : https://inspire.ec.europa.eu/validator//v2/TestRuns/9e830a8b-3209-455f-b936-a3c9763bf9ce.html--> Maybe that is because the validation failed for this metadata?

eblondel commented 3 years ago

The R packages curl and httr require OpenSSL to work properly; at least it is required for some web services like INSPIRE. We can't do much about that other than switching to OpenSSL. @wheintz any thoughts about potential adverse impacts on other services?

As for the report, the geometa ISO 19115 action plugged into geoflow doesn't create ISO 19115 report elements relating to INSPIRE, except the standard compliance/completeness reports expected by INSPIRE. Feel free to suggest improvements to make it richer.

But the main feature of the inspire option is that it lets you check the INSPIRE compliance of your metadata. If set to TRUE, geoflow (actually geometa behind it, through its INSPIREMetadataValidator) sends the generated ISO 19139 XML to the online INSPIRE metadata validation service, gets the validation report, and embeds information from that report as XML comments, in this way:

<!--INSPIRE compliance: NO-->
<!--INSPIRE Status : FAILED-->
<!--INSPIRE Completeness : 35% (7 PASSED, 13 FAILED)-->
<!--INSPIRE Test Run ID : EIDeabab099-8dd3-4b2f-a98e-aee6185fef70-->
<!--INSPIRE Log : https://inspire.ec.europa.eu/validator//v2/TestRuns/EIDeabab099-8dd3-4b2f-a98e-aee6185fef70/log-->
<!--INSPIRE Ref URI : https://inspire.ec.europa.eu/validator//v2/TestRuns/eabab099-8dd3-4b2f-a98e-aee6185fef70.json-->
<!--INSPIRE HTML Report : https://inspire.ec.europa.eu/validator//v2/TestRuns/eabab099-8dd3-4b2f-a98e-aee6185fef70.html-->

It is important to note that these reports are cleaned out after some time; they are not persistent. So, as a convenience for the metadata manager, I add them as XML comments to indicate whether or not the metadata passed the INSPIRE validation. The HTML report is available so that validation errors can be inspected. However, adding these URLs to the ISO 19139 metadata itself is not recommended, as they will point to non-existing resources once INSPIRE has cleaned them out.
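Since the validation summary only lives in XML comments, a metadata manager may want to pull it out of the generated file. The following is a minimal sketch using base R regular expressions; `extract_inspire_comments` is a hypothetical helper, and it assumes the comments keep the `<!--INSPIRE key : value-->` shape shown above.

```r
# Hypothetical helper: collect the "<!--INSPIRE key : value-->" comments
# from ISO 19139 XML text (a single string or a vector of lines) and
# return them as a named character vector.
extract_inspire_comments <- function(xml_text) {
  txt <- paste(xml_text, collapse = "\n")
  # Non-greedy match; '.' does not cross newlines, so each comment is
  # captured on its own.
  found <- regmatches(txt, gregexpr("<!--INSPIRE.*?-->", txt, perl = TRUE))[[1]]
  # Strip the comment delimiters and the leading "INSPIRE" marker.
  body <- sub("-->$", "", sub("^<!--INSPIRE\\s+", "", found))
  # Split on the first colon only, because URL values contain colons too.
  keys <- trimws(sub(":.*$", "", body))
  values <- trimws(sub("^[^:]*:", "", body))
  stats::setNames(values, keys)
}

# Example using two of the comment lines quoted above.
xml_lines <- c(
  "<!--INSPIRE compliance: NO-->",
  "<!--INSPIRE Status : FAILED-->"
)
print(extract_inspire_comments(xml_lines))
```

For a file on disk, the input could be the result of `readLines()` on the generated XML. Keep in mind that the extracted report URLs stop resolving once the validation service has cleaned out the test run.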

wheintz commented 3 years ago

RStudio is the only service running on our server, so it wasn't a problem for us to remove libcurl3-gnutls ;-) But indeed, I guess it could cause trouble with other software... libcurl3-gnutls and libcurl3-nss seem to coexist quite well, but unfortunately I don't know more on the subject.

eblondel commented 3 years ago

OK, I think we can close this ticket. Unfortunately, I believe this goes beyond geoflow. You can also see a thread on a similar topic raised on the D4Science infrastructure, where we discussed the issue and solved it for RStudio servers: https://support.d4science.org/issues/20660 I will keep an eye on these issues and liaise with the D4Science e-infra admins in case they have a solution with GnuTLS.