iodepo / odis-arch

Development of the Ocean Data and Information System (ODIS) architecture
https://book.odis.org/

connect ODISCat as the main ODIS config #307

Open jmckenna opened 1 year ago

jmckenna commented 1 year ago

cc @arnounesco @pbuttigieg @fils

pbuttigieg commented 1 year ago

ODISCat purpose is to list your organization's Products or Services

it's to list data sources - what those data sources describe is secondary. One organisation / individual in OceanExpert can be linked to one or more ODISCat data source entries.

These can be APIs, web services, portals, or any other mechanism through which (meta)data can be acquired

jmckenna commented 1 year ago

For reference, the new ODISCat pattern template (that we drafted together on 2023-08-17) is here: odisCatOrganization-example.json (link updated on 2024-04-26 to new repo)

arnounesco commented 1 year ago

see #308

arnounesco commented 1 year ago

should not be closed, #308 is about the format of the pattern, not about ODISCat providing it

jmckenna commented 6 months ago

setting urgency label here (@pbuttigieg please adjust as necessary - I created 3 new labels for urgency)

jmckenna commented 1 month ago

reporting an internal ODISCat issue here as well, as it applies to this ticket:

@arnounesco some important questions/points for the ODISCat-ODIS connection:

related to https://github.com/iodepo/ODISCat/issues/103

cc @pbuttigieg

jmckenna commented 4 weeks ago

@arnounesco I've updated the ODISCat JSON-LD template with @pbuttigieg's changes (to use @type CreativeWork)

fils commented 2 weeks ago

@arnounesco This looks good.. just one small issue...

  {
            "@context": {
                "@vocab": "https://schema.org/"
            },
            "@id": "https://catalogue.odis.org/view/256
        ",
            "@type": "Organization",
                        "email": "info@ico",

There is a control character \n at the end of the @id value. Would not be an issue in the object literals, but in the subject IRI it's not a valid character.

I can parse such things out of course client side, but better to have it valid server side.

Note that google validator (https://validator.schema.org/#url=http%3A%2F%2Fcatalogue.odis.org%2Fview%2F256) fixes such things. Sometimes I kinda wish they wouldn't. Or at least have a "strict" mode.

If you can use a trim function on the strings or something like that, it is likely a simple fix.
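As a rough illustration of the trim idea, here is a minimal Python sketch that strips surrounding whitespace from every `@id` value in a parsed JSON-LD document. The catalogue's server code is not shown in this thread, so the function name `strip_ids` and the sample record are hypothetical; the same logic would apply in whatever language the server uses.

```python
import json


def strip_ids(node):
    """Recursively strip leading/trailing whitespace from every "@id" string."""
    if isinstance(node, dict):
        return {
            key: value.strip()
            if key == "@id" and isinstance(value, str)
            else strip_ids(value)
            for key, value in node.items()
        }
    if isinstance(node, list):
        return [strip_ids(item) for item in node]
    return node


# Hypothetical sample: an @id with an escaped newline and trailing spaces.
raw = (
    '{"@context": {"@vocab": "https://schema.org/"}, '
    '"@id": "https://catalogue.odis.org/view/256\\n        ", '
    '"@type": "Organization"}'
)
doc = strip_ids(json.loads(raw))
print(repr(doc["@id"]))
```

Only `@id` values are touched, so ordinary literals such as descriptions keep their original whitespace.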

Thanks Doug

arnounesco commented 2 weeks ago

@fils I cannot reproduce this. How did you get that content? I tried viewing the page source and downloading the file, and there is no newline anywhere. There is also no newline to be seen in the code. None of this means you are wrong, but I cannot check the result of any action I take.

jmckenna commented 2 weeks ago

I also cannot reproduce. (I use the command :set list inside vi on Ubuntu, to show hidden characters for the test entry)

  curl -OL https://catalogue.odis.org/view/256
  vi 256
    :set list

gives:

{$
            "@context": {$
                "@vocab": "https://schema.org/"$
            },$
            "@id": "https://catalogue.odis.org/view/256",$
            "@type": "Organization",$
                        "email": "info@xxxx",$

fils commented 2 weeks ago

Interesting.. I see what you are both seeing too.

Let me check if the python library is messing something up. There might be a processing setting I need to play with.

fils commented 2 weeks ago

So I tried with extruct rather than BeautifulSoup and I still see it.

I get

{
  "@context": {
    "@vocab": "https://schema.org/"
  },
  "@id": "https://catalogue.odis.org/view/263\n        ",
  "@type": "Organization",
  "email": "nodc@meteo.ru",

So now you see the \n with spaces or a tab after it..

I'm trying to resolve why I see this in python, with two different libraries, but you don't see it in vi.
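One way to narrow this down is to inspect the raw response text before any HTML or JSON parsing, so that the client libraries are out of the picture. The sketch below is only an assumption about how that check could look: `find_suspect_ids` and the sample strings are hypothetical, and the regular expression is a rough match for `"@id": "..."` pairs rather than a full JSON parser.

```python
import re

# Rough pattern for a JSON "@id" member; captures the raw string body.
ID_PATTERN = re.compile(r'"@id"\s*:\s*"((?:[^"\\]|\\.)*)"', re.DOTALL)


def find_suspect_ids(raw_text):
    """Return raw @id values containing whitespace or escaped newline/tab/CR."""
    suspects = []
    for match in ID_PATTERN.finditer(raw_text):
        value = match.group(1)
        # Literal whitespace, or a backslash escape for n, r, or t.
        if re.search(r"\s|\\[nrt]", value):
            suspects.append(value)
    return suspects


# Hypothetical samples: one clean record, one with a literal newline in @id.
clean = '{"@id": "https://catalogue.odis.org/view/256", "@type": "Organization"}'
dirty = '{"@id": "https://catalogue.odis.org/view/256\n        ", "@type": "Organization"}'

print(find_suspect_ids(clean))
print([repr(value) for value in find_suspect_ids(dirty)])
```

If the raw text comes back clean but the parsed object still carries the newline, the whitespace is being introduced somewhere between the fetch and the parse rather than by the server.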

fils commented 2 weeks ago

@jmckenna @arnounesco

really odd, if I look in the "source view" of the browser it looks fine.. but no matter how I pull it down with python, I get

{'@context': {'@vocab': 'https://schema.org/'}, '@id': 'https://catalogue.odis.org/view/257\n        ', '@type': 'Organization', 'email': 'nodc@meteo.ru', 'contactPoint': 

with the \n in the @id string.

Still trying to explain this.

fils commented 2 weeks ago

OK, I think I found it. The python package "response" seems to be the issue, I replaced it with httpx and that seems to be working now. Very odd, but no interest in resolving the issue with that package, will simply use httpx.

Thanks!

FYI, I also indexed with Gleaner, which did work but found 1 error in the record at https://catalogue.odis.org/view/1105, which is confirmed at: https://validator.schema.org/#url=https%3A%2F%2Fcatalogue.odis.org%2Fview%2F1105

Gleaner reports

 Error in unmarshaling json: invalid character ' ' in string escape code"

fils commented 3 days ago

So, I went ahead and activated the GitHub action for the configuration builder for ODIS Cat.
After fixing a few typos in the requirements.txt it seems to be working, but there is an odd regression in the YAML output. Need to check the version of Python and the libraries installed in the action VM.

There also seems to be an odd error condition when the generated config file doesn't have any changes from the previous version. Reviewing this.
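The cause of that error is not identified in this thread. One common way to sidestep a "nothing changed" failure is to write the generated file only when its content differs from the previous version, so later steps can skip cleanly. The sketch below assumes that approach; `write_if_changed` and the `config.yaml` file name are hypothetical and not taken from the actual builder.

```python
import tempfile
from pathlib import Path


def write_if_changed(path, new_text):
    """Write new_text to path only if it differs; return True if written."""
    new_bytes = new_text.encode("utf-8")
    if path.exists() and path.read_bytes() == new_bytes:
        return False  # Identical content: leave the file untouched.
    path.write_bytes(new_bytes)
    return True


# Demonstration in a throwaway directory (hypothetical file name and content).
with tempfile.TemporaryDirectory() as tmp:
    target = Path(tmp) / "config.yaml"
    first = write_if_changed(target, "sources: []\n")
    second = write_if_changed(target, "sources: []\n")
    third = write_if_changed(target, "sources:\n  - name: example\n")

print(first, second, third)
```

The boolean return value gives the calling workflow a simple signal for whether a commit step is needed at all.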

In the end, there are some items we use in the config file that are not currently in the ODIS Catalog properties.

Will build a list of these for this issue.

fils commented 2 days ago

The yaml issue is resolved, code generates now.

Some observations:

Note:

Refs: