iodepo / odis-arch

Development of the Ocean Data and Information System (ODIS) architecture
https://book.oceaninfohub.org/

UNICODE issues #198

Open fils opened 1 year ago

fils commented 1 year ago

@jmckenna, I was curious if you could look at the file 4c133adacaf6193ff9dd559b4e6641c231519d69.json in the Africa IOC bucket.

Does this file ingest OK into Solr? I'm getting some issues with it:

Error: Invalid Input Error: Malformed JSON in file "./africaioc/hold/4c133adacaf6193ff9dd559b4e6641c231519d69.json", at byte 561 in object 2: invalid UTF-8 encoding in string.

I am wondering if this Unicode issue is the one you are encountering with Solr. If so, I may have a generic fix that I can try out for us.
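
For reference, a minimal Python sketch of how an error like the one above could be located and worked around. This is not necessarily the generic fix mentioned here. It assumes the offending bytes are Windows-1252 characters written into a file that is otherwise UTF-8, which has not been confirmed for this file; the function names are hypothetical.

```python
def first_invalid_utf8_offset(data: bytes):
    """Return the byte offset of the first invalid UTF-8 sequence, or None."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError as err:
        return err.start
    return None


def decode_with_cp1252_fallback(data: bytes) -> str:
    """Decode as UTF-8; bytes that are not valid UTF-8 are read as Windows-1252.

    Assumption: stray bytes come from a Windows-1252 source. Bytes that are
    undefined in Windows-1252 become U+FFFD (the replacement character).
    """
    out = []
    pos = 0
    while pos < len(data):
        try:
            out.append(data[pos:].decode("utf-8"))
            break
        except UnicodeDecodeError as err:
            # err.start / err.end are relative to the slice data[pos:]
            out.append(data[pos:pos + err.start].decode("utf-8"))
            bad = data[pos + err.start:pos + err.end]
            out.append(bad.decode("cp1252", errors="replace"))
            pos += err.end
    return "".join(out)


# Hypothetical sample: a raw 0x96 byte (en dash in Windows-1252) inside a JSON string
sample = b'{"description": "56E - 59E, 18S \x96 21S"}'
offset = first_invalid_utf8_offset(sample)
repaired = decode_with_cp1252_fallback(sample)
```

Running the first function on the raw file bytes would show whether the offset it reports matches the byte position in the error message; the second returns text that can be re-encoded as UTF-8 before loading.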

jmckenna commented 1 year ago

No issues bringing into Solr (unless I have the wrong file). Here is the indexed record in Solr (note that the Location/Place/LargeMarineEcosystem isn't yet handled by our regions/indexer code):


{
        "id":"https://ioc-africa.org/dbs/jsonld/upcomingExpeditions.php?id=4",
        "type":"Event",
        "txt_keywords":["Climate Variability and Predictability for Alexandria Coastal Zone "],
        "txt_url":["weblink1"],
        "name":"Climate Variability and Predictability for Alexandria Coastal Zone ",
        "dt_startDate":["2022-10-02T00:00:00Z"],
        "n_startYear":[2022.0],
        "dt_endDate":["2022-10-29T00:00:00Z"],
        "n_endYear":[2022.0],
        "txt_EventStatus":["EventScheduled"],
        "txt_potentialAction":["Research Cruise"],
        "txt_contributor":["James McKenzie"],
        "txt_relatedLink":["proj1",
          "proj2",
          "proj3"],
        "id_provider":["https://oceaninfohub.org/.well-known/org/africaioc"],
        "txt_provider":["IOC Africa Data Portal"],
        "keys":["id",
          "type",
          "txt_keywords",
          "txt_url",
          "name",
          "dt_startDate",
          "n_startYear",
          "dt_endDate",
          "n_endYear",
          "txt_EventStatus",
          "txt_potentialAction",
          "txt_contributor",
          "txt_relatedLink",
          "id_provider",
          "txt_provider"],
        "json_source":"{\"@context\": {\"@vocab\": \"https://schema.org/\"}, \"@type\": \"Event\", \"@id\": \"https://ioc-africa.org/dbs/jsonld/upcomingExpeditions.php?id=4\", \"keywords\": [\"Climate Variability and Predictability for Alexandria Coastal Zone \"], \"url\": \"weblink1\", \"name\": \"Climate Variability and Predictability for Alexandria Coastal Zone \", \"location\": [{\"@type\": \"Place\", \"@id\": \"https://marineregions.org/gazetteer.php?p=details&id=8539\", \"name\": \"East Bering Sea\", \"description\": \"Name of Large Marine Ecosystem region\"}, {\"@type\": \"Place\", \"description\": \"56E - 59E, 18S \\u0096 21S\"}], \"startDate\": \"2022-10-02\", \"endDate\": \"2022-10-29\", \"EventStatus\": \"EventScheduled\", \"potentialAction\": {\"@type\": \"Action\", \"name\": \"Research Cruise\", \"instrument\": \"5\"}, \"contributor\": {\"@type\": \"Person\", \"jobTitle\": \"Chief Scientist\", \"name\": \"James McKenzie\"}, \"relatedLink\": [\"proj1\", \"proj2\", \"proj3\"], \"prov:wasAttributedTo\": {\"@id\": \"https://oceaninfohub.org/.well-known/org/africaioc\", \"@type\": \"prov:Organization\", \"rdf:name\": \"IOC Africa Data Portal\", \"rdfs:seeAlso\": \"https://ioc-africa.org\"}}",
        "index_id":"4fe2e4b6-c077-4feb-a8e3-54ccedbfa48a",
        "_version_":"1761552666115178499",
        "indexed_ts":"2023-03-27T20:05:34.653Z"
}
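
The json_source value above contains the escape \u0096 in the second location description ("18S \u0096 21S"). U+0096 is a C1 control character, and byte 0x96 is the en dash in Windows-1252, so this may be the character behind the reported error, although that is an assumption. A rough Python sketch of a cleanup step that maps such C1 code points to their Windows-1252 equivalents after the JSON has been parsed (repair_c1_controls is a hypothetical helper, not part of the indexer code):

```python
import json


def repair_c1_controls(text: str) -> str:
    """Map C1 control code points (U+0080 to U+009F) to Windows-1252 characters.

    Assumption: these code points are Windows-1252 bytes that were carried
    over unchanged. Code points with no Windows-1252 mapping are left as is.
    """
    fixed = []
    for ch in text:
        if 0x80 <= ord(ch) <= 0x9F:
            try:
                ch = bytes([ord(ch)]).decode("cp1252")
            except UnicodeDecodeError:
                pass  # undefined in Windows-1252; keep the original character
        fixed.append(ch)
    return "".join(fixed)


# The description string from the record above, as a JSON string literal
raw = json.loads('"56E - 59E, 18S \\u0096 21S"')
clean = repair_c1_controls(raw)
```

After this step, clean holds the description with U+2013 (en dash) in place of U+0096.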

pbuttigieg commented 1 month ago

Status? @jmckenna