t3-innovation-network / desm

Data Ecosystem Schema Mapper

Error importing ASN JSON file #247

Closed jbaird123 closed 2 years ago

jbaird123 commented 2 years ago

When importing the attached file, I get the following error:

Attention! Attention: Validation failed: Name can't be blank

ASN_D2695955.zip

jbaird123 commented 2 years ago

@stuartasutton - @excelsior noticed that there is another problem with the attached file: many properties there have duplicate names. Their IDs are unique, but the labels are not. Seeing a bunch of identical properties on the right may be confusing.

Do you have any concerns with this? Or suggestions for us?

stuartasutton commented 2 years ago

Can you give me an example from the ASN JSON file, Joe? There shouldn't be any duplicate properties within the ASN namespace, i.e., the IDs are unique, though the labels may not be.

excelsior commented 2 years ago

Hey @stuartasutton, we've been testing with this file: ASN_D2695955.zip (it's an RDF/JSON file in a ZIP archive, because GitHub doesn't allow uploading JSON files).

The file contains 200+ terms of the http://purl.org/ASN/schema/core/Statement type. We use the http://purl.org/ASN/schema/core/statementLabel property as the source for names during import, but there are only four unique values in that file (Benchmark, Competency, Topic, and Topic Cluster).
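For anyone who wants to reproduce the label count, here is a minimal sketch. It assumes the file follows the usual RDF/JSON layout (subject, then predicate, then a list of objects that each carry a `value` key). The sample graph and the `example.org` subject IDs are made up for illustration and are not taken from ASN_D2695955.zip.

```python
from collections import Counter

RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
STATEMENT = "http://purl.org/ASN/schema/core/Statement"
STATEMENT_LABEL = "http://purl.org/ASN/schema/core/statementLabel"


def statement_label_counts(graph):
    """Count statementLabel values across all subjects typed as asn:Statement."""
    counts = Counter()
    for predicates in graph.values():
        types = {obj["value"] for obj in predicates.get(RDF_TYPE, [])}
        if STATEMENT not in types:
            continue  # skip subjects that are not Statements
        for obj in predicates.get(STATEMENT_LABEL, []):
            counts[obj["value"]] += 1
    return counts


# Hypothetical sample: three Statements with unique IDs but only two labels.
sample_graph = {
    "http://example.org/S1": {
        RDF_TYPE: [{"type": "uri", "value": STATEMENT}],
        STATEMENT_LABEL: [{"type": "literal", "value": "Competency"}],
    },
    "http://example.org/S2": {
        RDF_TYPE: [{"type": "uri", "value": STATEMENT}],
        STATEMENT_LABEL: [{"type": "literal", "value": "Competency"}],
    },
    "http://example.org/S3": {
        RDF_TYPE: [{"type": "uri", "value": STATEMENT}],
        STATEMENT_LABEL: [{"type": "literal", "value": "Topic"}],
    },
}

label_counts = statement_label_counts(sample_graph)
print(len(sample_graph), "subjects,", len(label_counts), "unique labels")
```

Running the same function over the parsed contents of the real file (via `json.load`) should show whether the names collapse to a handful of values.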

stuartasutton commented 2 years ago

@excelsior & @jbaird123, the reason you are having no luck parsing this as an RDFS schema file is that it is NOT an RDFS Schema file. It is an RDF/JSON data file containing the Dublin Core's competency framework named "DCMI Competency Index for Linked Data".

Try the attached zipped JSON-LD schema file:

ASN_Profile.jsonld.zip

philbarker commented 2 years ago

@excelsior @jbaird123 could you point me to the set of test files you're using? I'll have a look over them to make sure that they are appropriate and not missing any important cases.

jbaird123 commented 2 years ago

Hi @philbarker - I have attached a zip file with all of the files I'm using to test. The ASN file is currently problematic for us, but @excelsior is working on that. Aside from that, all other files work with the importer.

files.zip

stuartasutton commented 2 years ago

@jbaird123, I am interested in knowing how the ASN file is "problematic"? I am the technical lead (and primary architect) of the ASN namespace.

jbaird123 commented 2 years ago

@stuartasutton - I didn't mean that there are problems with the file. We're having problems importing it, but we haven't been able to investigate it yet. It's likely an issue with our code. I'll let you know if it turns out otherwise.

stuartasutton commented 2 years ago

Super. Thank you.

excelsior commented 2 years ago

@stuartasutton The ASN_Profile.jsonld.zip file does have a problem after all. Its @context property (https://github.com/stuartasutton/asn/blob/master/asnContext.json) links to a GitHub page which responds with an HTML document. The raw version of that file (https://raw.githubusercontent.com/stuartasutton/asn/master/asnContext.json) does return a JSON document, but with a text/plain content type. We expect a JSON document with a proper MIME type when resolving a spec's context.

I'm not sure if there is a way to force GitHub to serve files with correct response headers. Do you think we should add a workaround if you plan to store contexts on GH?
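One possible shape for such a workaround, as a sketch only: rewrite GitHub "blob" page URLs to their raw counterparts before fetching the context. The helper name `to_raw_github_url` is hypothetical and not part of DESM. The mapping is inferred from the two URLs above, and it does not address the text/plain content type.

```python
from urllib.parse import urlsplit, urlunsplit


def to_raw_github_url(url):
    """Map github.com/<owner>/<repo>/blob/<ref>/<path> to the raw file URL.

    URLs that do not match that shape are returned unchanged.
    """
    parts = urlsplit(url)
    segments = parts.path.strip("/").split("/")
    # Expected shape: owner / repo / "blob" / ref / path...
    if parts.netloc != "github.com" or len(segments) < 5 or segments[2] != "blob":
        return url
    owner, repo, _blob, *rest = segments
    raw_path = "/" + "/".join([owner, repo, *rest])
    return urlunsplit((parts.scheme, "raw.githubusercontent.com", raw_path, "", ""))


print(to_raw_github_url("https://github.com/stuartasutton/asn/blob/master/asnContext.json"))
```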

jbaird123 commented 2 years ago

@excelsior - I'm not sure that's the only issue. I modified my local version to point the context to the raw content, and I changed the id reference to point to the raw content as well. I still get the "Attention: Couldn't find MergedFile with 'id'=undefined" error message when I attempt to upload. I have attached the file with my modifications for your reference.

ASN_Profile-joe.zip

excelsior commented 2 years ago

@jbaird123 The raw version still has the wrong MIME type (it's text/plain, but should be application/json), i.e. it's the same issue. The question is, should we try to parse it as JSON anyway, disregarding the content type?

jbaird123 commented 2 years ago

@excelsior - Since we're parsing JSON-LD, I'm ok with ignoring the MIME type and parsing all files as application/json. @stuartasutton - Let us know if you have any issues with that.

stuartasutton commented 2 years ago

I can change the file address, but I've no idea about the MIME type. It seems strange, since there are lots of projects that keep their JSON-LD files and context files on GitHub. Best to ask Phil. Since these files are on my GitHub account, I can make any changes necessary.

philbarker commented 2 years ago

I could put the file somewhere temporarily and serve it with the right MIME type, but that would be an ad hoc fix for a single case. I suspect serving these files with the wrong MIME type is quite common, so I would suggest processing it as JSON-LD even though its MIME type is plain text. You could raise an issue about content sniffing for MIME type as a generic fallback before processing an unexpected MIME type, but I suspect if you hit a case where what you're processing really isn't JSON-LD you'll find out soon enough.

So yeah, just process remote contexts on the assumption that they are JSON-LD. Maybe log a message/warning about the wrong MIME type if that's an option, as it might help with fault finding if that doesn't work.
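A rough sketch of that approach, assuming the response body and Content-Type header have already been fetched. The function name `parse_remote_context`, the logger name, and the list of accepted media types are illustrative assumptions, not DESM code.

```python
import json
import logging

logger = logging.getLogger("context_loader")  # hypothetical logger name

# Media types treated as "expected" here; this list is an assumption.
EXPECTED_TYPES = {"application/json", "application/ld+json"}


def parse_remote_context(body, content_type, url="(unknown)"):
    """Parse a remote context as JSON regardless of its declared MIME type."""
    media_type = (content_type or "").split(";")[0].strip().lower()
    if media_type not in EXPECTED_TYPES:
        # Warn but carry on, so the log helps with fault finding later.
        logger.warning("Context %s served as %r; parsing as JSON-LD anyway",
                       url, media_type)
    try:
        document = json.loads(body)
    except json.JSONDecodeError as exc:
        # e.g. the GitHub HTML page rather than the raw file
        raise ValueError(f"Context {url} is not parseable JSON: {exc}") from exc
    if not isinstance(document, dict) or "@context" not in document:
        raise ValueError(f"Context {url} has no top-level @context entry")
    return document


context = parse_remote_context(
    '{"@context": {"asn": "http://purl.org/ASN/schema/core/"}}',
    "text/plain; charset=utf-8",
    url="https://raw.githubusercontent.com/stuartasutton/asn/master/asnContext.json",
)
print(context["@context"]["asn"])
```

With this shape, an HTML response still fails loudly at the `json.loads` step, which is the "you'll find out soon enough" case.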

philbarker commented 2 years ago

Hmmm, should I be worried that there isn't a test-case schema for IMS CASE in the zip archive?

There is a JSON-Schema file for CASE as JSON-LD here: https://github.com/philbarker/desmSchemas/tree/main/IMS%20Global (case-ld.json) along with some other files that may be useful.

stuartasutton commented 2 years ago

Phil, per your comment https://github.com/t3-innovation-network/desm/issues/247#issuecomment-1212196145, I think it is one thing to know where to gather these files from the canonical source and another to gather some test files together in an easily accessible place. The CASE files, as well as others for 1EdTech, are present in my GitHub.