physh-org / PhySH

PhySH (Physics Subject Headings) is a physics classification scheme developed by the American Physical Society to organize journal, meeting, and other content by topic.

invalid SKOS #12

Closed by cathydolbear 2 years ago

cathydolbear commented 5 years ago

The SKOS file will not load into my taxonomy manager tool (PoolParty), apparently because it contains 150 orphan concepts. I validated the file using qSKOS (https://qskos.poolparty.biz) and got the following error report: report_physh.rdf_2019-03-11_16_32.txt
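For readers unfamiliar with the check being reported: an "orphan concept" in this sense is, roughly, a concept with no standard SKOS semantic relation to any other concept. The sketch below illustrates that idea on plain (subject, predicate, object) tuples. It is not qSKOS's actual implementation, the `example.org` URIs are placeholders (including the stand-in namespace for the custom `inFacet` relation), and SKOS mapping relations are left out for brevity.

```python
SKOS = "http://www.w3.org/2004/02/skos/core#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"

# Standard SKOS hierarchical and associative relations.
# Custom predicates are deliberately not listed here.
SEMANTIC_RELATIONS = {
    SKOS + name
    for name in ("broader", "narrower", "related",
                 "broaderTransitive", "narrowerTransitive")
}


def find_orphans(triples):
    """Return concepts that take part in no standard SKOS semantic relation."""
    concepts = {s for s, p, o in triples
                if p == RDF_TYPE and o == SKOS + "Concept"}
    linked = set()
    for s, p, o in triples:
        if p in SEMANTIC_RELATIONS:
            linked.add(s)
            linked.add(o)
    return sorted(concepts - linked)


# Hypothetical data: "c3" is connected only through a custom relation.
EX = "https://example.org/"  # placeholder namespace, an assumption
EXAMPLE = [
    (EX + "c1", RDF_TYPE, SKOS + "Concept"),
    (EX + "c2", RDF_TYPE, SKOS + "Concept"),
    (EX + "c3", RDF_TYPE, SKOS + "Concept"),
    (EX + "c2", SKOS + "broader", EX + "c1"),
    (EX + "c3", EX + "physh_rdf#inFacet", EX + "facet1"),
]

print(find_orphans(EXAMPLE))
```

Under this definition, a concept that is well connected through custom relations can still be reported as an orphan, because the checker only looks at the standard SKOS predicates.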

arthurpsmith commented 5 years ago

Hi @cathydolbear - thanks for the report and for checking this out! Please note that the RDF files we provide are not strictly in the "SKOS" format: we have replaced the "top concept" relations with customized, more two-dimensional "discipline" and "facet" relations. However, there may be a way to make it compatible, and it certainly would be nice to have it load cleanly into PoolParty (which we also used for the original work on this). If you follow the URI links for the "orphan concepts" in your report, for example, you will see that they all resolve to concepts that are well connected into our discipline and facet organization. Do you have a specific suggestion for how you think this ought to look? Perhaps we could work out something that makes sense.

cathydolbear commented 5 years ago

Hi Arthur, for us the simplest thing would be to use plain SKOS, for example making the Discipline a TopConcept of the APS Taxonomy, not a ConceptScheme. Facet would just be a skos:TopConcept as well. physh_rdf:excludeFromIndexing encodes website behaviour in the data, which breaks our data governance principles, so it's not something we'd be able to or want to use. usesFacet/inFacet could be skos:related, and the other relationships could all be broader/narrower. What was the reason for using a custom vocabulary? I had a look at some of the APS journals and couldn't see that information used anywhere.
If changing your taxonomy structure isn't an option, it might at least be easier to get it to load into PoolParty if the custom scheme were supplied separately. Thanks - and I appreciate all the work that's gone into this taxonomy :-)
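The predicate mapping proposed in this comment can be sketched as a simple rewrite over (subject, predicate, object) tuples. This is only an illustration of the suggestion, not something either participant ran: the `physh_rdf` namespace URI below is a placeholder, since the real one is not given in the thread, and the "other relationships" are passed through unchanged because the thread does not name them.

```python
SKOS = "http://www.w3.org/2004/02/skos/core#"
PHYSH = "https://example.org/physh_rdf#"  # placeholder namespace, an assumption

# Custom relations rewritten to a standard SKOS relation, as suggested above.
PREDICATE_MAP = {
    PHYSH + "inFacet": SKOS + "related",
    PHYSH + "usesFacet": SKOS + "related",
}

# Predicates dropped entirely because they encode website behaviour.
DROPPED = {PHYSH + "excludeFromIndexing"}


def to_plain_skos(triples):
    """Rewrite mapped predicates, drop excluded ones, keep everything else."""
    result = []
    for s, p, o in triples:
        if p in DROPPED:
            continue
        result.append((s, PREDICATE_MAP.get(p, p), o))
    return result


# Hypothetical input triples.
EXAMPLE = [
    ("ex:bioacoustics", PHYSH + "inFacet", "ex:research-areas"),
    ("ex:bioacoustics", PHYSH + "excludeFromIndexing", "true"),
    ("ex:bioacoustics", SKOS + "broader", "ex:acoustics"),
]

for triple in to_plain_skos(EXAMPLE):
    print(triple)
```

One trade-off of such a mapping is that skos:related is symmetric and untyped, so the distinction between "uses a facet" and "belongs to a facet" would be lost in the output.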

arthurpsmith commented 5 years ago

Hmm, I'm not sure what you suggest is practical - I think it would make the next level in the hierarchies very confusing to view in any tool. I was thinking of creating artificial "top concepts" within the disciplinary schemes that are the facets specialized to that discipline. I suppose I could go ahead and make the facets top concepts of the main taxonomy; that would roughly correspond to how we present them without the disciplinary filters. Anyway, it'll take a little bit of time to explore this, but it does sound like you would find it useful. Would you mind clarifying where you are coming from on this? Do you have a pointer to your "data governance principles", for example?

arthurpsmith commented 5 years ago

@cathydolbear I have created a new branch in this repository, "disciplines_as_top_concepts", which has an added file "physh_discs_tc.ttl". You should be able to download this and load it into PoolParty. This is a version of PhySH where the disciplines and facets are all treated as top concepts of the main taxonomy as you suggested. I'm going to work on a SKOS-compatible version more like my other suggestion as well for comparison, but at least this should let you see it in the PoolParty tool.

cathydolbear commented 5 years ago

Thanks, that works for loading in PoolParty now. Are there cases where a child concept doesn't have the same Facets as its parent? I was looking at Bioacoustics, a child of Acoustics, and wondering why they were both immediate children of Research Areas, rather than Bioacoustics just implicitly being a kind of Research Area through inheritance from Acoustics.

I'm experimenting with using PhySH for journal article auto-tagging, basically to be able to do the same as you have for the APS journals.

I had another question about your choice of URIs (let me know if I should start a separate thread on this): is Crossref resolving them for you? I can see, say, https://doi.org/10.29172/7cd48089-be71-4809-9288-ffdf82e55a20 resolving in my browser to the URL, but does Crossref also resolve to the URI for you (as suggested in https://www.w3.org/TR/cooluris/#distinguishing)?

arthurpsmith commented 5 years ago

I think I indicated this format would be confusing. I'm working on a SKOS-compatible version that's more consistent with the way we see it; hopefully I'll have that ready by early next week. But basically the issue is that not all disciplines use all the concepts or facets - for example, Bioacoustics is under Acoustics in Interdisciplinary Physics, but it is a top-level research area in Biological Physics, which does not include Acoustics on its own. So in this more raw format it looks like it's there twice.

I'm not sure I understand your Crossref question - the DOIs are registered with Crossref, so yes, they are part of the process of getting them resolved. Since these are conceptual and not documents in themselves, I'm not sure how the Cool URIs recommendation applies. Can you clarify that?

That's great that you're looking at using them for article tagging - we don't actually "auto-tag", all our tags are done by authors and/or editors.
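The Bioacoustics example in this comment can be illustrated with a toy model. The sketch below assumes that a concept is "top-level" within a discipline when none of its broader concepts are used by that discipline; the data structures and that rule are assumptions made for illustration, not a description of how PhySH is actually stored.

```python
# Hypothetical data: which concepts each discipline uses.
DISCIPLINE_CONCEPTS = {
    "Interdisciplinary Physics": {"Acoustics", "Bioacoustics"},
    "Biological Physics": {"Bioacoustics"},
}

# Hypothetical broader links, independent of discipline.
BROADER = {"Bioacoustics": {"Acoustics"}}


def top_level_in(discipline):
    """Concepts with no broader concept that the same discipline also uses."""
    used = DISCIPLINE_CONCEPTS[discipline]
    return sorted(c for c in used if not (BROADER.get(c, set()) & used))


def flat_top_level():
    """Union of per-discipline top levels, as seen with no discipline filter."""
    merged = set()
    for discipline in DISCIPLINE_CONCEPTS:
        merged.update(top_level_in(discipline))
    return sorted(merged)


for name in DISCIPLINE_CONCEPTS:
    print(name, top_level_in(name))
print("flattened", flat_top_level())
```

In this toy model, flattening the per-discipline views puts both Acoustics and Bioacoustics directly under the facet, even though Bioacoustics also remains a child of Acoustics, which matches the duplication described above.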

arthurpsmith commented 5 years ago

@cathydolbear there is now a new branch - 'discipline_facets' - which has a different added file, 'physh_skos_compat.ttl'. Please try this one out, it replicates the layout of the PoolParty organization of the disciplines and facets that we had been using. Note that the top concepts under each discipline are "discipline-facet" pairs and are labeled as such. I hope this organization will look much more reasonable to you. Please let me know any further questions though!

cathydolbear commented 5 years ago

Thanks Arthur, it's loaded now!