tdwg / sdd

Structured Descriptive Data (SDD)
https://www.tdwg.org/standards/116

steps to implement sdd.tdwg.org/doc/introduction #6

Open stanblum opened 1 year ago

stanblum commented 1 year ago

@timrobertson100, @MattBlissett, @baskaufs, @larsgw, @peterdesmet

I think there are a few changes we should make in this repository to implement the sdd.tdwg.org site.

The TDWG SDS contains the following

2nd level: IRIs denoting standards documents

are in the form:

http://rs.tdwg.org/sss/doc/docname/

In the sdd repo settings I see that sdd.tdwg.org is now designated as the domain name for "pages", and that a "DNS check is in progress." I believe GBIF admins are managing the DNS for TDWG. Is that correct and is that addition to our DNS in progress?

The files comprising the SDD Introduction are now located in /sdd/docs/ and the "root" document name is SddContents.html. To make things consistent, would it make sense (or be cleaner) to:

  1. move the files to /sdd/doc/introduction/, and
  2. rename the file SddContents.html to index.html (and update all of the hrefs in the component files accordingly),

or am I not fully understanding what will be accomplished with redirection and its efficiency?
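
The rename in step 2 could be sketched roughly as follows. This is a minimal sketch, not the script used for the site: it assumes all component files are .html files in a single directory and that links to the root document appear as plain href attributes. The function name rename_root_document is hypothetical.

```python
import re
from pathlib import Path

OLD_NAME = "SddContents.html"
NEW_NAME = "index.html"

# Match the old file name at the start of an href value (optionally prefixed
# with "./"), so that any "#fragment" after the name is left untouched.
HREF_PATTERN = re.compile(r'((?i:href)\s*=\s*["\'](?:\./)?)' + re.escape(OLD_NAME))


def rename_root_document(doc_dir: Path) -> int:
    """Rename OLD_NAME to NEW_NAME in doc_dir and update hrefs that point to it.

    Returns the number of HTML files whose content was changed.
    """
    old_path = doc_dir / OLD_NAME
    if old_path.exists():
        old_path.rename(doc_dir / NEW_NAME)

    changed = 0
    for html_file in sorted(doc_dir.glob("*.html")):
        text = html_file.read_text(encoding="utf-8")
        new_text = HREF_PATTERN.sub(lambda m: m.group(1) + NEW_NAME, text)
        if new_text != text:
            html_file.write_text(new_text, encoding="utf-8")
            changed += 1
    return changed
```

Running it on a copy of the directory first, and reviewing the resulting diff, would be a sensible precaution before committing.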

@larsgw, is that something you can do, or should I go ahead and take that on myself?

Thanks All!

larsgw commented 1 year ago

It's probably easier if I change the script to do those things. I can also make Markdown instead of HTML, that might help with layout and TDWG branding depending on the setup for other TDWG sites. Should I also convert the other SDD pages? They have examples among other things.

stanblum commented 1 year ago

I think I'd recommend keeping the output as HTML, just because markdown is so much more limited than HTML.

Examples could certainly be useful for people trying to understand SDD. I don't know what else might be available. I know the core participants in the development of SDD put in huge efforts and might have generated huge online discussions. I don't know how much of that is useful. I would begin with smaller resurrections. ;-) The Introduction is, of course, VERY valuable.


MattBlissett commented 1 year ago

> I believe GBIF admins are managing the DNS for TDWG. Is that correct and is that addition to our DNS in progress?

Yes, and it now shows as verified.

I don't think it's necessary to move files to /doc/introduction, but renaming the home page to index.html does make sense.

If you want to theme the site like https://dwc.tdwg.org you can:

I tried this, and it wouldn't be much more work to finish the job: https://mattblissett.github.io/sdd/ -- just Introduction and CodedData have invalid HTML in the tables (</td> without a <td> etc) and HTML within a Markdown document is much less forgiving than HTML for a browser.
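
One way to locate closing tags such as a stray </td> before converting the pages is sketched below, using Python's standard html.parser. It is an illustrative check under simple assumptions, not the tool used here: a closing tag is matched against the nearest open tag of the same name, so errors inside nested tables can be missed. The names StrayCloseTagFinder and find_stray_close_tags are hypothetical.

```python
from html.parser import HTMLParser

# Elements that never take a closing tag in HTML.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img",
             "input", "link", "meta", "source", "track", "wbr"}


class StrayCloseTagFinder(HTMLParser):
    """Collect closing tags that have no matching open tag."""

    def __init__(self):
        super().__init__()
        self.open_tags = []  # stack of currently open tag names
        self.stray = []      # (tag, line, column) for each unmatched closing tag

    def handle_starttag(self, tag, attrs):
        if tag not in VOID_TAGS:
            self.open_tags.append(tag)

    def handle_startendtag(self, tag, attrs):
        # Self-closing syntax such as <br/> opens and closes nothing.
        pass

    def handle_endtag(self, tag):
        if tag in self.open_tags:
            # Pop back to the nearest matching open tag; this tolerates
            # omitted optional closes such as </p> or </li>.
            while self.open_tags.pop() != tag:
                pass
        else:
            line, column = self.getpos()
            self.stray.append((tag, line, column))


def find_stray_close_tags(html_text):
    """Return a list of (tag, line, column) for unmatched closing tags."""
    finder = StrayCloseTagFinder()
    finder.feed(html_text)
    finder.close()
    return finder.stray
```

Calling find_stray_close_tags on the text of each page gives a list of positions to inspect by hand.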

MattBlissett commented 1 year ago

[Unless there are existing links expecting /doc/introduction; in that case it does make sense to move them.]

larsgw commented 1 year ago

> copy over _data, _config.yml, Gemfile, Gemfile.lock and favicon.ico from https://github.com/tdwg/dwc/tree/master/docs

Is it useful to specify metadata such as author, publication date, and version in the YAML header?

peterdesmet commented 1 year ago

> Is it useful to specify metadata such as author, publication date, and version in the YAML header?

You could, but they won't be displayed on the website

baskaufs commented 1 year ago

Getting caught up on this.

TLDR:

There isn't necessarily any relationship required between the rs.tdwg.org "permanent IRI" of a standards document and the actual URL that delivers it, although it probably would be desirable for the URL structure to be analogous.

Details:

The main reason for the http://rs.tdwg.org/sss/doc/docname/ pattern is that during IRI dereferencing, the server has to have some pattern to know how to handle resources that are documents vs. ones that are vocabulary terms, vocabularies, etc. If this pattern isn't followed (as in the case of some of the legacy DwC document IRIs), exceptions have to be coded specially in the server script.

Redirection from the rs.tdwg.org subdomain for documents is controlled by the entry in the browserRedirectUri column of this table (https://github.com/tdwg/rs.tdwg.org/blob/master/docs/docs.csv). The actual redirect does not become active until publication in the master branch, followed by creation of a new release (https://github.com/tdwg/rs.tdwg.org/releases). You can see this happened with the November 4, 2022 release that updated the information about the previously lost XDF specification.
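
The lookup idea can be illustrated with a small sketch. It is not the actual server script: only the browserRedirectUri column name comes from this thread, while the document_iri column name, the sample row, and the example.org target are placeholders made up for illustration.

```python
import csv
import io

# Illustrative data only. "document_iri" is a hypothetical column name and the
# row is a placeholder; only browserRedirectUri is named in this thread.
SAMPLE_CSV = """document_iri,browserRedirectUri
http://rs.tdwg.org/sss/doc/docname/,https://example.org/docname/
"""


def load_redirects(csv_text, iri_column="document_iri"):
    """Map each permanent document IRI to its browser redirect target."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return {row[iri_column]: row["browserRedirectUri"] for row in reader}


def resolve(iri, redirects):
    """Return the redirect target for a document IRI, or None if it is not listed."""
    return redirects.get(iri)


redirects = load_redirects(SAMPLE_CSV)
target = resolve("http://rs.tdwg.org/sss/doc/docname/", redirects)
```

The point of the sketch is only that the permanent IRI and the delivering URL are decoupled: changing where a document lives means changing one table entry, which then takes effect with the next release.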

As I mentioned previously, this is part of a somewhat complex process since the metadata gets fed into various places to make the machine-readable metadata available. I have been working to document and streamline the process, and the documents part of it is recorded in this Jupyter notebook (https://github.com/tdwg/rs.tdwg.org/blob/master/process/document_metadata_processing/tdwg_docs_workflow.ipynb). For standalone documents, this is it, but for List of Terms documents, which result at the end of a vocabulary update process, it's just the last step with the earlier steps documented here (https://github.com/tdwg/rs.tdwg.org/blob/master/process/process-vocabulary.md).

When I have time (maybe this summer) I hope to make this simpler by using more pandas and YAML files. But overall, basically everything related to the documentation (human and machine-readable) update process lives in the process folder (https://github.com/tdwg/rs.tdwg.org/tree/master/process) of the rs.tdwg.org repo.

For now, the easiest thing is probably to just tell me when the new URL is active and I can fix it.

stanblum commented 1 year ago

Hey Steve,

Thanks for coming back to me on that. I figured out some of that, and assumed that one or more scripts must be generating the output that will control the redirects from rs.tdwg.org. I'll spend some time looking at your Jupyter notebook, and see if I can figure out the piece that I'm missing... (The docs.csv file isn't an actual configuration file, is it? So what actually controls the redirects? I can see the "source", but where is the actual file of rewrite rules? Given that rs.tdwg.org is a GitHub repo, is it a GitHub thing or a DNS thing?)

Most everything is in place, except for one thing. I still think it would be best to rename SddContents.html > index.html, and cascade that substitution all the way through the Introduction pages (files). If Lars doesn't get to that in the next couple days, I think I'll just do it myself, as that sort of global substitution is "within my reach" ;-) Then I'll give you the heads up that we can update the docs.csv and rerun the script(s). Lars might resurrect some example files. The next thing to agree on is where to post those.

Thanks again,

-Stan

(I hope your email tangles have been rectified!)


larsgw commented 1 year ago

> Lars might resurrect some example files. The next thing to agree on is where to post those.

I was thinking of adding twiki/pub/SDD instead of just twiki/pub/SDD/Primer but maybe that's too much.

larsgw commented 1 year ago

Authors in the YAML header do seem to work (but only for the machine-readable <meta> data): https://github.com/tdwg/petridish/blob/659d4de0a908d1548c6571f9dff20cf8f01fff45/_includes/head.html#L28-L35

larsgw commented 1 year ago

I've applied the tdwg/petridish theme and fixed the HTML, as well as added the examples (#7). A test version is live on https://larsgw.github.io/sdd.

peterdesmet commented 1 year ago

Oh right, I forgot that I pick up authors for metadata from pages too in Petridish. 👍

Site looks good. I was wondering whether examples.md would be better served from /examples/ rather than examples.html? That can be set by adding permalink: /examples/ to the YAML header.
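
As a rough illustration of that change, the sketch below adds or replaces a permalink line in a page's YAML header. It treats the header as plain text between two --- lines rather than parsing YAML, which is an assumption that only holds for simple headers; the function name set_permalink is hypothetical.

```python
def set_permalink(page_text, permalink):
    """Return page_text with `permalink: <permalink>` set in its YAML header.

    If the page has no header, a new one containing only the permalink is added.
    """
    lines = page_text.split("\n")
    new_line = f"permalink: {permalink}"

    if lines and lines[0].strip() == "---":
        # Find the line that closes the header.
        end = next((i for i in range(1, len(lines)) if lines[i].strip() == "---"), None)
        if end is None:
            raise ValueError("YAML header is opened but never closed")
        # Drop any existing permalink entry, then append the new one.
        header = [line for line in lines[1:end] if not line.startswith("permalink:")]
        return "\n".join(["---", *header, new_line, "---", *lines[end + 1:]])

    return "\n".join(["---", new_line, "---", *lines])
```

For a single page such as examples.md, editing the header by hand is of course just as easy; a helper like this only pays off if many pages need the same treatment.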

larsgw commented 1 year ago

I'll work on that.

I noticed the table of contents isn't working on at least some pages. Do you know if that requires Markdown headings instead of HTML, or if there is something else going on?

peterdesmet commented 1 year ago

If Markdown, it will be rendered with Kramdown and headers will automatically get an id attribute. If HTML, you'll have to add your own id attribute to headings. The table of contents will pick those up.
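
For pages that keep HTML headings, the ids could be added with a small script along these lines. This is a regex-based sketch that assumes well-formed h1 to h6 elements; the slug rule used here is an assumption and is not guaranteed to produce the same ids Kramdown would generate. The names slugify and add_heading_ids are hypothetical.

```python
import re

# A heading element: opening tag with optional attributes, content, closing tag.
HEADING = re.compile(r"<(h[1-6])(\s[^>]*)?>(.*?)</\1>", re.IGNORECASE | re.DOTALL)
# An existing id attribute (but not something like data-id).
HAS_ID = re.compile(r"(?<![\w-])id\s*=", re.IGNORECASE)


def slugify(text):
    """Turn heading content into a lowercase, hyphen-separated id."""
    text = re.sub(r"<[^>]+>", "", text)            # drop nested tags
    text = re.sub(r"[^\w\s-]", "", text.lower())   # drop punctuation
    return re.sub(r"[\s_]+", "-", text.strip()) or "section"


def add_heading_ids(html_text):
    """Add an id attribute to every heading that does not already have one."""
    seen = {}

    def repl(match):
        tag, attrs, inner = match.group(1), match.group(2) or "", match.group(3)
        if HAS_ID.search(attrs):
            return match.group(0)  # leave headings with an id untouched
        slug = slugify(inner)
        seen[slug] = seen.get(slug, 0) + 1
        if seen[slug] > 1:
            slug = f"{slug}-{seen[slug] - 1}"  # keep ids unique within the page
        return f'<{tag}{attrs} id="{slug}">{inner}</{tag}>'

    return HEADING.sub(repl, html_text)
```

Headings that already carry an id are left alone, so the script can be rerun safely after manual edits.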

stanblum commented 1 year ago

The site ( https://larsgw.github.io/sdd) and examples look awesome. Thanks for doing all this!

Cheers,

-Stan


baskaufs commented 1 year ago

One comment on the redirection based on the docs.csv file. That file serves as a sort of official record of documents that are considered to be part of a standard. Technically, standards parts are determined by this table, but as a practical matter the docs.csv file is how the "standards-official" rs.tdwg.org IRIs get associated with the metadata about documents included in standards.

The significance of that is that anything on that list is "in the standard" and therefore subject to the standards change process and the SDS. They should all be listed on the landing page for the standard. Other docs, such as the Darwin Core Quick Reference Guide, are not part of the standard and are therefore not governed by any process.

For the current standards that conform to the SDS, decisions about what documents are "in" and "out" of the standard were made as part of the standards creation or maintenance process. However, for the old standards, I (that is, me personally) decided what documents I thought should be considered part of the standard when I created the docs.csv file and the standards landing pages. I decided based on what we had lying around in repositories, how the documents describe themselves, and whether the documents were required as any kind of technical specification. Technical specifications were definitely in, examples were definitely out, and user guide-type stuff was questionable. In some cases, we have included introductions and user guides containing non-normative content as part of the standards documents (like the Audiovisual Core "guide" doc, for example). But we are moving away from that somewhat for ease of maintenance, and most user guides that aren't also technical specifications probably shouldn't be included in the standard.

So the question here is: which of the SDD documents that we are resurrecting are actually going to be considered part of the standard, and which are going to be considered ancillary (outside of the standard)? That will determine whether they get added to the docs.csv and standards landing page or not. If not, then we would want to create some links to them somewhere that is obvious so that people could actually find them. Actually, they could probably be listed on the standards landing page, but not in the "Included in the standard" section. They also would not be issued rs.tdwg.org IRIs.

larsgw commented 1 year ago

It looks like the primer/introduction has already been listed as "Part of the standard" for at least a few years. Are we talking about changing that?

(The only other pages that are currently being added are a new landing page, consisting of two links and the description from the standards landing page; and the examples, which should indeed not be part of the standard.)

baskaufs commented 1 year ago

No, I don't think we need to change that one if it's already there.