mammaldiversity / mammaldiversity.github.io

(work in progress) Mammal Diversity Database website
MIT License

permalinks: https://www.mammaldiversity.org/explore.html?id=1004746 does not resolve, but https://www.mammaldiversity.org/explore.html#genus=Rhinolophus&species=sinicus&id=1004746 does, is that intentional? #24

Closed jhpoelen closed 3 months ago

jhpoelen commented 1 year ago

Hi!

While I was working towards integrating MDD into Nomer https://github.com/globalbioticinteractions/nomer/issues/141 , I noticed that:

https://www.mammaldiversity.org/explore.html?id=1004746

does not resolve, but

https://www.mammaldiversity.org/explore.html#genus=Rhinolophus&species=sinicus&id=1004746

does.

Is that intentional?

See also attached screenshots. Screenshot from 2023-01-23 13-41-10 Screenshot from 2023-01-23 13-39-55

n8upham commented 1 year ago

Hey @jhpoelen , yes that is correct -- we used to have the former ID-only links working and resolving, but then we had a discussion with Scott Loarie of iNaturalist and they inquired about adding links in the format of https://mammaldiversity.org/species-account.php?genus=[GENUS]&species=[SPECIES] in order to enable cross-linking with iNaturalist species pages.

We then tried for that, but didn't quite get there as I recall, due to an issue that @liphardt discovered so we landed on the https://www.mammaldiversity.org/explore.html#genus=[GENUS]&species=[SPECIES]&id=[ID] format.

Do you think that having ID only would be better / easier in some way?

jts1882 commented 1 year ago

The change has broken a lot of the links on Wikipedia.

Why have three parameters when one would do? There is no harm having the genus and species parameters in addition to the id (if that is helpful for iNaturalist), but the id should be sufficient to get the page required, as it was before.

The best solution would be to set it up so the links resolve when using id alone or genus+species. Also it would be helpful to be backwards compatible in allowing id or species-id.

liphardt commented 1 year ago

It was to try to accommodate iNat so they could link to the MDD. I ran into an issue where only having genus + species in the permalink pretty much broke everything else in the MDD, since most of the code was written utilizing the id.

jhpoelen commented 1 year ago

@n8upham @liphardt @jts1882 thanks for taking the time to respond.

In my experience, many taxonomic resources have a way to retrieve some kind of html landing page by id only. You'll find many examples of these at:

https://github.com/globalbioticinteractions/globalbioticinteractions/blob/42c82ff03e15079cab2b38148963aa6eda9c7e7a/eol-globi-lib/src/test/java/org/eol/globi/util/ExternalIdUtilTest.java#L15

As far as iNaturalist goes - instead of getting them to use GloBI identifiers, they are using their own identifiers along with some naming convention to integrate with (or point to) GloBI.

I found that the iNaturalist development team is kept busy keeping the site up and running, and needs simple integration solutions to help integrate with other systems. The benefit would be that you control which iNaturalist taxa are linked to what MDD entries.

https://github.com/globalbioticinteractions/globalbioticinteractions/issues/252

https://github.com/globalbioticinteractions/globalbioticinteractions/issues/668

https://github.com/inaturalist/inaturalist/blob/aa9e52d250378b3283c11f193c2017a19b53c7cf/tools/globi_observation_links.rb#L48

And, supporting a MDD landing page with id only would facilitate integrating MDD with GloBI, and perhaps also other resources like Wikidata etc.

jts1882 commented 1 year ago

The link with the id only used to work (as species-id), so that can be restored easily enough.

iNaturalist are currently using an old style MDD link: https://www.mammaldiversity.org/species-account.php?genus=Felis&species=chaus

Presumably they want to create a link with genus and species parameters. It should be simple enough to code something that, in the absence of an id parameter, parses through the data file until it finds the line matching the genus and species, and then retrieves the id.
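As a rough illustration of the lookup described above (not the code in filter.js), the sketch below scans a list of records for a genus + species match and returns the id. The field names `genus` and `species` and the function name `findIdByName` are hypothetical; the sample rows only reuse ids and names quoted in this thread.

```javascript
// Hypothetical sample rows; the real site reads its records from mdd.csv.
// Field names here are assumptions made for illustration only.
const sampleRows = [
  { id: "1004746", genus: "Rhinolophus", species: "sinicus" },
  { id: "1006023", genus: "Panthera", species: "tigris" },
  { id: "1005842", genus: "Enhydra", species: "lutris" },
];

// Return the id of the first row matching genus + species, or null.
// Matching is case-insensitive so that "panthera" finds "Panthera".
function findIdByName(rows, genus, species) {
  const g = genus.trim().toLowerCase();
  const s = species.trim().toLowerCase();
  const match = rows.find(
    (row) => row.genus.toLowerCase() === g && row.species.toLowerCase() === s
  );
  return match ? match.id : null;
}

console.log(findIdByName(sampleRows, "panthera", "tigris"));
```

A linear scan like this is probably fine for a few thousand species; returning null for an unknown name lets the caller decide whether to show an error or fall back to a search.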

The current MDD version with genus+species+id doesn't help iNaturalist. If they have the id they can use that directly and if they don't have the id they need something for genus+species.

jts1882 commented 1 year ago

I've just added a bit of code to my fork, which takes either species-id OR genus and species:

https://jts1882.github.io/mdd/explore.html#species-id=1006023

https://jts1882.github.io/mdd/explore.html#genus=panthera&species=tigris

The code for the genus+species option is lines 238-258 in filter.js.

jhpoelen commented 1 year ago

@jts1882 wow that was fast! And the lookup by id renders quickly also.

Just curious is there a way to link directly to subspecies like:

Enhydra lutris nereis ?

as retrieved from nominalNames column in mdd.csv ?

jts1882 commented 1 year ago

What would you link to? The subspecies information is in nominal names section in the species infobox.

https://www.mammaldiversity.org/explore.html#genus=Enhydra&species=lutris&id=1005842

Or in the treeview version in my fork:

https://jts1882.github.io/mdd/tree.html#genus=Enhydra&species=lutris

jhpoelen commented 1 year ago

@jts1882 thanks again for your prompt reply.

What would you link to? The subspecies information is in nominal names section in the species infobox.

I guess I'd want to link to a specific nominal name in that list. Perhaps highlight the selection expressed in the url query syntax?

Just wondering, is it correct that all species in MDD have at least one nominal name? If so, does that mean that all mammal species have at least one subspecies?

jhpoelen commented 1 year ago

btw - that tree view looks pretty neat!

jhpoelen commented 1 year ago

@n8upham @jts1882 please let me know if you need any additional information to help address the issue on non-resolving permalinks including only taxon ids. (e.g. https://www.mammaldiversity.org/explore.html#id=1004444 does not resolve, but https://www.mammaldiversity.org/explore.html#genus=Acerodon&species=jubatus&id=1004444 does).

n8upham commented 1 year ago

Thanks for following up here @jhpoelen ! I do have a couple questions, also for @jts1882 : (1) When you say

supporting a MDD landing page with id only would facilitate integrating MDD with GloBI, and perhaps also other resources like Wikidata etc.

Does that mean that the MDD would then create a separate landing page that would have mapping between different taxonomies + the relevant hyperlinks? E.g., MDD-iNaturalist would be one mapping, MDD-GloBI could be another? Glad to look into developing this, and largely take your lead on the infrastructure if my team could then curate the name mappings.

(2) @jts1882 The tree-view on your fork is excellent! Could you help us implement that for the current taxonomy version (now v1.10) and indicate the steps needed to merge your fork with ours? The solution you have for an option between genus/species look-up versus ID is exactly what we are looking for. @liphardt may be able to help here too.

Thanks much all

jhpoelen commented 1 year ago

@n8upham thanks for prompt reply.

Does that mean that the MDD would then create a separate landing page that would have mapping between different taxonomies + the relevant hyperlinks? E.g., MDD-iNaturalist would be one mapping, MDD-GloBI could be another? Glad to look into developing this, and largely take your lead on the infrastructure if my team could then curate the name mappings.

In this context, I imagined use of MDD identifiers to help point to html pages on mammaldiversity.org webpages .

E.g.,

https://www.mammaldiversity.org/explore.html#id=1005842

would point to

The MDD landing page of Enhydra lutris .

This would enable GloBI and others to easily infer some clickable link from a MDD taxon id (e.g., MDD:1005842 ->https://www.mammaldiversity.org/explore.html#id=1005842 ) .
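A minimal sketch of that inference, assuming the id-only form `explore.html#id=[ID]` is made to resolve (at this point in the thread it does not). The function name `mddCurieToUrl` is hypothetical.

```javascript
// Turn an identifier such as "MDD:1005842" into a clickable link.
// Assumes the id-only URL form proposed in this thread is supported.
function mddCurieToUrl(curie) {
  const match = /^MDD:(\d+)$/.exec(curie.trim());
  if (!match) {
    return null; // not an MDD identifier in the expected shape
  }
  return `https://www.mammaldiversity.org/explore.html#id=${match[1]}`;
}

console.log(mddCurieToUrl("MDD:1005842"));
```

The point of the id-only form is that the mapping needs nothing but the id itself: no name lookup, and no breakage when a species is renamed.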

A separate topic would be to try and support external identifiers, like iNaturalist taxon ids. Maintaining this kind of mapping needs constant review and updates as taxonomy identifiers come and go, even if the IDs are marketed as "permanent." This is why I internally (in my quiet voice ; )) translate PIDs (permanent identifiers) into APIDs (aspirationally permanent identifiers).

And cross-domain concept/id mapping is a separate topic in my mind. Glad to share ideas on that also if needed.

JelleZijlstra commented 1 year ago

I agree with @jhpoelen that links in the format he proposes would make it easier to integrate MDD links into other sources. For example, right now if I wanted to add cross-links to MDD to Hesperomys, I'd have to store the full link like https://www.mammaldiversity.org/explore.html#genus=Sorex&species=monticola&id=1004263, and that link would break if you ever decide to rename the species to Sorex monticolus. But if https://www.mammaldiversity.org/explore.html#id=1004263 worked and was guaranteed to continue resolving to the same species, I'd be able to store just 1004263 as an MDD identifier and I wouldn't have to worry about updating the link if MDD changes its spelling.

Also agree that the tree view in @jts1882's fork is beautiful; would be great if that can be incorporated into the main MDD website. I'm actually going to add similar UI elements to the next version of the Hesperomys frontend.

jts1882 commented 1 year ago

Re: the permalinks.

I don't understand how adding the genus and species to the permalink helps iNaturalist. As currently implemented, the genus and species part of the url don't do anything and the iNaturalist link is dead. The ID part is all important and the only bit needed, as long as it's the fourth element of the post-hash part of the url split on "=".

My solution was to add a function to parse the url and extract all parameters. Then either id or genus+species links would work. Only the id one should be called a permalink, as the genus/species combination might change.

The parser function is parseURLforParameters() (line 302 of filter.js on my fork). It's called from a modified version of goPermalink(event) (line 258).

To implement this change the easiest thing to do is replace goPermalink(event) in the master with goPermalink(event) and parseURLforParameters() from my fork.
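To illustrate the difference between positional and named parsing, here is a sketch that is not the fork's parseURLforParameters(), just one way to do the same kind of thing with the standard `URL` and `URLSearchParams` classes. The function names are hypothetical.

```javascript
// Positional approach as described above: the id is whatever ends up as
// the fourth element after splitting the post-hash part on "=".
function idByPosition(url) {
  const hash = new URL(url).hash.replace(/^#/, "");
  return hash.split("=")[3]; // undefined unless all three parameters are present
}

// Named approach: read each parameter by key, in any order.
// Also accepts the older "species-id" key for backward compatibility.
function parseHashParameters(url) {
  const hash = new URL(url).hash.replace(/^#/, "");
  const params = new URLSearchParams(hash);
  return {
    id: params.get("id") || params.get("species-id"),
    genus: params.get("genus"),
    species: params.get("species"),
  };
}

const full = "https://www.mammaldiversity.org/explore.html#genus=Acerodon&species=jubatus&id=1004444";
const idOnly = "https://www.mammaldiversity.org/explore.html#id=1004444";

console.log(idByPosition(full), idByPosition(idOnly));
console.log(parseHashParameters(idOnly));
```

With named parsing, the caller can use the id when it is present and fall back to a genus + species lookup when it is not.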

jts1882 commented 1 year ago

Implementing the treeview is more difficult. My fork has many changes and you might not want all of them. Also I think some of my changes were merged with the master at some point, but later removed, so a merge now might not work.

I think the best thing is for me to synchronise my fork with the master and then make the changes again in such a way to minimise changes elsewhere. Then any merges should be simpler.

jts1882 commented 1 year ago

I've synchronised my fork with the master and made additions to display the treeview page. This time I tried to avoid changing any of the existing files so nothing else changes. So at the moment the tree page works in isolation and isn't accessible from the other pages. Once we get it working the other pages can be edited to add it to the menu.

However, I had to make changes to the head.html and header.html pages (in the _includes folder) to get the page functioning with the right CSS for the header and menu. The reason for this is that the path for css, js and other assets uses a relative url starting with a slash, e.g. "/css/main.css", which starts at the root (i.e. "https://jts1882.github.io" in this case). My fork is in a subdirectory /mdd/ so the relative URLs don't work with the slash. Removing the slash fixes the links. I think this should still work without changing the behaviour of other pages in the master, but am not totally sure.
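The path behaviour described above can be checked with the standard `URL` constructor, which resolves a reference against a base the same way a browser does. The page URL is the fork's tree page mentioned earlier in this thread; the nested page path at the end is a made-up example.

```javascript
// A page served from a project subdirectory on GitHub Pages.
const forkPage = "https://jts1882.github.io/mdd/tree.html";

// A leading slash resolves against the site root, dropping "/mdd/".
const withSlash = new URL("/css/main.css", forkPage).href;

// Without the slash, the path resolves relative to the page's directory.
const withoutSlash = new URL("css/main.css", forkPage).href;

// Caveat: a page in a deeper folder resolves the slash-less form differently.
// "/mdd/some/nested/page.html" is a hypothetical path used only for illustration.
const nested = new URL("css/main.css", "https://jts1882.github.io/mdd/some/nested/page.html").href;

console.log(withSlash);    // https://jts1882.github.io/css/main.css
console.log(withoutSlash); // https://jts1882.github.io/mdd/css/main.css
console.log(nested);       // https://jts1882.github.io/mdd/some/nested/css/main.css
```

So removing the slash works for pages that sit at the top level of the site, but would likely need revisiting if pages are ever added in subfolders.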

With the proviso above, merging my changes into the master should produce a functioning treeview page without changing anything else. As a first step we can try this.

For the record, the changes involve four additional files:

  1. A "tree.md" file for the new page.

  2. The "mammals.js" file with the treeview code

  3. A "tree.css" for the formatting.

  4. A "speciesinfo.js" file for displaying the information panel on the species. This is a modified extract of code from filter.js, which fixes a few of the formatting problems (e.g. overflows of long lines), italicises the nominal names, and shows the taxonomy with the changes made last year (i.e. subclass, magnorder, etc). I made a new file to avoid changing filter.js.

As for going forward, I can update the filter.js to handle genus+species (for iNaturalist) or id only links (for backward compatibility). However, I think it best to leave this for now and go step by step.

n8upham commented 1 year ago

Hi @jts1882 I did the merge of your fork with the MDD master, but I'm not seeing the expected changes. I am seeing the "Treeview" tab in the Home page of the MDD now @ https://www.mammaldiversity.org/index.html (and screenshot), but when I click on it the treeview doesn't render.

Are there any particular reasons you can see why this might be? Thanks again for your help here

Screenshot 2023-06-08 at 5 48 04 PM

n8upham commented 1 year ago

Okay @jts1882 I've made some changes to get this Treeview working now -- major thanks again for your help in assembling this!! The main thing I needed to do was remove those "/" relative links and then clean up some other remaining aspects of the mammals.js code (links to mammaldiversity.org instead of your fork, switched info icon symbol, added the MSW3 taxonomy file). It now seems to be working entirely as intended -- see attached screenshots and https://www.mammaldiversity.org/tree.html

Screenshot 2023-06-10 at 3 14 00 PM Screenshot 2023-06-10 at 3 19 22 PM

Thanks also to @jhpoelen for helping me learn more about the Github backend these last few days.

Still unresolved is the permalinks issue that is the topic of this Issue -- so I'll keep this open for now while we address that front.

jhpoelen commented 1 year ago

@jts1882 @n8upham nice! Y'all are getting fancy with those tree views!

jhpoelen commented 1 year ago

A little tip - if you reference the issue number (e.g., #24 or https://github.com/mammaldiversity/mammaldiversity.github.io/issues/24) as part of the commit message, then the changes will be automatically associated with the referenced issue.

jts1882 commented 1 year ago

Apologies for the slow response, but I see you've got the treeview working.

jts1882 commented 1 year ago

I've restored the changes to filter.js in my fork (see Jan 24 post) with my solution to fix the permalink issue, that accepts urls with either id OR genus + species.

Examples:

Id alone: https://jts1882.github.io/mdd/explore.html#id=1006023

Genus+species: https://jts1882.github.io/mdd/explore.html#genus=panthera&species=tigris

All three: https://jts1882.github.io/mdd/explore.html#genus=panthera&species=tigris&id=1006023

To list all the Felidae: https://jts1882.github.io/mdd/explore.html#search=Felidae

P.S. I'm still having issues with the relative urls and the root on my fork (which requires an extra "/mdd/"). I had to change some links from "/js/filter.js" to "js/filter.js" in the .md files. However, the only changes to make the above work are in filter.js.

jhpoelen commented 1 year ago

@jts1882 Very cool to see that you've gotten creative with the permalink issues. Are you planning to prepare a pull request also? I think it'd be great to have this information integrated in the https://mammaldiversity.org site.

jts1882 commented 1 year ago

I'm cautious with the pull request as I'm still not sure about the urls relative to the root (see my PS above).

However, I think these should work on mammaldiversity.org, so with that warning I'll try and make a pull request.

jhpoelen commented 8 months ago

In order to make it easier to keep (aspirationally) permanent ids around, I'd suggest the following:

  1. generate a stub page for each issued taxon id (this can be automated)
  2. for each stub page, jekyll automatically generates a taxon info page using a template (similar to approach suggested in https://github.com/mammaldiversity/mammaldiversity.github.io/issues/3)
  3. set up linking so that https://mammaldiversity.org/1004746 resolves to the landing page of mammal diversity taxon 1004746. This taxon id currently points to a description of Rhinolophus sinicus K. Andersen, 1905
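The steps above could be automated along these lines. This is only a sketch: the `_taxa/` folder, the `taxon` layout name, the `taxon_id` key and the record field names are all hypothetical, and the only Jekyll features assumed are front matter and the `permalink` setting.

```javascript
// Build the path and content of one stub page for a taxon record.
// Jekyll would then render it through a shared template (layout).
function buildStubPage(record) {
  const frontMatter = [
    "---",
    "layout: taxon", // hypothetical layout name
    `permalink: /${record.id}/`,
    `title: "${record.genus} ${record.species}"`,
    `taxon_id: ${record.id}`, // hypothetical key for the template to use
    "---",
    "",
  ];
  return {
    path: `_taxa/${record.id}.md`, // hypothetical output folder
    content: frontMatter.join("\n"),
  };
}

const stub = buildStubPage({ id: "1004746", genus: "Rhinolophus", species: "sinicus" });
console.log(stub.path);
console.log(stub.content);
```

Because the stub is keyed on the id alone, renaming a species would only change the generated title, not the address of the page.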

n8upham commented 8 months ago

This makes a lot of sense to me -- the per species 'stub' pages will also help with linking to the associated content that we want to bring in, e.g., species range maps, MIL images, and taxonomic treatments. This will also set the stage for the 'Taxonomic Data Objects' idea we discussed in more detail today. Let's go forward here

jhpoelen commented 7 months ago

@n8upham @JelleZijlstra In working towards implementing taxon stub pages for improved management of taxon ids and (aspirationally) permanent links, I've done some code maintenance to help make the code a little easier to read. Also, I've introduced js/mdd.js, which holds a static copy of mdd.csv in JSON form, as rendered by Jekyll. This way, you don't have to reload the mdd data from mdd.csv for rendering each and every search result.
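One way to take advantage of data that is already in memory is to index it by id once and reuse the index for every lookup. The sketch below is not the contents of js/mdd.js; the sample array and field names are hypothetical stand-ins for whatever that file exposes.

```javascript
// Hypothetical stand-in for the JSON data exposed by js/mdd.js.
const mddRows = [
  { id: 1000002, genus: "Tachyglossus", species: "aculeatus" },
  { id: 1000003, genus: "Zaglossus", species: "attenboroughi" },
  { id: 1000008, genus: "Caenolestes", species: "convelatus" },
];

// Build the index once; ids are normalised to strings because values
// read from a URL are always strings.
function buildIndexById(rows) {
  const index = new Map();
  for (const row of rows) {
    index.set(String(row.id), row);
  }
  return index;
}

const byId = buildIndexById(mddRows);
console.log(byId.get("1000003"));
```

Each later lookup is then a single `Map.get` call rather than another pass over the whole dataset.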

With this maintenance, I think we are in a much better position to add new features, and . . . likely the site performance may be a little better. Let me know if anything is acting funky after my spring cleaning.

Next up - introducing page stubs.

JelleZijlstra commented 7 months ago

Thanks! It's inconsistent, but I sometimes get a 404 link to https://www.mammaldiversity.org/explore.htmlgenus=Zaglossus&species=attenboroughi&id=1000003 (notice missing ? after explore.html) if I go to "Search species", start typing "Zagl", and click one of the species. Sometimes it goes to this 404 link and sometimes it correctly shows the species account.

jhpoelen commented 7 months ago

@JelleZijlstra thanks for your prompt feedback! Opening a new issue.

How's the performance?

JelleZijlstra commented 7 months ago

Performance feels similar to before (reasonably fast on my faster work laptop, quite slow on my slower personal laptop). I haven't done a rigorous comparison though.

n8upham commented 7 months ago

I'm also getting the issue of no "?" being rendered between "html" and "genus" -- see below:

Screenshot 2024-02-16 at 10 29 13 AM

n8upham commented 7 months ago

And just clicking directly from the "Search Species" tab is yielding extra long links that don't resolve: https://www.mammaldiversity.org/explore.html?genus=Caenolestes&species=convelatus&id=1000008genus=Tachyglossus&species=aculeatus&id=1000002genus=Zaglossus&species=attenboroughi&id=1000003
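The cause of these malformed links isn't stated in this thread, so the following is only a defensive sketch, not a diagnosis: building each link from a fixed base with `URL.searchParams` means the "?" separator can't be dropped and parameters from a previous link can't be carried over. The function name `buildExploreLink` is hypothetical.

```javascript
// Build a fresh link for one record, always starting from the same base.
function buildExploreLink(record) {
  const url = new URL("https://www.mammaldiversity.org/explore.html");
  url.searchParams.set("genus", record.genus);
  url.searchParams.set("species", record.species);
  url.searchParams.set("id", String(record.id));
  return url.href;
}

const firstLink = buildExploreLink({ id: 1000003, genus: "Zaglossus", species: "attenboroughi" });
const secondLink = buildExploreLink({ id: 1000002, genus: "Tachyglossus", species: "aculeatus" });

console.log(firstLink);
console.log(secondLink);
```

Since `set` replaces any existing value for a key, calling the function repeatedly never yields the run-together form shown above.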

JelleZijlstra commented 7 months ago

50 should fix my issue.

jhpoelen commented 6 months ago

@n8upham @JelleZijlstra am trying to resolve the permalink issue by allowing a shorter version, e.g.,

https://mammaldiversity.org/taxon/1004746

However, currently DNS configuration prevents this from happening.

Note screenshot below related to the current behavior of trying to load https://mammaldiversity.org.

related to #47 .

I think it is kinda important that multiple folks have admin access to the DNS entries of mammaldiversity, and most domain name companies offer a way to have a multi-user account for managing a domain. As far as I know, the mammaldiversity domain is handled by "Tucows Domains Inc.". Any luck on contacting the owner of the domain, or are they awol?

image

image

n8upham commented 6 months ago

Thanks for this @jhpoelen -- I emailed to follow up with Sean. Hopefully that will help get you access to the DNS. FWIW, I'm currently able to access "mammaldiversity.org" as intended -- are you seeing that too?

jhpoelen commented 6 months ago

I notice that my browser is hiding the "www." prefix in the address bar.

I tried again without the prefix and got the same result, see attached screenshot.

image

jhpoelen commented 6 months ago

Also, on the command-line using "cURL"

$ curl -I https://mammaldiversity.org
curl: (60) SSL: no alternative certificate subject name matches target host name 'mammaldiversity.org'
More details here: https://curl.se/docs/sslcerts.html

curl failed to verify the legitimacy of the server and therefore could not
establish a secure connection to it. To learn more about this situation and
how to fix it, please visit the web page mentioned above.

whereas:

$ curl -I https://www.mammaldiversity.org
HTTP/2 200 
server: GitHub.com
content-type: text/html; charset=utf-8
last-modified: Fri, 15 Mar 2024 03:42:18 GMT
access-control-allow-origin: *
etag: "65f3c39a-1525"
expires: Sat, 23 Mar 2024 00:15:11 GMT
cache-control: max-age=600
x-proxy-cache: MISS
x-github-request-id: 4AA4:2147D7:27E6CB:2EA979:65FE1CB6
accept-ranges: bytes
date: Sat, 23 Mar 2024 00:05:11 GMT
via: 1.1 varnish
age: 0
x-served-by: cache-msp11856-MSP
x-cache: MISS
x-cache-hits: 0
x-timer: S1711152312.918387,VS0,VE39
vary: Accept-Encoding
x-fastly-request-id: 7dd15cf55e1fb18843bb2e21c0695c61af37d7c6
content-length: 5413

jhpoelen commented 3 months ago

Note that now,

$ curl -I https://mammaldiversity.org
HTTP/2 301 
server: GitHub.com
content-type: text/html
location: https://www.mammaldiversity.org/

thanks for making this happen!

n8upham commented 3 months ago

Thanks for following up Jorrit! Yeah, this is awesome to have working