legumeinfo / jekyll-legumeinfo

legumeinfo.org built with Jekyll and served on GitHub Pages
Apache License 2.0
4 stars 1 forks source link

Gene search with pangenes #158

Closed adf-ncgr closed 9 months ago

adf-ncgr commented 9 months ago

user facing changes to gene search are limited to the addition of a pangeneset column and linkouts for gene families and pangenesets. The rest of the changes should all have to do with updates to make use of graphql/webcomponent shimming functionality through the theme. Hope it passes review the first time, but not holding my breath!

adf-ncgr commented 9 months ago

just noticed an issue seemingly caused by deleting the lines below from _config.yml as I thought @alancleary suggested (and which did solve the error I was seeing with html headers being added to javascript files); now some of the pages (like the home page) are missing the header/styling altogether. Did I misinterpret your suggestion @alancleary or is there something else going on here?

  -
    scope:
      path: ""
    values:
      layout: "default"
adf-ncgr commented 9 months ago

just adding an explicit layout: default to the impacted index files fixes it. I'll just go with that unless there's some more elegant solution

alancleary commented 9 months ago

just noticed an issue seemingly caused by deleting the lines below from _config.yml as I thought @alancleary suggested (and which did solve the error I was seeing with html headers being added to javascript files); now some of the pages (like the home page) are missing the header/styling altogether. Did I misinterpret your suggestion @alancleary or is there something else going on here?

  -
    scope:
      path: ""
    values:
      layout: "default"

Hmm. It should use the default layout by default... I'll look into it. For your workaround, be sure to use the "home" layout for the homepage!

alancleary commented 9 months ago

@adf-ncgr It turns out as of Jekyll 4.2 the default.html layout is no longer used by default, although creating a default.html layout as the layout that should be used by default is still the convention... So the defaults entry @sammyjava had in the _config.yml file was actually appropriate if you don't want to specify the default layout in the front matter of a bunch of pages. However, you can optionally specify a type to limit it to just pages:

defaults:
  -
    scope:
      path: ""
      type: "pages"
    values:
      layout: "default"

This seems to prevent the default layout from being added to any assets files EXCEPT the assets/js/graphql/query.js file in the theme, which has empty triple-dashed lines to ensure that the liquid tag that adds the GraphQL URI from _config.yml is processed. Removing these lines prevents the file from being process altogether, i.e. the liquid tag shows up in the built site verbatim.

Fortunately you can explicitly specify the layout of a file as none and this won't be overridden by the defaults in the _config.yml file. I've confirmed that this fixes the issue with the assets/js/graphql/query.js file so I'll push it as a hotfix in the theme. I'll post an update here when it's done!

alancleary commented 9 months ago

@adf-ncgr The hot fix has been merged into the main branch of the theme. Using that branch with the defaults config I specified above should have the desired effect.

adf-ncgr commented 9 months ago

OK, thanks, this should simplify things a bit. Will give it a shot...

alancleary commented 9 months ago

For posterity: Looking at the Jekyll source code, I see that only SASS and CoffeeScript assets are considered "assets" file types; any other text file in the assets directory will be treated as a "pages" file type. So I guess the default layout being applied to our JS files even when we limit it to files of type "pages" is the expected behavior...

sammyjava commented 9 months ago

FYI I tossed #156 at @alancleary for this same reason. I lost styling with the loss of the default entry as well as losing the functionality of the gene search trying to use the npmjs web-components load. But I expect that's behind the updates you've done, so that issue is probably moot now.

adf-ncgr commented 9 months ago

@alancleary this seems to work as you've described; just wondering now before I commit the changes here if you think putting the onus on "assets" developers to specify a "none layout" is better than requiring page developers to specify a "default layout"? Feels a bit wrong to me, but I probably don't have a reasonable moral compass for such things...

adf-ncgr commented 9 months ago

@sammyjava, yes I think we've addressed the issues you raised in #156, but you can be the judge of it once you review this pull request (which I'm imagining will happen via the dev site, so others can have a look at the changes before we release the updates that include the genome browser links)

sammyjava commented 9 months ago

Wait - are there supposed to be pan-gene set linkouts? The linkouts are empty on the example above. Kinda useless without 'em.

image

The other three linkout links do provide linkouts.

alancleary commented 9 months ago

@alancleary this seems to work as you've described; just wondering now before I commit the changes here if you think putting the onus on "assets" developers to specify a "none layout" is better than requiring page developers to specify a "default layout"? Feels a bit wrong to me, but I probably don't have a reasonable moral compass for such things...

At least for the theme, I think leaving it up to the assets developers is fine. Only files that use the triple-dashed line front matter will need to specify the none layout, which is currently just the query.js file. Also, the theme should be programmed "defensively" anyway since it doesn't have control over the environments it's used in.

alancleary commented 9 months ago

Works for me: https://jekyll.dev.lis.ncgr.org/tools/search/gene.html?page=1&genus=&species=&strain=&identifier=Phvul.005G168200&description=&family=

But were we supposed to add a search parameter for pangene set ID just like gene family ID? I don't care, but since we display it, it would be typical to allow that. The searchGenes method does provide that.

Based on our previous conversations about pangene set search, I think this should definitely be added. There's no need to let it block this PR, though.

adf-ncgr commented 9 months ago

Wait - are there supposed to be pan-gene set linkouts? The linkouts are empty on the example above. Kinda useless without 'em.

Right now I only added a LINKOUTS yml file for soybean pangene sets, but that was just me being lazy. Also wasn't entirely sure what @StevenCannon-USDA had planned for updating the pangene sets, but I guess anything that's currently loaded into a mine should have them added. So I will do that... (but it isn't an update to this PR, it will be content updates in datastore-metadata)

StevenCannon-USDA commented 9 months ago

I am recalculating the pangene sets now, and then families (with trees, alignments, etc.). We should have a shiny new set of them by next week.

sammyjava commented 9 months ago

I am recalculating the pangene sets now, and then families (with trees, alignments, etc.). We should have a shiny new set of them by next week.

Good stuff, but remember it takes a few weeks to load them into the mines, particularly GlycineMine and LegumeMine.

adf-ncgr commented 9 months ago

OK, I added the linkouts for the other pangenes and made the updates for using the config to specify the default layout instead of having it be explicit in all pages. @sammyjava do we want to get this "staged" before I email the group about it, or should I just point them to your dev instance? I guess there's also the question of whether we want a news item; defer to you @StevenCannon-USDA, since maybe it makes more sense to wait until the new pangenesets are available to call attention to them? There are also the few new genomes recently added but I think those are not yet loaded into the mines (plus a few other checkboxes remaining to be ticked). But I think adding individual new genomes to legumemine should be faster than the few weeks for a complete rebuild, right @sammyjava?

sammyjava commented 9 months ago

But I think adding individual new genomes to legumemine should be faster than the few weeks for a complete rebuild, right @sammyjava?

I wasn't talking about a complete rebuild. Loading pan-gene sets takes a long time because of merging the transcripts, and then the post-processor which associates proteins and genes with pan-gene sets can take quite a while as well. The bigger the mine, the slower those sorts of loads because of the merge overhead.

Loading a new genome is fairly quick since it's new features, and there isn't much merging. It's cross-accession stuff like pan-gene sets and gene families that have a lot of merge overhead and can take a very long time.

sammyjava commented 9 months ago

FYI I pulled the most recent updates from this branch so the dev site reflects all changes.

sammyjava commented 9 months ago

@sammyjava do we want to get this "staged" before I email the group about it, or should I just point them to your dev instance?

In principle the staging site should very rarely have updates before promotion to production - it's meant for last-minute error checks. Any change made to stage should be considered the result of a mistake. If you want feedback about anything, it should be viewed on dev.

sammyjava commented 9 months ago

But, of course, once you merge this into main, it goes to the staging site. So that's sort of the mechanical difference -- if a PR is totally "ready to go" it is merged to main and the staging site reflects the update. So if you want folks to review what you've done, you don't merge this PR until that review is over.

adf-ncgr commented 9 months ago

thanks, sounds good (and @alancleary will be so proud you are getting persnickety about commented code smells); I'll point people to dev and wait to merge this until I hear an all-good from the legume gallery (less exclusive than the peanut one)

adf-ncgr commented 9 months ago

seems there are no glaring issues noted, so going ahead with the merge. we may do a news item later but it will be nice to get these updates actually in place.

adf-ncgr commented 9 months ago

@sammyjava I am not entirely sure if https://jekyll-stage.legumeinfo.org/tools/search/gene.html is now being run off of the results of the merge of this PR (no pangeneset column is in evidence), but it also seems it is now getting the: query.js:1 Uncaught SyntaxError: Unexpected token '<' (at query.js:1:1) that was supposed to be resolved by adding layout: none to the theme. I suspect I'm misunderstanding something about the github workflows this set in motion, but they seem to have completed so I'm at a bit of a loss as to what's gone wrong between the dev site and stage. Any thoughts?

sammyjava commented 9 months ago

Correct, merging to main triggers the GitHub action, I mean workflow, to update the staging site. After that, I email Nathan, as I had nothing to do with that part of the process. Could be versionitis, we hit that with Ruby last week. @nathanweeks this may need your attention if not @alancleary .

The main branch seems to be running fine on jekyll.dev so presumably there is something going on with the Jekyll version or something in the GitHub workflow.

sammyjava commented 9 months ago

Oh, the GitHub workflow finished successfully:

https://github.com/legumeinfo/jekyll-legumeinfo/actions/runs/7718546396/job/21040065205

But the site is even failing to load data for the gene search. So something is seriously amiss. It is running the main branch, presumably.

adf-ncgr commented 9 months ago

Yeah, I would have assumed it would run the main branch too, but I see now in the workflow that it isn't using docker as I had assumed it was so maybe the ruby version conjecture is the correct one. I'll look at it a bit more when I get a chance (he says, hoping confidentally that @nathanweeks will have solved it long before that happens...)

alancleary commented 9 months ago

@sammyjava @adf-ncgr The theme submodule isn't up to date. Specifically, it's pointing at commit cbb44ac, which predates the layout: none hotfix. It should be updated to commit 3a4902e.

sammyjava commented 9 months ago

Well, jekyll.dev isn't using Docker, either....

sammyjava commented 9 months ago

@sammyjava @adf-ncgr The theme submodule isn't up to date. Specifically, it's pointing at commit cbb44ac, which predates the layout: none hotfix. It should be updated to commit 3a4902e.

Ahhh, I manually pulled it on dev. And I now see that, yes, that's a new commit w.r.t. jekyll-legumeinfo:

    modified:   _themes/jekyll-theme-legumeinfo (new commits)

@adf-ncgr you did the PR so you get to update the theme within the site and punch it up again.

adf-ncgr commented 9 months ago

Oh, I forgot that submodules get pinned to a commit, I was thinking it would just effectively get HEAD on main of both. I believe @alancleary had schooled me already once about this. So, yes, it's on me here...

adf-ncgr commented 9 months ago

I guess it's somewhat better now, but both it and dev (and prod!) seem to be running into the graphql server not responding the the requests for form data {message: '502: Proxy Error', locations: Array(1), path: Array(1), extensions: {…}} extensions : code : "INTERNAL_SERVER_ERROR" response : {url: 'https://mines.legumeinfo.org/legumemine/service/qu…3C%2Fquery%3E&summaryPath=Organism.id&format=json', status: 502, statusText: 'Proxy Error', body: '<!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML 2.0//EN">…om remote server

\n\n'} [[Prototype]] : Object

back at you. @sammyjava ??