nextstrain / nextclade

Viral genome alignment, mutation calling, clade assignment, quality checks and phylogenetic placement
https://clades.nextstrain.org
MIT License

rendering a single/monolithic offlineable html file #283

Open dpark01 opened 3 years ago

dpark01 commented 3 years ago

Hi @ivan-aksamentov, question about nextclade (and as always, thanks for this great work)

For context, my questions are primarily about how to incorporate it into a workflow-based compute system, where people's data are analyzed asynchronously and results are returned to them for examination. That is why I asked earlier about pre-computing the parts of nextclade that actually involve computation, so that the user only has to deal with rendering (i.e., the node/react webapp would just take the CLI JSON output and focus on rendering it).

An even further ideal is exemplified by a tool I always point to, krona (https://github.com/marbl/Krona/wiki/KronaTools), which is a CLI tool that takes input data and produces a single, all-in-one html file (js and css all embedded in-line, with no references to anything on the internet). The user can then view that html file in their browser, whether online or offline, and get the full interactive js UI with the visualizations. At no point is a server (web server, node web app) required. That makes it super simple to incorporate the tool into a file-based workflow compute environment.

Personally I don't know my node/react well enough to know how to contribute in such a way, but even if it's as dumb as writing a script that spins up nextclade via yarn in a docker image, uses wget to push inputs and pull outputs to an html file, then collects all the referenced css/js assets and somehow embeds them within, that ought to do the trick in theory. Googling around suggests it might be even easier than that, depending on which frameworks are used under the hood (https://stackoverflow.com/questions/51949719/is-there-a-way-to-build-a-react-app-in-a-single-html-file).

Anyway, this is more of a pre-exploratory question than a specific feature request at the moment. I'm mostly curious whether, in your sense of it, this is a significant amount of effort or possibly straightforward if someone had time and was motivated. The nextclade documentation says "These sequences will then be analyzed in your browser -- data never leave your computer", which makes me think this should be pretty feasible: there's no server-side processing happening currently, so we shouldn't need a server. React code should be something that could be embedded in a static/offline html, in theory.

Curious to know what you think!

ivan-aksamentov commented 3 years ago

Hi Danny,

Great question!

Nextclade uses the Next.js framework (based on React) and, in particular, its Static HTML Export feature. The result of the build is a set of HTML files (one for each page: main, results, tree and 404) and a set of resources they depend on: JS, CSS, images, fonts. These resources are split into chunks by webpack (the build system underlying Next.js), for faster page loads.

If you follow the Developer's guide and then run yarn prod:build, you should be able to obtain these files in the packages/web/.build/production/web directory. For https://clades.nextstrain.org, we just grab all these completely dumb static files, put them on AWS S3 and serve them from there to the internet (through the Cloudfront cache).

So in a way, Nextclade is already the thing you've described: it's a static HTML page, except it's not fully self-contained, because the resources are in separate files and are fragmented. In theory, if you download all these files (or build them locally), you should be able to just open index.html and everything will work (unless there are some hardcoded URLs or paths, but that can be fixed).

Additionally, it should be possible to produce a Nextclade build where there is only one HTML file and all resources are inlined into it. I don't immediately know how it can be set up, but I think it's not very hard. Let's call it a "monolithic build".
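To make the inlining idea a bit more concrete, here is a minimal sketch of a post-processing script in Python. It is not part of the Nextclade build and has not been tried against the real export. It assumes the exported HTML references its assets through plain `<script src="...">` and `<link rel="stylesheet" href="...">` tags with double-quoted attributes, and that those files exist under the export directory. The function names (`inline_assets`, `_read_local`) are made up for illustration. It does not handle images, fonts, or chunks that webpack loads dynamically at runtime, which a real monolithic build would also have to deal with.

```python
import re
from pathlib import Path
from typing import Optional

# Assumption: assets are referenced with double-quoted src/href attributes.
SCRIPT_RE = re.compile(
    r'<script\b[^>]*?\bsrc="([^"]+)"[^>]*>\s*</script>', re.IGNORECASE
)
LINK_RE = re.compile(r"<link\b[^>]*>", re.IGNORECASE)
HREF_RE = re.compile(r'\bhref="([^"]+)"', re.IGNORECASE)
STYLESHEET_RE = re.compile(r'\brel="stylesheet"', re.IGNORECASE)
REMOTE_RE = re.compile(r"^(?:[a-z][a-z0-9+.-]*:)?//", re.IGNORECASE)


def _read_local(root: Path, url: str) -> Optional[str]:
    """Return the text of a local asset, or None if it is remote or missing."""
    if REMOTE_RE.match(url):
        return None
    # Drop any query string or fragment, then resolve relative to the export dir.
    clean = url.split("?", 1)[0].split("#", 1)[0]
    path = root / clean.lstrip("/")
    return path.read_text(encoding="utf-8") if path.is_file() else None


def inline_assets(html: str, root: Path) -> str:
    """Replace local <script src> and stylesheet <link> tags with inline content."""

    def replace_script(match: "re.Match[str]") -> str:
        code = _read_local(root, match.group(1))
        if code is None:
            return match.group(0)  # leave remote or missing scripts untouched
        # A literal "</script" inside the code would end the inline tag early.
        # Escaping the slash is safe inside JS string and regex literals.
        code = code.replace("</script", "<\\/script")
        return "<script>" + code + "</script>"

    def replace_link(match: "re.Match[str]") -> str:
        tag = match.group(0)
        href = HREF_RE.search(tag)
        if href is None or not STYLESHEET_RE.search(tag):
            return tag  # not a stylesheet link (e.g. an icon), keep as-is
        css = _read_local(root, href.group(1))
        if css is None:
            return tag
        return "<style>" + css + "</style>"

    html = SCRIPT_RE.sub(replace_script, html)
    return LINK_RE.sub(replace_link, html)
```

Usage would look roughly like `inline_assets(Path(export_dir, "index.html").read_text(encoding="utf-8"), Path(export_dir))`, where `export_dir` is wherever the static export ended up. Note that inlining drops attributes such as `defer` or `async`, so script execution order may differ from the original page.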

Static and monolithic builds, however, don't address the data loading part. If you want to display only the results page, for some hardcoded results, then the results JSON should also somehow be built in and inlined into the monolith at build time. I think it should be doable by setting up a separate build script (separate from yarn prod:build), which also imports the JSON.
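For the data part, one possible shape of such a step, sketched in Python rather than as an actual Next.js build script: serialize the results and inject them into the page as a global variable that the web app would read on startup. Everything here is an assumption for illustration. The function `embed_results` and the variable name `__NEXTCLADE_RESULTS__` are hypothetical, and the app does not currently look for such a variable, so the results page would need a small change to pick it up.

```python
import json
from typing import Any


def embed_results(
    html: str, results: Any, var_name: str = "__NEXTCLADE_RESULTS__"
) -> str:
    """Inject `results` into `html` as `window.<var_name>` just before </head>.

    `var_name` is a hypothetical global; the web app would have to read it.
    """
    if "</head>" not in html:
        raise ValueError("expected an HTML document with a </head> tag")

    # ensure_ascii=True keeps the payload plain ASCII, which also escapes
    # U+2028/U+2029 (these are problematic in older JS engines).
    payload = json.dumps(results, ensure_ascii=True)
    # "<" can only occur inside JSON strings, so replacing it with its unicode
    # escape keeps the JSON valid while preventing a "</script>" in the data
    # from closing the inline tag early.
    payload = payload.replace("<", "\\u003c")

    tag = "<script>window." + var_name + " = " + payload + ";</script>"
    # Insert once, before the closing head tag, so it runs before body scripts.
    return html.replace("</head>", tag + "</head>", 1)
```

In a workflow setting this could run as the last step after the CLI: load the JSON output with `json.load`, call `embed_results` on the monolithic HTML, and write the result out as the single file handed to the user.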

This seems like a good idea, and it is very much in the spirit of the original, "simple" Nextclade we imagined in the beginning. I am not sure, though, that I could pull off this trick any time soon. Our priorities have shifted away from Nextclade recently. So if you or your folks have bandwidth, then definitely go ahead and try it. We'd be very happy to collaborate!