jmcbroome / auto-pango-designation

Repository for automated flagging of new lineages for pango designation.

ENH: Include fasta of the open (non-GISAID) sequences in your proposal for direct inspection #177

Open corneliusroemer opened 2 years ago

corneliusroemer commented 2 years ago

Would be great if you could upload the open sequences from the proposal directly to the GitHub issue so that they can be easily checked in Nextclade & co (you may have to make the file ending fasta.txt so GitHub accepts it). Alternatively, you could put them in an AWS bucket or something similar for easier automation (uploading and linking in the issue in an automated fashion may be a bit annoying).

For the non-open, GISAID sequences, it would be cool if you could have a collapsible issue list like this:

<details>
  <summary>EPI ISLs</summary>

EPI_ISLs.....
</details>

It renders like this:

EPI ISLs EPI_ISL_1 ...
corneliusroemer commented 2 years ago

The advantage of a GISAID EPI_ISL list is that it can be copied and pasted into GISAID text field so one can get the sequences in about 10 seconds!

LAPIS has an API for this

EPI_ISL_15259635 EPI_ISL_14971339 EPI_ISL_15058581 EPI_ISL_15281174 EPI_ISL_15310537 EPI_ISL_15317025 EPI_ISL_15296238 EPI_ISL_15269672 EPI_ISL_15259634 EPI_ISL_15024980 EPI_ISL_15316879 EPI_ISL_15300781 EPI_ISL_15259739 EPI_ISL_15310370 EPI_ISL_15069931 EPI_ISL_15269658 EPI_ISL_15231879

jmcbroome commented 2 years ago

Thanks for the feedback. This is going to be a little bit more tricky than your other points. Unfortunately, GitHub does not allow the programmatic uploading of files to issues due to spam/content moderation concerns. A dropdown list like your example should be possible, but could potentially become extremely long for large clusters. I will revisit this but it may take a little longer to address.

corneliusroemer commented 2 years ago

Two simple workarounds:

  1. Not allowing file upload: You could upload them to some AWS bucket or similar file storage and share a link. That's just as good. One could even make that link open Nextclade directly :)
  2. Long list of EPI_ISLs: Randomize order and just input the first 200, problem solved. No need for complete list as this is more for inspection anyways.
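The second workaround can be sketched in a few lines of Python. This is only an illustration, not code from the repository: the function name `collapsible_epi_isl_block`, the `limit` and `seed` parameters, and the "(N shown)" suffix in the summary are all assumptions made for the example.

```python
import random


def collapsible_epi_isl_block(epi_isls, limit=200, seed=None):
    """Build a collapsible <details> block holding at most `limit` EPI_ISL IDs.

    Hypothetical helper: name, parameters and summary text are illustrative.
    """
    rng = random.Random(seed)
    # Drop duplicates while keeping first-seen order, then randomize the order.
    ids = list(dict.fromkeys(epi_isls))
    rng.shuffle(ids)
    # Keep only the first `limit` IDs; the list is for inspection, not completeness.
    ids = ids[:limit]
    body = " ".join(ids)
    return (
        "<details>\n"
        f"  <summary>EPI ISLs ({len(ids)} shown)</summary>\n"
        "\n"
        f"{body}\n"
        "</details>"
    )
```

The returned string can be pasted into the issue body as-is; the blank line after the summary keeps the ID list rendering as ordinary text inside the collapsed section.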
chaoran-chen commented 1 year ago

You could also provide a link to LAPIS to fetch specific sequences from the open instance. You can simply put the accessions in the URL. Example:

https://lapis.cov-spectrum.org/open/v1/sample/fasta?genbankAccession=OM171465,OL499311,OQ107959

If you'd like to have more sequences and the URL gets too long, you can also send a POST request to the same endpoint with the accessions in the request body. Example:

{"genbankAccession": ["OM171465", "OL499311", "OQ107959"]}
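Both variants could be prepared from Python roughly as follows. This is a sketch under assumptions: the endpoint and the `genbankAccession` field come from the examples above, while the helper names and the `Content-Type: application/json` header are guesses for illustration and have not been checked against the LAPIS documentation. The code only builds the URL and the request object; nothing is sent.

```python
import json
import urllib.parse
import urllib.request

# Endpoint taken from the example URL above.
LAPIS_FASTA_ENDPOINT = "https://lapis.cov-spectrum.org/open/v1/sample/fasta"


def lapis_fasta_url(accessions):
    """Return a GET URL with the accessions as a comma-separated query value."""
    query = urllib.parse.urlencode(
        {"genbankAccession": ",".join(accessions)},
        safe=",",  # keep commas readable instead of percent-encoding them
    )
    return f"{LAPIS_FASTA_ENDPOINT}?{query}"


def lapis_fasta_post_request(accessions):
    """Return an unsent POST request carrying the accessions as a JSON body."""
    payload = json.dumps({"genbankAccession": list(accessions)}).encode("utf-8")
    return urllib.request.Request(
        LAPIS_FASTA_ENDPOINT,
        data=payload,
        # Assumption: the endpoint expects a JSON content type.
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

Passing the request object to `urllib.request.urlopen` would perform the call; the short URL form is convenient for a clickable link in an issue, while the POST form avoids URL length limits for large clusters.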
jmcbroome commented 1 year ago

That is excellent! I wasn't aware of this API! I will investigate using this with our results. Thanks!