serratus-bio / serratus.io

Front-end code for Serratus project website
https://serratus.io
GNU Affero General Public License v3.0

Palmprint/PalmID Integration into RdRP Explorer #158

Open ababaian opened 2 years ago

Currently we have two main interfaces for Serratus data: a raw view focused on sequence alignment in RdRP Explorer, and a sequence-input view in PalmID. PalmID pages currently link to the Serratus RdRP Explorer via sra_accession numbers when those accessions are shown in a figure. This feature will add the corresponding input/link to PalmID from the Explorer page.

Background

For each sequencing library in Explorer (e.g. ERR2756788), the virus markers within that sample have been identified in an "Assembly" file, which can be downloaded under the 'rdrp' button.

These rdrp sequences were processed by the palmscan algorithm to isolate the virus barcode, so the exact viral barcodes found in each sample are now defined (palm_id). The results of this analysis live in the SQL table palm_sra.

SQL query:

select run_id, assembly_node, palm_id, percent_identity, evalue, coverage from palm_sra
where run_id = 'ERR2756788'
and qc_pass = true

returns:

run_id        assembly_node  palm_id    percent_identity  evalue     coverage
"ERR2756788"  7              "u181855"  100               1.03e-76   2.34
"ERR2756788"  5              "u110490"  100               1.7e-70    5.26
"ERR2756788"  4              "u110497"  100               1.9e-74    3.53
"ERR2756788"  3              "u131844"  100               1.09e-90   15.90
"ERR2756788"  2              "u269097"  59                3.28e-70   4.84
"ERR2756788"  1              "u21405"   100               6.99e-102  26.32
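
As a rough illustration of the lookup above, here is a minimal Python sketch that runs the same query with the accession passed as a parameter. It uses an in-memory SQLite database purely as a stand-in, since this issue does not say which database engine hosts palm_sra. The column types, the `ORDER BY` clause, and the names `make_demo_db` and `fetch_palm_sra` are assumptions made for the example. The rows are the ones shown in the output above.

```python
import sqlite3

# Rows copied from the query output in this issue. qc_pass is set to 1
# because the original query only returns rows that passed QC.
SAMPLE_ROWS = [
    ("ERR2756788", 7, "u181855", 100, 1.03e-76, 2.34, 1),
    ("ERR2756788", 5, "u110490", 100, 1.7e-70, 5.26, 1),
    ("ERR2756788", 4, "u110497", 100, 1.9e-74, 3.53, 1),
    ("ERR2756788", 3, "u131844", 100, 1.09e-90, 15.90, 1),
    ("ERR2756788", 2, "u269097", 59, 3.28e-70, 4.84, 1),
    ("ERR2756788", 1, "u21405", 100, 6.99e-102, 26.32, 1),
]


def make_demo_db() -> sqlite3.Connection:
    """Build a throwaway palm_sra table (column types are assumed)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE palm_sra ("
        "run_id TEXT, assembly_node INTEGER, palm_id TEXT, "
        "percent_identity INTEGER, evalue REAL, coverage REAL, "
        "qc_pass INTEGER)"
    )
    conn.executemany(
        "INSERT INTO palm_sra VALUES (?, ?, ?, ?, ?, ?, ?)", SAMPLE_ROWS
    )
    return conn


def fetch_palm_sra(conn: sqlite3.Connection, run_id: str) -> list:
    """Return the QC-passing palm_sra rows for one SRA run accession."""
    cur = conn.execute(
        "SELECT run_id, assembly_node, palm_id, percent_identity, "
        "evalue, coverage FROM palm_sra "
        # SQLite stores booleans as integers, hence 1 instead of true.
        "WHERE run_id = ? AND qc_pass = 1 "
        # Ordering is an assumption; the original query has no ORDER BY.
        "ORDER BY assembly_node DESC",
        (run_id,),
    )
    return cur.fetchall()


if __name__ == "__main__":
    for row in fetch_palm_sra(make_demo_db(), "ERR2756788"):
        print(row)
```

Binding the accession as a parameter rather than formatting it into the SQL string keeps the lookup safe when the accession comes from user input on the Explorer page.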

Each of these palm_id sequences is found in the palmdb table (also in SQL), so the sequences can be retrieved from that table.

SQL query:

select palm_id, nickname, palmprint from palmdb
where palm_id = 'u181855'

returns:

"u181855"   "weedlessDoghouse"  "VGIDASRFDAHVSIPILECEHAIYKKCYPGDSFLQKLCDLQLVNRGHTARGIKYKCPGGRMSGDMNTALGNCIIMLLVTAVAMANLGFQPKQWRMLCDGDDTLL"

Ideally, this result should be formatted as a FASTA file:

>u181855_weedlessDoghouse
VGIDASRFDAHVSIPILECEHAIYKKCYPGDSFLQKLCDLQLVNRGHTARGIKYKCPGGRMSGDMNTALGNCIIMLLVTAVAMANLGFQPKQWRMLCDGDDTLL
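
The conversion from a palmdb row to the FASTA record above could look roughly like the following Python sketch. The function name `palmdb_row_to_fasta` is hypothetical. The header convention (`palm_id`, underscore, `nickname`) and the single-line sequence are taken from the example in this issue.

```python
def palmdb_row_to_fasta(palm_id: str, nickname: str, palmprint: str) -> str:
    """Format one palmdb row as a FASTA record.

    The header follows the `>palm_id_nickname` convention shown in this
    issue, and the sequence is kept on a single line as in the example.
    """
    header = f">{palm_id}_{nickname}"
    # Strip stray whitespace in case the database value carries any.
    sequence = palmprint.strip()
    return f"{header}\n{sequence}\n"


# Row copied from the palmdb query result shown in this issue.
EXAMPLE_ROW = (
    "u181855",
    "weedlessDoghouse",
    "VGIDASRFDAHVSIPILECEHAIYKKCYPGDSFLQKLCDLQLVNRGHTARGIKYKCPGGRMSGDMNTALGNCIIMLLVTAVAMANLGFQPKQWRMLCDGDDTLL",
)

if __name__ == "__main__":
    print(palmdb_row_to_fasta(*EXAMPLE_ROW), end="")
```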

Feature

Phase 1

In the RdRP Explorer view, create a table below the current raw alignment explorer that shows the palm_sra results for a given SRA accession.

explore_palm

Phase 2

For each palm_id listed, create an "Analyze" button that links to the palmid page with the FASTA file pre-filled or, if it is easier, with the submission already completed (i.e. as if the Analyze Sequence button on the palmID page had been pressed).

For this example the submission link would be https://serratus.io/palmid?hash=376474b24f5eeddb57a1df9cbe3469b86a1a7663. The button would therefore wrap the API call behind the Analyze Sequence button press.
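
For the link itself, a small Python sketch of building the palmid URL from a submission hash is shown here. The issue does not say how the hash is derived, so the sketch treats it as an opaque value returned by the submission step. The names `PALMID_BASE_URL` and `palmid_link` are hypothetical.

```python
from urllib.parse import urlencode

# Base URL taken from the example link in this issue.
PALMID_BASE_URL = "https://serratus.io/palmid"


def palmid_link(submission_hash: str) -> str:
    """Build the palmid results link for an existing submission hash.

    The hash is treated as opaque: how it is generated is not described
    in this issue, so it must come from the palmid submission step.
    """
    return f"{PALMID_BASE_URL}?{urlencode({'hash': submission_hash})}"


if __name__ == "__main__":
    # Hash copied from the example link in this issue.
    print(palmid_link("376474b24f5eeddb57a1df9cbe3469b86a1a7663"))
```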

Phase 3

On mouse-click of each palm_id sequence, display the FASTA file for that palmprint.