phiweger / zoo

A portable datastructure for rapid prototyping in (viral) bioinformatics (under development).

use case: segmented virus (linked record), cross-collection work: relation to flavivirus (sample, tree construction) #28

Open phiweger opened 7 years ago

phiweger commented 7 years ago

article: Qin, X.-C. et al. A tick-borne segmented RNA virus contains genome segments derived from unsegmented viral ancestors. PNAS 111, 6744–6749 (2014).

  1. quick search of each segment against the SBT(C) for the closest hit; we find flavivirus
  2. download a collection of flaviviruses (originally created from ViPR database data), which has been shared via the dat protocol
  3. align a subset of the sequences (MSA) and store the alignment in zoo in "gap notation"
  4. construct a tree; the new virus is similar to flavivirus
  5. move the new virus into the flavivirus collection as a linked record
  6. query for this record, using the link structure to find it
  7. attach electron scanning images through GridFS
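
Steps 5 and 6 could look roughly like the sketch below. It uses a plain dict as a stand-in for the flavivirus collection, and the field names (`_id`, `segment`, `links`) as well as the helper names are assumptions made for illustration, not zoo's actual schema or API.

```python
def make_segment(virus, number, sequence):
    """Build one document per genome segment (hypothetical schema)."""
    return {
        "_id": f"{virus}-seg{number}",
        "virus": virus,
        "segment": number,
        "sequence": sequence,
        "links": [],  # _ids of the sibling segments of the same virus
    }


def insert_linked(collection, records):
    """Link every record to its siblings, then add all of them to the collection."""
    for record in records:
        record["links"] = [r["_id"] for r in records if r is not record]
        collection[record["_id"]] = record


def linked_records(collection, _id):
    """Follow the links of one record and return the linked documents."""
    return [collection[i] for i in collection[_id]["links"] if i in collection]


# Toy data: a four-segment virus moved into a (dict-based) flavivirus collection.
flavivirus = {}
segments = [
    make_segment("new_virus", n, seq)
    for n, seq in enumerate(["ACGT", "GGCC", "TTAA", "CCGG"], start=1)
]
insert_linked(flavivirus, segments)

# Query: we found segment 1 (e.g. via the tree), now collect the rest via links.
siblings = linked_records(flavivirus, "new_virus-seg1")
sibling_ids = [doc["_id"] for doc in siblings]
```

In a MongoDB-backed setup the dict lookups would presumably become queries on `_id`, but the link structure itself stays the same.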

The whole exercise should take us an hour at most.

phiweger commented 7 years ago

Do an example query: loop over docs and, for all entries with a complete set of x segments, concatenate them with 100 nt of "N" as a spacer, for use with MAFFT (linsi).
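
A minimal sketch of that query, assuming one document per segment with the hypothetical fields `virus`, `segment` and `sequence`. The docs are a plain list here; in practice they would come from iterating over a collection.

```python
SPACER = "N" * 100  # 100 nt of "N" between consecutive segments


def concatenate_segments(docs, n_segments):
    """Yield (virus, sequence) for every virus that has all n_segments segments."""
    by_virus = {}
    for doc in docs:
        by_virus.setdefault(doc["virus"], {})[doc["segment"]] = doc["sequence"]

    expected = set(range(1, n_segments + 1))
    for virus, segments in by_virus.items():
        if set(segments) == expected:  # skip incomplete entries
            ordered = [segments[i] for i in sorted(segments)]
            yield virus, SPACER.join(ordered)


def to_fasta(records):
    """Format (name, sequence) pairs as FASTA text."""
    return "".join(f">{name}\n{seq}\n" for name, seq in records)


docs = [
    {"virus": "A", "segment": 2, "sequence": "GGCC"},
    {"virus": "A", "segment": 1, "sequence": "ACGT"},
    {"virus": "B", "segment": 1, "sequence": "TTTT"},  # segment 2 missing, skipped
]
records = list(concatenate_segments(docs, n_segments=2))
fasta = to_fasta(records)
```

The resulting FASTA can be written to a file and aligned with `mafft --localpair --maxiterate 1000`, which is the L-INS-i mode that `linsi` refers to.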

phiweger commented 7 years ago

More to integrate, now along the lines of the RdRp dataset (annotation), e.g. for Markus' HMM.

  1. Uncovering Earth’s virome. Nature. Available at: http://www.nature.com/nature/journal/v536/n7617/abs/nature19094.html. (Accessed: 14th March 2017)
  2. Redefining the invertebrate RNA virome.