viralemergence / virion

The Global Virome in One Network
https://viralemergence.github.io/virion

VirusClass field sometimes contains erroneous host information #31

Closed eveskew closed 3 years ago

eveskew commented 3 years ago

In some cases, VirusClass data appears to erroneously report host, rather than virus, taxonomic information. These problematic non-virus VirusClass entries are currently restricted to data derived from GLOBI.

> library(tidyverse)
> library(vroom)
> 
> v <- vroom("Virion/Virion.csv.gz")
> 
> table(v$VirusClass)

      actinopteri     alsuviricetes 
               33             28891 
 amabiliviricetes     arfiviricetes 
                2             11801 
chrymotiviricetes  duplopiviricetes 
              352              2527 
   ellioviricetes    flasuviricetes 
            29756            220628 
   herviviricetes  howeltoviricetes 
            36966                16 
  insthoviricetes     magnoliopsida 
           818321                18 
   magsaviricetes          mammalia 
             1300                61 
    megaviricetes    monjiviricetes 
             5686            118667 
  papovaviricetes   pisoniviricetes 
            30882            256433 
  pokkesviricetes   quintoviricetes 
            14833             17900 
 repensiviricetes  resentoviricetes 
              396            101830 
  revtraviricetes   stelpaviricetes 
          1151358              9266 
 tectiliviricetes   tolucaviricetes 
            16653                 5 
> 
> v %>%
+   filter(VirusClass %in% c("actinopteri", "magnoliopsida", "mammalia")) %>%
+   pull(Database) %>%
+   unique()
[1] "GLOBI"
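
One way to catch such entries without listing them by hand is sketched below. It relies on the fact that ICTV virus class names end in "-viricetes", which every legitimate value in the table above does. The helper name `is_nonvirus_class` and the toy data frame (including the "OtherSource" label) are made up for illustration and are not part of Virion.

```r
# Toy stand-in for a few rows of the Virion table (values invented for illustration).
toy <- data.frame(
  VirusClass = c("monjiviricetes", "mammalia", "actinopteri", NA, "magnoliopsida"),
  Database   = c("OtherSource", "GLOBI", "GLOBI", "OtherSource", "GLOBI"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: TRUE where VirusClass is present but does not end in
# "viricetes", the suffix ICTV uses for virus classes. NA values are not flagged.
is_nonvirus_class <- function(x) {
  !is.na(x) & !grepl("viricetes$", x)
}

# Keep only the suspicious rows and cross-tabulate them by source database.
flagged <- toy[is_nonvirus_class(toy$VirusClass), ]
table(flagged$VirusClass, flagged$Database)
```

On the real data, the same predicate should work inside the pipeline above, e.g. filter(is_nonvirus_class(VirusClass)) followed by count(VirusClass, Database), assuming the column names shown in this issue.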
cjcarlson commented 3 years ago

God man, GLOBI