shandley / hecatomb

hecatomb is a virome analysis pipeline for analysis of Illumina sequence data
MIT License

Enhancement: Host #66

Open mihinduk opened 2 years ago

mihinduk commented 2 years ago

Hi Mike, It would be VERY helpful if Hecatomb could link host information to the taxonomy. I have attached a file with a list of viral families and hosts that could help with this.

Thank you,
Kathie

Attached file: 2020_11_24_Viral_Family_host.xlsx

mihinduk commented 2 years ago

Here are some additional families (added to ICTV after the attached file was made).

Family             Host
Adintoviridae      Eukaryotes
Aliusviridae       Insects
Crepuscuviridae    Insects
Curvulaviridae     Fungi
Guelinviridae      Bacteria
Kolmioviridae      Vertebrates including humans
Metaxyviridae      Plants
Myriaviridae       Arthropods
Natareviridae      Arthropods
Simuloviridae      Haloarchaea
Steitzviridae      Bacteria
Zobellviridae      Bacteria

mihinduk commented 2 years ago

I was thinking that the column should be called "Known_hosts" to emphasize that additional hosts could be discovered.

beardymcjohnface commented 2 years ago

This is an excellent idea. For now it would be a simple table-join to incorporate this during analysis, and we could add this to the tutorial.
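As a rough illustration of the table-join idea, here is a minimal Python sketch using only the standard library. It assumes the taxonomy table has a column named `family`, which is a guess and not a confirmed Hecatomb column name. The sequence rows and the `add_known_hosts` helper are made up for the example; the host pairs and the "Known_hosts" column name come from this thread.

```python
# Hypothetical host lookup built from a few family/host pairs listed in this thread.
KNOWN_HOSTS = {
    "Adintoviridae": "Eukaryotes",
    "Curvulaviridae": "Fungi",
    "Zobellviridae": "Bacteria",
}


def add_known_hosts(rows, hosts, family_key="family"):
    """Left join: keep every input row and add a 'Known_hosts' value.

    Families missing from the lookup get an empty string, so no rows are dropped.
    """
    joined = []
    for row in rows:
        new_row = dict(row)  # copy so the input rows are not modified
        new_row["Known_hosts"] = hosts.get(row.get(family_key, ""), "")
        joined.append(new_row)
    return joined


# Made-up rows standing in for a per-sequence taxonomy table (assumed layout).
taxonomy_rows = [
    {"seq_id": "seq1", "family": "Curvulaviridae"},
    {"seq_id": "seq2", "family": "Zobellviridae"},
    {"seq_id": "seq3", "family": "Unlistedviridae"},  # placeholder family with no host entry
]

annotated = add_known_hosts(taxonomy_rows, KNOWN_HOSTS)
for row in annotated:
    print(row["seq_id"], row["family"], row["Known_hosts"] or "NA")
```

A left join is used here so that sequences from families without a host entry stay in the table instead of silently disappearing.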

beardymcjohnface commented 2 years ago

Do we want the host information in a more controlled format? I'm looking at some of the lines and the current format is not ideal, e.g.:

Alphatetraviridae    Insects: Butterflies, Moths
Caulimoviridae       Plants, Insects

Do we want to be able to group, say, all the insect-host viral families? That would be a problem with the current format.
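To make the grouping problem concrete, here is one possible sketch of a controlled format in Python. The parsing convention is an assumption made for this example, not something agreed in the thread: text before a colon is treated as a single top-level category (with the rest as detail), and otherwise commas separate categories. The `parse_hosts` helper is hypothetical.

```python
from collections import defaultdict


def parse_hosts(text):
    """Split a free-text host string into top-level host categories.

    Assumed convention: 'Category: detail, detail' names one category;
    without a colon, commas separate categories.
    """
    if ":" in text:
        return [text.split(":", 1)[0].strip()]
    return [part.strip() for part in text.split(",") if part.strip()]


# The two example lines quoted in the comment above.
raw_hosts = {
    "Alphatetraviridae": "Insects: Butterflies, Moths",
    "Caulimoviridae": "Plants, Insects",
}

# Invert the mapping so families can be looked up by host category.
families_by_host = defaultdict(list)
for family, text in raw_hosts.items():
    for category in parse_hosts(text):
        families_by_host[category].append(family)

print(sorted(families_by_host["Insects"]))
# ['Alphatetraviridae', 'Caulimoviridae']
```

Once every family maps to a list of categories, selecting all insect-host families becomes a simple lookup instead of a string search.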

mihinduk commented 2 years ago

Hi Mike,

I think that would be fine. I just copied what ViralZone had. I would think that we would always want to include humans as their own category and not collapse to mammals if there are other mammals.

Kathie
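One way the "humans as their own category" rule could look, as a hedged Python sketch: the `host_categories` helper and its matching rules are assumptions for illustration only. It keeps the broader group and always adds a separate "Humans" entry when the text mentions humans.

```python
import re


def host_categories(text):
    """Return host categories, listing 'Humans' separately whenever humans are mentioned."""
    mentions_humans = re.search(r"\bhumans?\b", text, flags=re.IGNORECASE) is not None
    # Remove 'humans' (and an optional leading 'including') to leave the broader groups.
    broad = re.sub(r"\b(including\s+)?humans?\b", "", text, flags=re.IGNORECASE)
    categories = [part.strip() for part in broad.split(",") if part.strip()]
    if mentions_humans:
        categories.append("Humans")
    return categories


print(host_categories("Vertebrates including humans"))
# ['Vertebrates', 'Humans']
print(host_categories("Bacteria"))
# ['Bacteria']
```

With this shape, a family whose hosts include humans is never collapsed into a broader group such as mammals or vertebrates, and can still be grouped with that broader category.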

In reply to Michael Roach's comment of Sunday, May 22, 2022 at 7:26 PM (Re: [shandley/hecatomb] Enhancement: Host (Issue #66)).

