rega-cev / virulign

VIRULIGN: fast codon-correct alignment and annotation of viral genomes
GNU General Public License v2.0

Application on bacteria protein analysis #22

Open swantan opened 1 month ago

swantan commented 1 month ago

Hi, I am working on analyzing one of the proteins of Group A Streptococcus, the M protein. Given the diversity of the M protein, which has a lot of insertions and deletions, I wonder whether I can apply virulign to get a better alignment for it?

Any advice is greatly appreciated.

Thank you!! Swan

ktheyss commented 1 month ago

Hi Swan,

I will check your question with the maintainer of the code.

Best, Kristof



plibin commented 1 month ago

Hi Swan,

I’m not an expert in bacterial protein analysis. Can you provide us with some details on the size of the proteins and on any polyproteins that make up this protein? Is it just one protein you would like to focus on?

I think it would not be impossible to use virulign, but since we do all kinds of things to resolve frameshifts, it may not be the most efficient tool for bacterial alignment.

Kind regards,

Pieter

prof. dr. Pieter Libin, Assistant Professor, Artificial Intelligence lab, Vrije Universiteit Brussel, http://ai.vub.ac.be/


swantan commented 1 month ago

Thank you Kristof @ktheyss for connecting me to the programmer.

Hi Pieter,

Thank you for responding to my question! I really appreciate it. I actually just started learning bacterial protein analysis as well. The protein I am focusing on is the M protein, which determines the emm type of Group A Strep. It is roughly 500 aa (~1500 nt) long. The issue is that this protein is very diverse and varies a lot in length due to insertions and deletions. I tried the standard alignment tools, but the alignments were not optimal, largely because of the real insertions and deletions between strains. Hence, I thought of exploring virulign, which is a codon-based method.

Hope this explains, Swan
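
For readers unfamiliar with what "codon-based" means here, the following is a minimal sketch of the general idea, not virulign's actual algorithm: gaps are placed in whole codons (multiples of three nucleotides), so an insertion or deletion never breaks the reading frame. The function name `codon_align` and the toy sequences are hypothetical and chosen only for illustration; the sketch threads an unaligned coding sequence onto an already aligned protein sequence.

```python
def codon_align(aligned_protein: str, cds: str) -> str:
    """Thread an unaligned coding sequence onto an aligned protein sequence.

    Every gap ('-') in the protein alignment becomes a three-nucleotide
    gap ('---'), so the nucleotide alignment stays in frame.
    Illustration only; this is not how virulign works internally.
    """
    n_residues = sum(1 for residue in aligned_protein if residue != "-")
    if len(cds) != 3 * n_residues:
        raise ValueError("CDS length must be 3 x the number of residues")

    codons = []
    pos = 0  # index of the next unused nucleotide in the CDS
    for residue in aligned_protein:
        if residue == "-":
            codons.append("---")
        else:
            codons.append(cds[pos:pos + 3])
            pos += 3
    return "".join(codons)


# Toy example (hypothetical sequences): strain B lacks one residue.
strain_a = codon_align("MKTL", "ATGAAAACTCTG")
strain_b = codon_align("MK-L", "ATGAAACTG")
print(strain_a)
print(strain_b)
```

In this toy case the deletion in strain B shows up as a single in-frame gap of three nucleotides, which is the property that makes codon alignments attractive for genes with many real insertions and deletions.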

plibin commented 1 month ago

Dear Swan,

I believe virulign would indeed allow you to obtain a high-quality alignment. It might take some compute to obtain it, but you can definitely give it a try.

To support new pathogens, the easiest way is to create an XML file with an annotation of the region of interest. You can find more info on that here: https://github.com/rega-cev/virulign-tutorial, more specifically in the section “Converting Genbank file to XML file”.

Let me know if you need assistance with that.

Kind regards,

Pieter
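
Before annotating a reference for the region of interest, it may help to confirm that the chosen reference coding sequence is an intact open reading frame. The sketch below is an assumption about a useful sanity check, not part of virulign or its tutorial; the function name `check_reference_cds` and the example sequences are hypothetical. It reports a length that is not a multiple of three and any stop codon before the final codon.

```python
from typing import List

STOP_CODONS = {"TAA", "TAG", "TGA"}  # standard genetic code


def check_reference_cds(cds: str) -> List[str]:
    """Return a list of problems found in a candidate reference CDS.

    Hypothetical helper for illustration; an empty list means the
    sequence looks like an intact reading frame.
    """
    seq = cds.upper().replace("U", "T")
    problems = []

    if len(seq) % 3 != 0:
        problems.append("length is not a multiple of 3")

    # Split into complete codons, ignoring any trailing partial codon.
    usable = len(seq) - len(seq) % 3
    codons = [seq[i:i + 3] for i in range(0, usable, 3)]

    # A stop codon is only expected (if at all) as the last codon.
    for index, codon in enumerate(codons[:-1], start=1):
        if codon in STOP_CODONS:
            problems.append(f"internal stop codon {codon} at codon {index}")

    return problems


# Toy sequences (hypothetical), not real emm gene data.
print(check_reference_cds("ATGAAACTGTAA"))
print(check_reference_cds("ATGTAAAAACTG"))
```

A reference that passes this check is not guaranteed to be a good choice, but one that fails it is likely to cause trouble for any frame-aware aligner.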


swantan commented 1 month ago

Hi Pieter,

Thank you for your input and resource! This is super helpful. I will definitely give it a try.

Thanks a lot, Swan