psathyrella / partis

B- and T-cell receptor sequence annotation, simulation, clonal family and germline inference, and affinity prediction
GNU General Public License v3.0

Chicken sequence annotation #298

Closed nannabarnkob closed 4 years ago

nannabarnkob commented 4 years ago

Hi partis team

This might be a weird question, but is it possible to use partis to annotate chicken antibody sequences? The maturation process is quite different and is based on diversifying gene conversion from a set of pseudogenes that are utilized through recombination like in this figure:

[Figure: Organization of the chicken immunoglobulin light chain (IgL) locus. The germ-line chicken…]

This does not really correspond to the maturation process that partis models, but maybe we're lucky. Partis is an integrated part of our pipeline, and it would be a big help for us if we could also use it with our new chicken sequences.

Best Nanna

psathyrella commented 4 years ago

Thanks for your interest! I'm not very familiar with the chicken case, but we should be able to get it working. For mice and macaques, it was really just a matter of adding the appropriate germline set. I can't think of a reason why the fact that chickens use mostly gene conversion rather than rearrangement for diversification would mess this up. I guess the gene calls would correspond to the final, converted, gene rather than the original rearranged one, but that doesn't seem like a problem.

One caveat, though: for mice and macaques we turn on clustering-based germline inference (i.e. looking for new genes that are very different from known genes, separated by more than just a couple of SNPs), since the germlines are much less complete for those species than for humans. For chickens I'd imagine the germline database is also pretty incomplete, but I'm not sure how much the gene conversion would screw up the assumptions in germline inference.

As a first go, you'd just need to make a new germline directory modeled on the ones here using whatever germline database you have for chickens (IMGT? or something better?) and either point to it with --initial-germline-dir while having --species set to human/mice/macaque, or else add the chicken dir under chicken/ here and add chicken as an option for --species. I'd be happy to do the latter if you pass me lists of chicken germline genes, since it's nice to have partis working with as many species as possible.
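As a rough illustration of the first option above, here is a minimal Python sketch that assembles a partis command line pointing at a custom germline directory. It is not taken from the partis codebase. `--species` and `--initial-germline-dir` are the flags named above; the `annotate` action, `--infname`, `--outfname`, and the `./bin/partis` path reflect my understanding of the usual partis invocation and should be checked against the manual. All file and directory names are hypothetical.

```python
import shlex


def build_partis_command(germline_dir, infname, outfname,
                         species="human", partis_bin="./bin/partis"):
    """Assemble an argument list for an annotation run with a custom germline set.

    species stays at one of the existing options; the custom directory passed
    via --initial-germline-dir supplies the actual (e.g. chicken) genes.
    """
    return [
        partis_bin, "annotate",           # assumed action name, check the manual
        "--infname", infname,             # assumed input flag
        "--outfname", outfname,           # assumed output flag
        "--species", species,
        "--initial-germline-dir", germline_dir,
    ]


# Hypothetical paths, for illustration only.
cmd = build_partis_command("chicken-germlines", "chicken-seqs.fa",
                           "chicken-annotations.yaml")
print(shlex.join(cmd))
```

The list form can be passed directly to `subprocess.run(cmd, check=True)`; `shlex.join` is only used to print a copy-pasteable version.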

nannabarnkob commented 4 years ago

Hi again

That sounds great! We need to get an overview of possible germline pseudogenes then; we think we have something in house. We can get that to you in a week or so, if that is okay. The figure was from "Genetic Diversification by Somatic Gene Conversion" by Kohei Kurosawa and Kunihiro Ohta, Genes, 2011, DOI:10.3390/genes2010048. If there is something special to be aware of, you can check out their description of the process.

psathyrella commented 4 years ago

Any update on this?

nannabarnkob commented 4 years ago

Hi Duncan

We apologize for the radio silence! Things have been a bit busy here, but we are still very interested.

We have collected the set of pseudogenes in chicken, as well as functional V(D)J genes. Additionally, we have some chicken antibody sequences that we were able to annotate with our own tool that we could send for reference. Finally, we think it would be nice to contact you on our official Symphogen e-mail, if you would be interested in that.

psathyrella commented 4 years ago

Yep, that sounds great! dkralph at the gmail.

nannabarnkob commented 4 years ago

Great, we're happy to hear that! My boss will get back to you ASAP from our more "official" channels.

psathyrella commented 4 years ago

closing because after some more back and forth and running things it turns out i'm an idiot and chicken rearrangement works differently than i was thinking, so there is not really any prospect of partis working for chickens any time soon.

nannabarnkob commented 4 years ago

We highly appreciate the effort, thank you! We understand that partis is not really designed with chicken rearrangement in mind at all. With that said, what do you think of using partis' current V-gene predictions as the major donor fragment? We initially had a look at v_per_gene_support to get an idea of the confidence for one or more "fragments".

psathyrella commented 4 years ago

oh, yeah I think it should be fine for that. Just as long as you keep in mind that it's matching up the whole V gene, and any break points from conversion from other V genes will probably show up as shm indels.
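For reading v_per_gene_support in this way, a minimal sketch (not partis code) could rank the candidate V genes and flag sequences where the top call is not clearly ahead. It assumes the support values have already been loaded into a plain dict mapping gene name to a support value between 0 and 1; how that dict is read from the output file is not shown here. The gene names, the `min_support` cutoff, and the `margin` threshold are all hypothetical.

```python
def rank_v_support(v_per_gene_support, min_support=0.05):
    """Return (gene, support) pairs sorted by decreasing support.

    Genes with support below min_support are dropped.
    """
    ranked = sorted(v_per_gene_support.items(),
                    key=lambda item: item[1], reverse=True)
    return [(gene, support) for gene, support in ranked
            if support >= min_support]


def is_ambiguous(v_per_gene_support, margin=0.2):
    """True if the best and second-best V genes are within margin of each other."""
    ranked = rank_v_support(v_per_gene_support, min_support=0.0)
    if len(ranked) < 2:
        return False
    return ranked[0][1] - ranked[1][1] < margin


# Hypothetical support values for a single sequence.
support = {"IGLV1*01": 0.70, "IGLV-psi4": 0.25, "IGLV-psi12": 0.01}
print(rank_v_support(support))
print(is_ambiguous(support))
```

An ambiguous call here only says that several whole-gene matches are similarly supported. It does not locate conversion break points, which, as noted above, would more likely show up as SHM indels.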

nannabarnkob commented 4 years ago

All right. Thanks a lot! It is nevertheless a very useful step in our pipeline :-)