nf-core / airrflow

B-cell and T-cell Adaptive Immune Receptor Repertoire (AIRR) sequencing analysis pipeline using the Immcantation framework
https://nf-co.re/airrflow
MIT License

Isotype and c_call columns #303

Closed: eba28 closed this issue 4 months ago

eba28 commented 5 months ago

Description of feature

It would be great to have an isotype column generated from the c_call column, with values such as IgA, IgD, IgE, IgG, and IgM. The c_call column can currently return multiple calls, as IgBLAST does, but for plotting it's nicer to have just the overall family, e.g. "IGHA2" instead of "IGHA2*01,IGHA2*02". Perhaps a new column (if you don't want to overwrite c_call) could use alakazam::getFamily to get this information? This function includes options for keeping multiple calls.

ggabernet commented 5 months ago

Very good point, @eba28. I worked on adding this on the enchantr package side, and I'll close this issue once the new version is added to airrflow.

eba28 commented 5 months ago

The AIRR standard for c_call does say to include the allele (see the Rearrangement Schema). So perhaps there should be a new column for the family instead of updating the c_call column? I'm not sure whether we ever check for this in the Immcantation pipeline.

ggabernet commented 4 months ago

Hi @eba28, this has already been added in the dev branch of the pipeline. The isotype is now stored in the isotype column, while c_call is kept intact. It will be available in the next release!
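
For readers who want to see the idea discussed in this thread in concrete form, here is a minimal sketch of deriving a separate isotype value from c_call while leaving c_call untouched. It is not the enchantr or alakazam implementation: the function names, the `first_only` option, and the mapping from heavy-chain constant gene to labels such as "IgA" are all assumptions made for illustration.

```python
import re

# Assumed mapping from the heavy-chain constant gene letter to an isotype label.
ISOTYPE_BY_LETTER = {"A": "IgA", "D": "IgD", "E": "IgE", "G": "IgG", "M": "IgM"}


def strip_alleles(c_call):
    """Return the unique gene names in a comma-separated call, in first-seen order."""
    genes = []
    for call in c_call.split(","):
        gene = call.strip().split("*")[0]  # drop the allele suffix, e.g. "*01"
        if gene and gene not in genes:
            genes.append(gene)
    return genes


def isotype_from_c_call(c_call, first_only=True):
    """Map a c_call string to an isotype label, or None if no heavy-chain call is found."""
    if not c_call:
        return None
    isotypes = []
    for gene in strip_alleles(c_call):
        match = re.match(r"IGH([ADEGM])", gene)
        if match:
            label = ISOTYPE_BY_LETTER[match.group(1)]
            if label not in isotypes:
                isotypes.append(label)
    if not isotypes:
        return None
    # Either keep only the first call or join all distinct calls.
    return isotypes[0] if first_only else ",".join(isotypes)


def add_isotype_column(rows):
    """Return copies of the rows with an added 'isotype' key; 'c_call' is not modified."""
    return [dict(row, isotype=isotype_from_c_call(row.get("c_call"))) for row in rows]
```

With this sketch, a row whose c_call is "IGHA2*01,IGHA2*02" keeps that value and gains an isotype of "IgA". Light-chain calls, which do not match the heavy-chain pattern, get no isotype.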