Closed by andreas-wilm 9 months ago
Hi @andreas-wilm ,
We got the MHC I sequences from the IMGT database: https://www.ebi.ac.uk/ipd/imgt/hla/download/
They also have a GitHub repo that makes it convenient to download their files: https://github.com/ANHIG/IMGTHLA
Regarding the specific allele you are referring to (HLA-A*23:01), you can find it here: https://raw.githubusercontent.com/ANHIG/IMGTHLA/3540/hla_prot.fasta
Best, Rui
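For anyone following along, here is a minimal sketch of pulling a single allele out of a downloaded copy of hla_prot.fasta. It assumes each header looks like `>HLA:<accession> <allele name> <length> bp`, with the allele name (without the `HLA-` prefix) as the second whitespace-separated field. The function names `read_fasta` and `find_allele` are made up for this illustration and are not part of TCRmodel2 or IMGT/HLA.

```python
from typing import Iterable, Iterator, Optional, Tuple


def read_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs from FASTA-formatted lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def find_allele(lines: Iterable[str], allele: str) -> Optional[Tuple[str, str]]:
    """Return the first record whose allele field equals `allele` or extends it
    with further colon-separated fields (e.g. "A*23:01" matches "A*23:01:01:01").

    Assumption: the allele name is the second field of the header.
    """
    for header, seq in read_fasta(lines):
        fields = header.split()
        if len(fields) > 1 and (
            fields[1] == allele or fields[1].startswith(allele + ":")
        ):
            return header, seq
    return None
```

Usage would be along the lines of `with open("hla_prot.fasta") as fh: hit = find_allele(fh, "A*23:01")`. Note that this returns the full-length entry from the file, not the trimmed alpha 1 and 2 domains shown on the website.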
Thanks @rui-yin!
You seem to have preprocessed the full sequences in that database to just extract the alpha 1 and 2 domains. Would you be able to share the exact process for reproducibility purposes? Did you use Pfam PF00129 / InterPro IPR011161 to achieve this?
Many thanks, Andreas
No problem, Andreas, happy to help! We extracted the alpha 1 and 2 domains of Class I MHC using a hidden Markov model built from a multiple sequence alignment containing alpha 1 and 2 domains of Class I MHC sequences. You can refer to the trim_mhc function in seq_utils.py to see how the processing is performed.
Best, Rui
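As a rough illustration of the HMM-based trimming described above (this is not the repository's trim_mhc, which remains the reference), one option is to search the full-length sequences with HMMER's `hmmsearch --domtblout` using a profile built from your own alignment of alpha 1 and 2 domains, then cut each sequence at the reported envelope coordinates. The sketch below covers only the parsing and trimming step. It assumes the standard whitespace-separated domtblout layout, and the helper names `parse_domtblout` and `trim_to_domain` are hypothetical.

```python
from typing import Dict, Iterable, Tuple


def parse_domtblout(lines: Iterable[str]) -> Dict[str, Tuple[int, int]]:
    """Map each target name to (env_from, env_to) of its best-scoring domain.

    Assumes HMMER's --domtblout column order: domain score in column 14 and
    envelope start/end in columns 20 and 21 (1-based column numbers).
    """
    best: Dict[str, Tuple[float, int, int]] = {}
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comment/header lines
        cols = line.split()
        if len(cols) < 22:
            continue  # not a complete domain row
        name = cols[0]
        score = float(cols[13])
        env_from, env_to = int(cols[19]), int(cols[20])
        if name not in best or score > best[name][0]:
            best[name] = (score, env_from, env_to)
    return {name: (start, end) for name, (_, start, end) in best.items()}


def trim_to_domain(seq: str, env_from: int, env_to: int) -> str:
    """Cut `seq` to the 1-based, inclusive envelope coordinates."""
    return seq[env_from - 1:env_to]
```

Envelope coordinates are used here rather than alignment coordinates because they tend to be slightly more inclusive at the domain edges. Whether that matches the boundaries used on the website is something to check against seq_utils.py.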
Oh, it's all in ./scripts! Wonderful. Thank you very much!
Hi TCRmodel2 developers,
Apologies for the slightly off-topic question: may I ask what the source of the MHC I sequences (e.g. GSH..TLQ for HLA-A*23:01) on the website is?
Many thanks, Andreas