liulab-dfci / TRUST4

TCR and BCR assembly from RNA-seq data
MIT License

possible pseudogene alignment #284

Open · marleysizzy opened this issue 2 days ago

marleysizzy commented 2 days ago

Does your TRUST4 reference file for the human genome include pseudogenes? If not, do you have any idea how I could add those genes?

mourisl commented 2 days ago

Yes. The reference file for the CDR3 annotation step is based on IMGT, and it does include some pseudogenes, such as IGHV1-68, but the list may be incomplete. If you have more pseudogenes you want to add, you can put them into the IMGT+C.fa file and add the IMGT gaps. If you don't have the gap information, you can remove all the gaps (the "." symbol) from the IMGT+C.fa file, and TRUST4 will then use motif information to identify the CDR3 instead of the IMGT coordinate system. Hope this helps.
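The two steps mentioned in the reply (appending extra sequences to IMGT+C.fa, and stripping every "." gap symbol when IMGT gap information is not available) could be scripted roughly as below. This is a minimal sketch, not part of TRUST4: the function names are hypothetical, it assumes IMGT+C.fa is a plain FASTA file whose header lines start with ">", and the record names in the demo are made up. Check the header style of the existing entries before adding your own, and write to a copy rather than overwriting the original file.

```python
from pathlib import Path

GAP = "."  # IMGT gap symbol used in the gapped reference sequences


def append_record(fasta_text: str, name: str, sequence: str) -> str:
    """Return fasta_text with one extra record (e.g. a missing pseudogene) appended.

    Assumption: the header is simply ">" + name. Match the header style of the
    existing IMGT+C.fa records before using this on the real file.
    """
    if fasta_text and not fasta_text.endswith("\n"):
        fasta_text += "\n"
    return fasta_text + f">{name}\n{sequence}\n"


def strip_gaps(fasta_text: str) -> str:
    """Remove gap symbols from sequence lines; header lines are kept verbatim."""
    out = []
    for line in fasta_text.splitlines():
        out.append(line if line.startswith(">") else line.replace(GAP, ""))
    return "\n".join(out) + "\n" if out else ""


def strip_gaps_in_file(in_path: str, out_path: str) -> None:
    """Write a gap-free copy of in_path to out_path, leaving in_path untouched."""
    text = Path(in_path).read_text()
    Path(out_path).write_text(strip_gaps(text))


if __name__ == "__main__":
    # Hypothetical record names, for illustration only.
    demo = ">EXAMPLE_V*01\nCAG...GTG\n"
    demo = append_record(demo, "EXAMPLE_PSEUDO_V*01", "CAGGTG")
    print(strip_gaps(demo), end="")
```

Removing the gaps from the whole file, not only from the newly added records, mirrors the reply's suggestion: without gaps TRUST4 falls back to motif-based CDR3 identification rather than IMGT coordinates.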