milaboratory / mixcr

MiXCR is an ultimate software platform for analysis of Next-Generation Sequencing (NGS) data for immune profiling.
https://mixcr.com

Still have question with MiXCR gene library #1756

Closed: omegahh closed this issue 3 months ago

omegahh commented 3 months ago

MiXCR v4.7.0 (built Thu Aug 08 03:19:48 CST 2024; rev=976ba14139; branch=no_branch; host=fv-az1019-185)
RepSeq.IO v2.5.0 (rev=06fa1852ee)
MiLib v3.5.0 (rev=b6cfcdc2af)
Built-in V/D/J/C library: repseqio.v5.1

Library search path:

  • built-in libraries
  • /home/hongh/.

I exported the MiXCR library with the mixcr buildLibrary --species hs .... command. However, when parsing the output JSON file, I am confused by the 'alleleInfo' data.

Let me explain with some examples:

Example 1

As shown below, the MiXCR library tells me that "IGHV1-2*02" is an allele of "IGHV1-2*04" with the mutation "ST198A". Am I right?

[Screenshot 2024-08-23 15:47:13]

Then I checked them in the IMGT library: they are "CAGGGCT" and "CAGGGCA", as shown below. This is consistent with the information above.

[Screenshot 2024-08-23 16:00:18]
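The notation in Example 1 can be read mechanically. The following is a minimal sketch, assuming a MiLib-style token grammar: S<from><position><to> for a substitution, D<base><position> for a deletion, and I<position><base> for an insertion. This grammar is inferred from the two mutation strings quoted in this post, and parse_mutations is a hypothetical helper, not a MiXCR API.

```python
import re

# Assumed token grammar (inferred from the strings in this thread, not confirmed):
#   S<from><pos><to>  substitution
#   D<base><pos>      deletion
#   I<pos><base>      insertion
TOKEN = re.compile(r"S([ACGTN])(\d+)([ACGTN])|D([ACGTN])(\d+)|I(\d+)([ACGTN])")


def parse_mutations(text):
    """Split a mutation string into (kind, position, from_base, to_base) tuples."""
    out = []
    offset = 0
    while offset < len(text):
        m = TOKEN.match(text, offset)
        if m is None:
            raise ValueError(f"cannot parse {text!r} at offset {offset}")
        if m.group(1):                      # substitution branch matched
            out.append(("S", int(m.group(2)), m.group(1), m.group(3)))
        elif m.group(4):                    # deletion branch matched
            out.append(("D", int(m.group(5)), m.group(4), None))
        else:                               # insertion branch matched
            out.append(("I", int(m.group(6)), None, m.group(7)))
        offset = m.end()
    return out


print(parse_mutations("ST198A"))      # [('S', 198, 'T', 'A')]
print(parse_mutations("I132GDG135"))  # [('I', 132, None, 'G'), ('D', 135, 'G', None)]
```

Read this way, "ST198A" says that the T at position 198 of the parent allele is replaced by A, which matches the "CAGGGCT" versus "CAGGGCA" difference seen in IMGT.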

But when I go back to the Gene Library (https://vdj.online/library), the sequence remains "CAGGGCT", which is quite perplexing to me.

[Screenshot 2024-08-23 15:59:09]

Example 2

When I tested mutations with insertions and deletions, I found something even more confusing.

Let's look at the genes "IGHV3-30-5*06x" and "IGHV3-30*18", where the latter is the allele parent of the former. The mutation is "I132GDG135", which means an insertion of 'G' at position 132 and a deletion of 'G' at position 135, as shown below:

[Screenshot 2024-08-23 20:51:04]

Based on the mutation described above, I believe the result should be "CGTG". However, the Gene Library still displays "CTGG", as shown below:

[Screenshot 2024-08-23 20:57:14]
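One way to check such an inference is to apply the mutation string to the parent sequence mechanically. The sketch below rests on several assumptions that this post does not confirm: positions are 0-based and refer to parent coordinates, an insertion is placed before the base at the stated position, and mutations are listed in ascending position order. The parent here is a hypothetical stand-in made of N filler, with only bases 132 to 135 set to the "CTGG" shown in the Gene Library; apply_mutations is an illustrative helper, not a MiXCR function.

```python
import re

# Each token: kind, optional "from" base, position, optional "to" base.
TOKEN = re.compile(r"([SDI])([ACGT]?)(\d+)([ACGT]?)")


def apply_mutations(parent, mutation_string):
    """Apply S/D/I mutations (parent coordinates, 0-based) to a sequence."""
    seq = list(parent)
    # Walk right to left so that earlier parent positions stay valid
    # while later ones are edited (assumes ascending order in the string).
    for kind, frm, pos, to in reversed(TOKEN.findall(mutation_string)):
        pos = int(pos)
        if kind == "I":
            seq.insert(pos, to)          # insertion lands before parent[pos]
            continue
        if seq[pos] != frm:
            raise ValueError(f"expected {frm} at {pos}, found {seq[pos]}")
        if kind == "S":
            seq[pos] = to
        else:                            # kind == "D"
            del seq[pos]
    return "".join(seq)


# Hypothetical parent: filler everywhere except the four bases in question.
parent = "N" * 132 + "CTGG" + "NN"
child = apply_mutations(parent, "I132GDG135")

print(parent[132:136])  # CTGG
print(child[132:136])   # GCTG
```

Under these assumptions the four bases become "GCTG" rather than "CGTG". A different coordinate convention (for example 1-based positions, or insertion after the stated position) would give a different result, so the convention is worth confirming against the library documentation.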

Furthermore, I downloaded the FASTA files for the genes "IGHV3-30-5*06x" and "IGHV3-30*18" and compared the two sequences. Surprisingly, the bases that differ are completely different from those reported in the MiXCR library.

[Screenshot 2024-08-23 21:05:40] [Screenshot 2024-08-23 21:08:49]
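A possible reason for such a discrepancy is the comparison method itself: when two sequences differ by an insertion and a deletion, a column-by-column comparison reports every base between the two events as changed. The sketch below illustrates this on a 12-base toy window (the ungapped form of the alignment quoted later in this thread), using Python's standard difflib. It is only an illustration and does not reproduce how MiXCR or the FASTA export computes differences.

```python
from difflib import SequenceMatcher

# Toy windows: the two rows of the alignment from this thread, gaps removed.
parent = "AAGGGGCTGGAG"
child = "AAGGGGGCTGAG"

# Naive comparison: same index in both strings, no gaps allowed.
naive = [i for i, (a, b) in enumerate(zip(parent, child)) if a != b]
print(naive)  # [6, 7, 8]

# Alignment-aware comparison: difflib reports inserted and deleted stretches.
matcher = SequenceMatcher(None, parent, child, autojunk=False)
ops = [op for op in matcher.get_opcodes() if op[0] != "equal"]
for tag, i1, i2, j1, j2 in ops:
    print(tag, repr(parent[i1:i2]), "->", repr(child[j1:j2]))
```

The naive comparison flags three consecutive mismatches even though only one base was inserted and one deleted. Note also that the position reported for an indel inside a run of identical bases (such as GGGG) is ambiguous, so different tools can legitimately report different coordinates for the same change.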

In conclusion

Because I need the detailed sequence information of the aligned genes for downstream analysis, I must know exactly which MiXCR sequence library I am comparing against. This includes all the alleles, as I used the findAlleles command and the exported library is also in this format. Facing these inconsistencies, what should I do? How should I deal with the exported libraries? Are there any suggestions if I don't want to use the IMGT library?

mizraelson commented 3 months ago

Hi, I apologize for the inconvenience. Please refer to the current built-in library, as the VDJ online database is outdated. We will update it soon to ensure consistency.

omegahh commented 3 months ago

Thank you for replying. So, I would like to confirm that the inferred "CGTG" is correct, right? I ask because I can't find this allele in the IMGT database.

mizraelson commented 3 months ago

So, I132GDG135 means:

  • I132G: insertion of G at position 132
  • DG135: deletion of G at position 135

This leads to:

IGHV3-30*18     AAGGGG-CTGGAG
IGHV3-30-5*06x  AAGGGGGCTG-AG
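The alignment above can be reproduced from the mutation string. This is a sketch under stated assumptions: positions are 0-based in parent coordinates, an insertion is placed before the base at the stated position, and the displayed window starts at parent position 126. The offset of 126 is inferred so that I132 and D135 fall on the gap columns shown; it is not stated in the thread. gapped_rows is a hypothetical helper.

```python
def gapped_rows(window, ops, offset):
    """Render parent and child rows with '-' gaps.

    window: parent bases for the displayed region
    ops:    (kind, parent_position, base) tuples, kind in {"I", "D"}
    offset: parent position of window[0] (an assumption in this example)
    """
    inserts = {pos - offset: base for kind, pos, base in ops if kind == "I"}
    deletes = {pos - offset for kind, pos, base in ops if kind == "D"}
    parent_row, child_row = [], []
    for i, base in enumerate(window):
        if i in inserts:
            # Inserted base: gap in the parent row, new base in the child row.
            parent_row.append("-")
            child_row.append(inserts[i])
        parent_row.append(base)
        # Deleted base: present in the parent row, gap in the child row.
        child_row.append("-" if i in deletes else base)
    return "".join(parent_row), "".join(child_row)


window = "AAGGGGCTGGAG"                    # parent row above, gap removed
ops = [("I", 132, "G"), ("D", 135, "G")]   # I132G and DG135
parent_row, child_row = gapped_rows(window, ops, offset=126)

print(parent_row)  # AAGGGG-CTGGAG
print(child_row)   # AAGGGGGCTG-AG
```

With the gaps removed, the child row reads AAGGGGGCTGAG, so the four bases that were "CTGG" in the parent become "GCTG" in the derived allele under these assumptions.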