milaboratory / mixcr

MiXCR is an ultimate software platform for analysis of Next-Generation Sequencing (NGS) data for immune profiling.
https://mixcr.com
Other
336 stars 79 forks source link

Postanalysis summary statistics not fully reproducible manually #1850

Open vincentwalter opened 2 weeks ago

vincentwalter commented 2 weeks ago

Expected Result

I expect the CDR3 metrics exported by postanalysis to be reproducible manually.

Actual Result

I can reproduce the length summary statistics but not the biochemical properties

Exact MiXCR commands

Using MiXCR v4.7.0 (built Wed Aug 07 21:19:48 CEST 2024; rev=976ba14139; branch=no_branch; host=fv-az1019-185) I manually exported like this:

mixcr exportClones -f \
  --chains IGH   \
  --filter-stops \
  --filter-out-of-frames \
  --drop-default-fields \
  -uniqueTagCount Molecule \
  -aaFeature CDR3 -aaLength CDR3 \
  -nFeature CDR3 -nLength CDR3 \
  -baseBiochemicalProperties CDR3 \
  example.clns \
  example_igh_manual.tsv

mixcr postanalysis individual \
  --default-downsampling none \
  --default-weight-function UMI \
  --only-productive \
  --chains IGH \
  example.clns \
  postanalysis/example.json.gz

And then compared the data in R version 4.4.2. I can reproduce the CDR3 lengths but not the biochemical properties.

manual <- read.delim("example_igh_manual.tsv")

postan <- read.delim("postanalysis/example.cdr3metrics.IGH.tsv")
#> Warning in read.table(file = file, header = header, sep = sep, quote = quote, :
#> incomplete final line found by readTableHeader on
#> 'postanalysis/example.cdr3metrics.IGH.tsv'

#----CDR3 Length----
weighted.mean(manual$nLengthCDR3, manual$uniqueMoleculeCount)
#> [1] 49.68081
postan$Length.of.CDR3..nt
#> [1] 49.68081

weighted.mean(manual$aaLengthCDR3, manual$uniqueMoleculeCount)
#> [1] 16.56027
postan$Length.of.CDR3..aa
#> [1] 16.56027

#----Biochemical Properties----
weighted.mean(manual$CDR3Charge, manual$uniqueMoleculeCount)
#> [1] -0.006800394
postan$Charge.of.CDR3
#> [1] -0.03862589

weighted.mean(manual$CDR3N2Volume, manual$uniqueMoleculeCount)
#> [1] 0.6432407
postan$N2Volume.of.CDR3
#> [1] 2.978124