postanalysis NullPointerException

Januaryyiyue commented 3 months ago

Hello,

We are using MiXCR/4.6.0 to run the postanalysis command. We got the following error:

   Version: 4.6.0; built=Sat Dec 09 14:48:42 EST 2023; rev=c9fafa41fe; lib=repseqio.v4.0
        OS: Linux
      Java: 18.0.1
  Cmd args: postanalysis individual --default-downsampling count-read-auto --default-weight-function read --metadata /path/to/metadata.csv --group sampleType /path/to/my-sample_01.clns /path/to/my-sample_01_result.json --only-productive --drop-outliers -OdiversityMeasures=diversity.observed,diversity.shannonWiener,diversity.chao1,diversity.normalizedShannonWienerIndex,diversity.inverseSimpsonIndex,diversity.giniIndex,diversity.d50,diversity.efronThisted
picocli.CommandLine$ExecutionException: Error while running command individual java.lang.NullPointerException
    at com.milaboratory.mixcr.cli.Main.registerExceptionHandlers$lambda-12(SourceFile:395)
    at picocli.CommandLine.execute(CommandLine.java:2088)
    at com.milaboratory.mixcr.cli.Main.main(SourceFile:101)
Caused by: java.lang.NullPointerException
    at com.milaboratory.mixcr.cli.postanalysis.CommandPa.groupSamples(SourceFile:286)
    at com.milaboratory.mixcr.cli.postanalysis.CommandPa.run1(SourceFile:308)
    at com.milaboratory.mixcr.cli.MiXCRCommandWithOutputs.run0(SourceFile:69)
    at com.milaboratory.mixcr.cli.MiXCRCommand.run(SourceFile:37)
    at picocli.CommandLine.executeUserObject(CommandLine.java:1939)
    at picocli.CommandLine.access$1300(CommandLine.java:145)
    at picocli.CommandLine$RunLast.executeUserObjectOfLastSubcommandWithSameParent(CommandLine.java:2358)
    at picocli.CommandLine$RunLast.handle(CommandLine.java:2352)
    at picocli.CommandLine$RunLast.handle(CommandLine.java:2314)
    at picocli.CommandLine$AbstractParseResultHandler.execute(CommandLine.java:2179)
    at picocli.CommandLine$RunLast.execute(CommandLine.java:2316)
    at com.milaboratory.mixcr.cli.Main.registerLogger$lambda-27(SourceFile:514)
    at picocli.CommandLine.execute(CommandLine.java:2078)
    ... 1 more

Here are the first few lines of the metadata.csv we use:

sample,sampleType
my-sample_01,P
my-sample_02,N

We would like to know where this error is coming from, and what we can do to solve it. Thank you.

Januaryyiyue commented 3 months ago

To elaborate, our .clns file name is structured like this:

sample-01-P-text_text.clns

And the metadata file is structured like this:

sample,sampleType
sample-01-P-text_text,P

mizraelson commented 3 months ago

Hi. First of all, there is no need for the -OdiversityMeasures=diversity.observed,diversity.shannonWiener,diversity.chao1,diversity.normalizedShannonWienerIndex,diversity.inverseSimpsonIndex,diversity.giniIndex,diversity.d50,diversity.efronThisted parameter. All metrics are evaluated by default.

Regarding the error, it seems that the column you specified in the --group sampleType parameter is not present in the metadata. Could you please check that there are no extra spaces or other issues in the metadata column name, and that you refer to the correct file?
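One way to run this check from a shell, as a minimal sketch: it assumes the metadata is a plain comma-separated file with the header on the first line, and has_column is a hypothetical helper, not part of MiXCR. The match is exact and case-sensitive, so a padded name or a trailing carriage return from Windows line endings makes it fail.

```shell
#!/bin/sh
# has_column FILE NAME: succeed only if the header row of FILE (assumed to be
# plain comma-separated) contains a column named exactly NAME.
has_column() {
  head -n 1 "$1" | tr ',' '\n' | grep -Fxq -- "$2"
}

# Usage (path taken from the report above):
#   has_column /path/to/metadata.csv sampleType && echo "column found"
# To reveal hidden characters in the header (spaces, \r, a byte-order mark):
#   head -n 1 /path/to/metadata.csv | od -c
```

If the helper fails on a file that looks correct in an editor, the od -c output usually shows the stray character.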

DoryAbelman commented 3 months ago

Dear @mizraelson,

Thank you very much for your time and help with this; I really appreciate it. I am working with @Januaryyiyue on this issue. I removed the -OdiversityMeasures option and double-checked the metadata file, but I still receive the same error. The column name exists in the metadata, and I am confident that there are no extra spaces or other issues with the file. Is there anything else that might be causing this? I'd also be happy to arrange a call to discuss further.

Here is a sample script:

#!/bin/bash
module load java/8
module load mixcr/4.6.0

java -Xmx28g -jar /cluster/tools/software/centos7/mixcr/4.6.0/mixcr.jar postanalysis individual --default-downsampling count-read-auto --default-weight-function read --metadata /cluster/projects/mixcr_postanalysis/metadata.csv --group sampleType /cluster/projects/mixcr_postanalysis/postanalysis/ALQ-02-012-T0-P-DNA-capTCR_S38.clns /cluster/projects/mixcr_postanalysis/postanalysis/ALQ-02-012-T0-P-DNA-capTCR_S38_result.json --only-productive --drop-outliers

The top of the metadata file:

sample,sampleType
ALQ-02-012-T0-P-DNA-capTCR_S38,P
ALQ-02-013-T0-P-DNA-capTCR_S26,P

The name of the corresponding clones file looks like this:

ALQ-02-012-T0-P-DNA-capTCR_S38.clns
ALQ-02-013-T0-P-DNA-capTCR_S26.clns

Below is the error:

By using this software, you agree the license at https://mixcr.readthedocs.io/en/develop/license.html

The following have been reloaded with a version change:
  1) java/8 => java/18

Please copy the following information along with the stacktrace:
   Version: 4.6.0; built=Sat Dec 09 14:48:42 EST 2023; rev=c9fafa41fe; lib=repseqio.v4.0
        OS: Linux
      Java: 18.0.1
  Cmd args: postanalysis individual --default-downsampling count-read-auto --default-weight-function read --metadata /cluster/projects/mixcr_postanalysis/metadata.csv --group sampleType /cluster/projects/mixcr_postanalysis/postanalysis/ALQ-02-012-T0-P-DNA-capTCR_S38.clns /cluster/projects/mixcr_postanalysis/postanalysis/ALQ-02-012-T0-P-DNA-capTCR_S38_result.json --only-productive --drop-outliers
picocli.CommandLine$ExecutionException: Error while running command individual java.lang.NullPointerException
    at com.milaboratory.mixcr.cli.Main.registerExceptionHandlers$lambda-12(SourceFile:395)
    at picocli.CommandLine.execute(CommandLine.java:2088)
    at com.milaboratory.mixcr.cli.Main.main(SourceFile:101)
Caused by: java.lang.NullPointerException
    at com.milaboratory.mixcr.cli.postanalysis.CommandPa.groupSamples(SourceFile:286)
    at com.milaboratory.mixcr.cli.postanalysis.CommandPa.run1(SourceFile:308)
    at com.milaboratory.mixcr.cli.MiXCRCommandWithOutputs.run0(SourceFile:69)
    at com.milaboratory.mixcr.cli.MiXCRCommand.run(SourceFile:37)
    at picocli.CommandLine.executeUserObject(CommandLine.java:1939)
    at picocli.CommandLine.access$1300(CommandLine.java:145)
    at picocli.CommandLine$RunLast.executeUserObjectOfLastSubcommandWithSameParent(CommandLine.java:2358)
    at picocli.CommandLine$RunLast.handle(CommandLine.java:2352)
    at picocli.CommandLine$RunLast.handle(CommandLine.java:2314)
    at picocli.CommandLine$AbstractParseResultHandler.execute(CommandLine.java:2179)
    at picocli.CommandLine$RunLast.execute(CommandLine.java:2316)
    at com.milaboratory.mixcr.cli.Main.registerLogger$lambda-27(SourceFile:514)
    at picocli.CommandLine.execute(CommandLine.java:2078)
    ... 1 more
App version: 4.6.0; built=Sat Dec 09 14:48:42 EST 2023; rev=c9fafa41fe; lib=repseqio.v4.0

I'd be very grateful for any assistance in solving this error. Could it have to do with the version of Java we are using?

Thank you very much for your time and help with this, I really appreciate it and wish you the best,

Dory

mizraelson commented 3 months ago

Hi Dory, we have found the bug: it was caused by the uppercase letters in the column name. Could you please either try running the latest develop version or use --group sampletype instead?
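For anyone staying on 4.6.0, the workaround amounts to lowercasing the header row of the metadata file and passing the lowercase name to --group. A minimal sketch, assuming the header is the first line of the file; lowercase_header is a hypothetical helper, and the output file name is only an example:

```shell
#!/bin/sh
# lowercase_header FILE: print FILE with its first line (assumed to be the
# header row) lowercased; all other rows are printed unchanged.
lowercase_header() {
  awk 'NR == 1 { print tolower($0); next } { print }' "$1"
}

# Usage (paths are placeholders):
#   lowercase_header /path/to/metadata.csv > /path/to/metadata.lower.csv
# then pass --metadata /path/to/metadata.lower.csv --group sampletype
```

Only the header changes, so sample names and group values such as P and N keep their original case.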

DoryAbelman commented 3 months ago

Dear @mizraelson, this worked, thank you very much! I really appreciate your time and help with this. The script ran to completion after I changed the option to --group sampletype and lowercased the column header in the metadata file to sampletype.

I would also like to confirm whether postanalysis individual is the best way to downsample the clones across a directory. I have 100 .clns files and want them to be downsampled relative to other files of the same sample type (i.e., all files of one sample type downsampled relative to the read counts of the other .clns files of that sample type). I can adapt the script to run in overlap mode if preferred.

Thank you very much for your time and help with this, I really appreciate it and wish you the best,

Dory

mizraelson commented 3 months ago

If you just want to downsample the data, you can use the dedicated mixcr downsample command.

DoryAbelman commented 3 months ago

Dear @mizraelson,

Thank you for your message and assistance! I really appreciate it. I am interested in downsampling and then running mixcr postanalysis on the .clns files to compute statistical differences in diversity among them. Could you please confirm whether it is best to first run mixcr downsample and then run mixcr postanalysis individual on its output? Given that mixcr postanalysis individual only processes one .clns file at a time, I am unsure whether the downsampling applied within that command can be accurately normalized relative to all the other files in a directory or in the metadata.csv file.

Thank you very much for your time and help with this, I really appreciate it. Wishing you the best,

Dory

mizraelson commented 3 months ago

If you plan to run mixcr postanalysis individual, there is no need to run mixcr downsample first; you can use the --default-downsampling count-read-auto parameter with mixcr postanalysis individual. The command takes multiple .clns files as input, so the data will be downsampled across all files. You can check this link for reference.
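As a sketch of what that looks like for a directory of clonesets: the wrapper below passes every .clns file in one directory to a single postanalysis individual call, followed by the output file, using only the options already shown in this thread. run_postanalysis and the MIXCR variable are hypothetical conveniences, the paths are placeholders, and --group sampletype assumes the lowercase workaround on 4.6.0.

```shell
#!/bin/sh
# run_postanalysis CLNS_DIR METADATA OUTPUT_JSON
# Runs one postanalysis over all .clns files in CLNS_DIR so that
# downsampling is computed across the whole set, not per file.
# MIXCR may be set to override the executable (defaults to "mixcr").
run_postanalysis() {
  "${MIXCR:-mixcr}" postanalysis individual \
    --default-downsampling count-read-auto \
    --default-weight-function read \
    --metadata "$2" \
    --group sampletype \
    --only-productive --drop-outliers \
    "$1"/*.clns \
    "$3"
}

# Usage (placeholder paths):
#   run_postanalysis /path/to/clns_dir /path/to/metadata.csv /path/to/result.json
```

Setting MIXCR=echo prints the assembled command instead of running it, which is a cheap way to confirm that the glob picked up the expected files.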

DoryAbelman commented 3 months ago

Thank you very much for your reply and for sharing this with me. I really appreciate all your time and help with my requests. I will use the commands in the reference link. Best wishes.

milaboratory/mixcr, issue #1654