questions about output files (alignment & clonotype)

silvia1234567890 commented 1 year ago

Hello,

I have a few questions about the output files of the alignment (alignments.vdjca) and clonotype (clonotype.clns) steps in MiXCR. As input for mixcr align, I use a FASTA file of preprocessed sequences like this:

>AACCAGCAAATCACC_CONSCOUNT_134
ACAGCACGTCAGATTCAGCACAAA...
>ATACGCTATGCAACC_CONSCOUNT_119
ACAGCACGTCAGATTCAGCACAAA...
>GATAGCACTGGATGG_CONSCOUNT_96
ACAGCAGGTCAGATTCAGCACAAA...
>GGCTGAATTAACGAT_CONSCOUNT_93
GACAGCACGTCAGATTCAGCACAAA...
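The headers above appear to follow a `<SeqID>_CONSCOUNT_<n>` pattern, where n is presumably the number of raw reads collapsed into that consensus sequence. A minimal bash/awk sketch, assuming exactly that naming convention, that splits each header into its SeqID and count (the function name `fasta_conscounts` is hypothetical, not part of MiXCR):

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of MiXCR): print "SeqID<TAB>CONSCOUNT" for
# every header in a FASTA file.
# ASSUMPTION: headers look like ">AACCAGCAAATCACC_CONSCOUNT_134"; headers
# without "_CONSCOUNT_" get "NA" as their count.
fasta_conscounts() {
  awk '/^>/ {
    id = substr($0, 2)                      # drop the leading ">"
    n = split(id, parts, "_CONSCOUNT_")     # SeqID before, count after
    print parts[1] "\t" ((n == 2) ? parts[2] : "NA")
  }' "$1"
}
```

For example, `fasta_conscounts seqs.fasta | sort -k2,2nr | head` would list the largest consensus groups first.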

In the output.vdjca file (the output of mixcr align):

1) Is it possible to retain the SeqIDs of the sequences I provide in FASTA format?
2) How can I calculate the abundance of each sequence in the output.vdjca file?

Thank you in advance. Best regards, Silvia

mizraelson commented 1 year ago

Hi, can you share the command you ran?

silvia1234567890 commented 1 year ago

Yes, of course.

For the alignment:

mixcr align --preset generic-amplicon \
  --library Smaximus-IGH.json.gz \
  --species Scophthalmus_maximus \
  --rna \
  --rigid-left-alignment-boundary \
  --rigid-right-alignment-boundary C \
  seqs.fasta \
  alignments.vdjca

For the clonotype:

mixcr assemble alignments.vdjca clones.clns

And to export the .vdjca and .clns files to TSV format:

mixcr exportAlignments alignments.vdjca alignments.tsv
mixcr exportClones clones.clns clones.tsv

Regarding alignments.tsv, I would like to know:

1) Is it possible to retain the SeqIDs from seqs.fasta?
2) How can I calculate the abundance of each sequence in the alignments output file?

Thank you in advance. Silvia

mizraelson commented 1 year ago

Hi,

1) If you add the -OsaveOriginalReads=true parameter to the mixcr align command and then add -descrsR1 to mixcr exportAlignments, the exported table will include a column with the original read header for each alignment.

2) Could you please clarify? Alignments correspond to individual reads, so they do not carry abundance information. Only after the mixcr assemble step, once all corrections have been applied, are sequences assembled into clones whose relative abundances can be determined.
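Once the original headers are exported as described in point 1), the SeqID and CONSCOUNT can be recovered for every alignment. A minimal sketch, assuming the column written by -descrsR1 is headed `descrsR1` and contains the FASTA header text; both are assumptions to verify against the header row of your own alignments.tsv, and the function name `descr_to_seqid` is hypothetical:

```shell
#!/usr/bin/env bash
# Hypothetical helper: from a MiXCR alignments TSV, print the SeqID and
# CONSCOUNT parsed from the original read header of every alignment.
# ASSUMPTION: the -descrsR1 column is headed "descrsR1"; pass another
# column name as the second argument if your header row differs.
descr_to_seqid() {
  local tsv="$1" col="${2:-descrsR1}"
  awk -F'\t' -v col="$col" '
    NR == 1 {
      # locate the description column by its header name
      for (i = 1; i <= NF; i++) if ($i == col) c = i
      if (!c) { print "column " col " not found" > "/dev/stderr"; exit 1 }
      print "seqId\tconsCount"
      next
    }
    {
      h = $c
      sub(/^>/, "", h)                       # tolerate a leading ">"
      n = split(h, parts, "_CONSCOUNT_")
      print parts[1] "\t" ((n == 2) ? parts[2] : "NA")
    }' "$tsv"
}
```

Usage: `descr_to_seqid alignments.tsv > alignment_seqids.tsv`. The function exits non-zero if the column is missing, which usually means align was run without -OsaveOriginalReads=true or export was run without -descrsR1.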

Sincerely, Mark

silvia1234567890 commented 1 year ago

Hi,

Regarding the first issue, I ran mixcr align as follows:

mixcr align --preset generic-amplicon \
  --library Scophthalmus_maximus-IGH.json \
  --species Scophthalmus_maximus \
  --rna \
  --rigid-left-alignment-boundary \
  --rigid-right-alignment-boundary C \
  -OsaveOriginalReads=false \
  seqs.fasta \
  output.vdjca

and it gave me this warning:

WARNING: unnecessary override -OsaveOriginalReads=false with the same value.

Otherwise, the alignment ran fine. However, when I ran mixcr exportAlignments:

mixcr exportAlignments -descrsR1 output.vdjca alignments.tsv

and I'm getting this error:

Exporting alignments: 0%
Please copy the following information along with the stacktrace:
   Version: 4.5.0; built=Fri Sep 22 14:39:05 CEST 2023; rev=cdb24b4fb7; lib=repseqio.v3.0.1
        OS: Mac OS X
      Java: 21
  Cmd args: exportAlignments -descrsR1 output.vdjca alignments.tsv
picocli.CommandLine$ExecutionException: Error while running command exportAlignments java.lang.IllegalArgumentException: Error for option '-descrR1':
No description available for read: either re-run align action with -OsaveOriginalReads option or don't use '-descrR1' in exportAlignments
    at com.milaboratory.mixcr.cli.Main.registerExceptionHandlers$lambda-12(SourceFile:340)
    at picocli.CommandLine.execute(CommandLine.java:2088)
    at com.milaboratory.mixcr.cli.Main.main(SourceFile:98)
Caused by: java.lang.IllegalArgumentException: Error for option '-descrR1':
No description available for read: either re-run align action with -OsaveOriginalReads option or don't use '-descrR1' in exportAlignments
    at com.milaboratory.o.pF.invoke(SourceFile:1098)
    at com.milaboratory.o.kW.a(SourceFile:25)
    at com.milaboratory.o.lv.put(SourceFile:40)
    at cc.redberry.pipe.CUtils.drain(CUtils.java:82)
    at cc.redberry.pipe.util.PipeExtensionsKt.drainToAndClose(PipeExtensions.kt:155)
    at com.milaboratory.mixcr.cli.CommandExportAlignments$Cmd.run1(SourceFile:166)
    at com.milaboratory.mixcr.cli.MiXCRCommandWithOutputs.run0(SourceFile:69)
    at com.milaboratory.mixcr.cli.MiXCRCommand.run(SourceFile:37)
    at picocli.CommandLine.executeUserObject(CommandLine.java:1939)
    at picocli.CommandLine.access$1300(CommandLine.java:145)
    at picocli.CommandLine$RunLast.executeUserObjectOfLastSubcommandWithSameParent(CommandLine.java:2358)
    at picocli.CommandLine$RunLast.handle(CommandLine.java:2352)
    at picocli.CommandLine$RunLast.handle(CommandLine.java:2314)
    at picocli.CommandLine$AbstractParseResultHandler.execute(CommandLine.java:2179)
    at picocli.CommandLine$RunLast.execute(CommandLine.java:2316)
    at com.milaboratory.mixcr.cli.Main.registerLogger$lambda-26(SourceFile:447)
    at picocli.CommandLine.execute(CommandLine.java:2078)
    ... 1 more

Thank you in advance. Silvia

mizraelson commented 1 year ago

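Sorry about that. Please use '-OsaveOriginalReads=true' instead. With '=false' (which is already the default, hence the "unnecessary override" warning), the original reads are not saved, so exportAlignments has no description to export for -descrsR1.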
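For reference, here is the command sequence from this thread with the fix applied, collected into one function. It only uses the commands and options that appear above; the wrapper itself, the function name `run_pipeline`, and the `MIXCR` variable (handy for a dry run with `MIXCR=echo`) are hypothetical conveniences, and the library path should be whichever file you actually have.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical wrapper around the commands used in this thread.
# MIXCR can be overridden (e.g. MIXCR=echo) to print commands instead of running them.
MIXCR="${MIXCR:-mixcr}"

run_pipeline() {
  local fasta="$1" library="$2"
  # Alignment, keeping the original reads so their headers can be exported later
  "$MIXCR" align --preset generic-amplicon \
    --library "$library" \
    --species Scophthalmus_maximus \
    --rna \
    --rigid-left-alignment-boundary \
    --rigid-right-alignment-boundary C \
    -OsaveOriginalReads=true \
    "$fasta" alignments.vdjca
  # Export alignments with the original read header column
  "$MIXCR" exportAlignments -descrsR1 alignments.vdjca alignments.tsv
  # Assemble clonotypes and export them (read counts and frequencies)
  "$MIXCR" assemble alignments.vdjca clones.clns
  "$MIXCR" exportClones clones.clns clones.tsv
}
```

Usage: `run_pipeline seqs.fasta Smaximus-IGH.json.gz` runs all four steps, while `MIXCR=echo run_pipeline seqs.fasta Smaximus-IGH.json.gz` only prints the command lines.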

silvia1234567890 commented 1 year ago

It worked, thank you.

Regarding my second question, how can I determine the relative abundances after the mixcr assemble step?

mizraelson commented 1 year ago

For each clone in the output clonotype table, you should see its read count and frequency. The frequency shows the fraction of your sample occupied by the clone.
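Put as a formula, a clone's frequency is its read count divided by the sum of read counts over all clones. As a cross-check, a minimal awk sketch that recomputes this from clones.tsv and appends it as an extra column. It assumes the read-count column is headed `readCount` (check the header row of your own export), the function name `clone_fractions` is hypothetical, and the result may differ slightly from MiXCR's own frequency column if the exported table was filtered.

```shell
#!/usr/bin/env bash
# Hypothetical helper: append recomputedFraction = readCount / sum(readCount)
# to every row of a clones TSV.
# ASSUMPTION: the read-count column is headed "readCount"; pass another
# column name as the second argument if yours differs.
# Exits non-zero if the column is missing or the total count is zero.
clone_fractions() {
  local tsv="$1" col="${2:-readCount}"
  awk -F'\t' -v col="$col" '
    NR == 1 {
      for (i = 1; i <= NF; i++) if ($i == col) c = i
      if (!c) { print "column " col " not found" > "/dev/stderr"; bad = 1; exit 1 }
      header = $0
      next
    }
    { line[NR] = $0; count[NR] = $c; total += $c }   # remember rows, sum counts
    END {
      if (bad || total <= 0) exit 1
      print header "\trecomputedFraction"
      for (r = 2; r <= NR; r++) printf "%s\t%.4f\n", line[r], count[r] / total
    }' "$tsv"
}
```

For example, a table with two clones of 3 and 1 reads gives fractions 0.7500 and 0.2500.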

milaboratory/mixcr issue #1421