milaboratory / mixcr

MiXCR is an ultimate software platform for analysis of Next-Generation Sequencing (NGS) data for immune profiling.
https://mixcr.com

Export Clone/Alignment Command #132

Closed. mtutert closed this issue 7 years ago.

mtutert commented 7 years ago

When I use the commands for exporting clones or alignments (as given in the documentation), I am left with an empty .txt file as my output. However, my .vdjca and .clns alignment and clone files are not empty and do contain relevant information.

Can you please help me troubleshoot this error?

Thanks, Marcus

dbolotin commented 7 years ago

Hi Marcus,

.vdjca and .clns files contain meta-information in their headers (~20 KB), so .clns/.vdjca files may actually contain no alignments or clones even if they have a non-zero size.
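Since a non-zero file size says little, a more direct check is whether the exported table has any rows after its header. The function below is a minimal sketch: `count_data_rows` is a hypothetical helper (not part of MiXCR), and it assumes the export writes one header line followed by one tab-separated row per clone or alignment.

```sh
# Hypothetical helper (not part of MiXCR): prints how many data rows an
# exported table holds, assuming one header line followed by one row per
# clone/alignment. A missing, empty, or header-only file prints 0.
count_data_rows() {
  if [ ! -s "$1" ]; then
    echo 0
    return
  fi
  # Skip the header line, then count the remaining non-empty lines.
  tail -n +2 "$1" | grep -c .
}

# Usage (requires MiXCR on PATH):
#   mixcr exportClones clones.clns clones.txt
#   count_data_rows clones.txt
```

A result of 0 means the .clns file held no clones, which matches the header-only situation described above.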

Best, Dmitry.

mtutert commented 7 years ago

Hi Dmitry,

Thanks for the response. Can you help me troubleshoot why I am getting an empty file, then?

I am using TCGA cancer RNA-sequencing data. I sorted the BAM by read name and then converted it to FASTQ (paired format). I then followed the instructions for "raw" RNA-seq repertoire analysis.
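Paired-end tools generally pair mates by their position in the two files, so after a BAM-to-FASTQ conversion it is worth confirming that R1 and R2 list the same reads in the same order. A minimal sketch, assuming standard 4-line FASTQ records; `fastq_pairs_in_sync` is a hypothetical helper, and it compares the first whitespace-separated field of each header with any trailing `/1` or `/2` removed.

```sh
# Hypothetical helper: checks that two FASTQ files (4 lines per record) have
# the same number of records and matching read names, record by record.
# Prints a summary and returns non-zero if the files are out of sync.
fastq_pairs_in_sync() {
  awk '
    FNR == 1 { file++ }                 # moving on to the next input file
    FNR % 4 == 1 {                      # header line of each record
      name = $1
      sub(/\/[12]$/, "", name)          # drop a trailing /1 or /2 mate suffix
      if (file == 1) { n1++; names[n1] = name }
      else { n2++; if (names[n2] != name) bad++ }
    }
    END {
      if (n1 != n2 || bad > 0) {
        printf "out of sync: %d vs %d records, %d name mismatches\n", n1, n2, bad
        exit 1
      }
      printf "in sync: %d read pairs\n", n1
    }' "$1" "$2"
}
```

This keeps all R1 read names in memory, so for very large files it is cheaper to run it on truncated copies (for example, the output of `head -n 400000`).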

These are the commands I ran:

mixcr align --preset rna-seq -OallowPartialAlignments=true data_R1.fastq.gz data_R2.fastq.gz alignments.vdjca
mixcr assemblePartial alignments.vdjca alignmentsRescued.vdjca
mixcr assemble alignmentsRescued.vdjca clones.clns

But when I export from these files, the output is empty, as I explained above.

dbolotin commented 7 years ago

Several questions:

  1. What is the read length of the input data?
  2. Is it paired-end?
  3. Please paste the report files from all steps, produced with:
     mixcr align --preset rna-seq -r alignment_report.txt -OallowPartialAlignments=true data_R1.fastq.gz data_R2.fastq.gz alignments.vdjca
     mixcr assemblePartial -r assemble_partial_report.txt alignments.vdjca alignmentsRescued.vdjca
     mixcr assemble -r assemble_report.txt alignmentsRescued.vdjca clones.clns
  4. How many different datasets did you try, and does the issue reproduce for all of them?

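The three steps can also be wrapped so that a failing step stops the run and all three reports are printed together for pasting. This is a minimal sketch: `run_with_reports` and the `MIXCR` variable are hypothetical (not part of MiXCR), and the align option uses the `--parameters` spelling that a later comment in this thread gives for this MiXCR version; other versions may expect `--preset`.

```sh
# Hypothetical wrapper (not part of MiXCR): runs align, assemblePartial and
# assemble, each writing a report with -r, stops at the first failing step,
# then prints every report. MIXCR defaults to "mixcr" on PATH.
run_with_reports() {
  mixcr_cmd="${MIXCR:-mixcr}"
  "$mixcr_cmd" align --parameters rna-seq -r alignment_report.txt \
    -OallowPartialAlignments=true "$1" "$2" alignments.vdjca || return 1
  "$mixcr_cmd" assemblePartial -r assemble_partial_report.txt \
    alignments.vdjca alignmentsRescued.vdjca || return 1
  "$mixcr_cmd" assemble -r assemble_report.txt \
    alignmentsRescued.vdjca clones.clns || return 1
  # Print all reports with a separator line before each one.
  for report in alignment_report.txt assemble_partial_report.txt assemble_report.txt; do
    echo "=== $report ==="
    cat "$report"
  done
}

# Usage: run_with_reports data_R1.fastq.gz data_R2.fastq.gz
```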
mtutert commented 7 years ago

1) This is WGS data for human genomes, so the read length is the equivalent of the RNA read length.
2) Yes, it is.
3) When I ran this command:
mixcr align --preset rna-seq -r alignment_report.txt -OallowPartialAlignments=true PCAWG.f1b58fce-fb71-46bc-84f9-c641f4cdd2f5.STAR.v1.sorted.end1.fq PCAWG.f1b58fce-fb71-46bc-84f9-c641f4cdd2f5.STAR.v1.sorted.end2.fq alignments.vdjca

I actually get the error "Unknown option: --preset".

I am trying to do this on RNA-seq data, and that is how the docs say to construct this command. Is that not right?

dbolotin commented 7 years ago

1) Sorry, let me ask once again: what is the length of the reads in your dataset (in nucleotides)? WGS data typically contain a very small fraction of IG/TCR sequences, so empty output is expected for such datasets. In my experience, one gets only ~1-5 VDJ alignments from a 100M-read, 100+100 nt WGS dataset. So, unfortunately, WGS is a very poor source of IG/TCR repertoire information, and I would not recommend using it for this purpose: the chance of getting meaningful results is very small.
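The read-length question can be answered directly from the FASTQ files with standard tools. A minimal sketch, assuming uncompressed 4-line FASTQ records on input; `read_length_histogram` is a hypothetical helper name. It reads the files given as arguments, or standard input when none are given.

```sh
# Hypothetical helper: prints the read-length distribution of FASTQ input as
# "<length> <count>" lines, sorted by length. The sequence is assumed to be
# the 2nd line of each 4-line record. Reads stdin if no files are given.
read_length_histogram() {
  awk 'FNR % 4 == 2 { n[length($0)]++ }
       END { for (len in n) print len, n[len] }' "$@" | sort -n
}
```

For gzipped input such as `data_R1.fastq.gz`, it can be fed through a pipe: `gunzip -c data_R1.fastq.gz | read_length_histogram`.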

3) Sorry, there was an error in the docs. The correct command is:

mixcr align --parameters rna-seq -r alignment_report.txt -OallowPartialAlignments=true PCAWG.f1b58fce-fb71-46bc-84f9-c641f4cdd2f5.STAR.v1.sorted.end1.fq PCAWG.f1b58fce-fb71-46bc-84f9-c641f4cdd2f5.STAR.v1.sorted.end2.fq alignments.vdjca

Thanks for pointing it out!