milaboratory / mixcr

MiXCR is an ultimate software platform for analysis of Next-Generation Sequencing (NGS) data for immune profiling.
https://mixcr.com

Han et al 2014 preset #1557

alexell22 closed this 9 months ago

alexell22 commented 9 months ago

Hi!

I am trying to analyse data from single sorted cells prepared with a protocol compatible with Han et al. 2014. However, when I run the preset I get Alignment failed: absent barcode: 3411672 (100%). Is there a way to fix this?

mizraelson commented 9 months ago

Are you using the data set from the publication?

alexell22 commented 9 months ago

I managed to get the Han et al. 2014 preset to work on the non-demultiplexed Undetermined_S0_L001_RN_001.fastq files, but I still have some questions:

  1. We have 20 96-well plates coming from different samples, and the conditions differ between plates. Would running just mixcr analyze han-et-al-2014 --species hsa R1.fastq R2.fastq result be enough to determine the precise location on the plate?
  2. I have also tried adding --sample-sheet samplesheet.tsv to get a more precise result, but I get Unmatched argument at index 8: 'output/', and the same error when using result as the output (as discussed in the MiXCR "Input file name expansion" section).

The sample sheet is a TSV file with the following structure (the same layout continues for ~2000 rows):

SAMPLE       CELL3PLATE  CELL1ROW  CELL2COLUMN
sample1_A1   GCAGA       TAAGC     GTTCA
sample2_B1   GCAGA       TGCAC     GTTCA
sample3_C1   GCAGA       CTCAG     GTTCA
sample4_D1   GCAGA       GGAAT     GTTCA
...
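With roughly 2000 rows, a quick sanity check before running MiXCR can catch duplicated wells or plates with missing rows. The sketch below is a minimal illustration, not a MiXCR feature. It assumes the file is tab-separated with exactly the header shown above, and its embedded excerpt reuses only the four rows from the post. It checks that every (plate, row, column) barcode triple is unique and counts wells per plate barcode.

```python
import csv
import io
from collections import Counter

# Excerpt of the per-well sheet from the post (tab-separated); a real run
# would read the full samplesheet.tsv instead.
SHEET = """SAMPLE\tCELL3PLATE\tCELL1ROW\tCELL2COLUMN
sample1_A1\tGCAGA\tTAAGC\tGTTCA
sample2_B1\tGCAGA\tTGCAC\tGTTCA
sample3_C1\tGCAGA\tCTCAG\tGTTCA
sample4_D1\tGCAGA\tGGAAT\tGTTCA
"""


def read_sheet(text):
    """Parse a tab-separated sample sheet into a list of dicts keyed by header."""
    return list(csv.DictReader(io.StringIO(text), delimiter="\t"))


def check_wells(rows):
    """Return (wells per plate barcode, list of duplicated barcode triples)."""
    triples = Counter(
        (r["CELL3PLATE"], r["CELL1ROW"], r["CELL2COLUMN"]) for r in rows
    )
    duplicates = [t for t, n in triples.items() if n > 1]
    per_plate = Counter(r["CELL3PLATE"] for r in rows)
    return per_plate, duplicates


rows = read_sheet(SHEET)
per_plate, duplicates = check_wells(rows)
print(dict(per_plate))  # wells counted per plate barcode
print(duplicates)       # barcode triples that appear more than once
```

For a full 96-well plate, each plate barcode would be expected to appear 96 times.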

mizraelson commented 9 months ago

Hi, so, as I understand it, different plates are different samples? Then you can indeed use --sample-sheet, but there is no need to specify all the other barcodes, only the one that defines a sample (CELL3PLATE). I've attached an example sample sheet.
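Reducing the per-well sheet to one row per plate barcode can be scripted. The sketch below is a minimal illustration that assumes a tab-separated input containing the CELL3PLATE column from the sheet above. The output header (SAMPLE, CELL3PLATE) and the plate_<barcode> naming are hypothetical placeholders, so check them against the attached han-sample-table.txt before use. The second plate barcode (ACTGT) in the example input is made up for illustration.

```python
import csv
import io


def per_plate_sheet(text, name_for=lambda bc: f"plate_{bc}"):
    """Collapse a per-well sheet to one row per unique CELL3PLATE barcode.

    name_for is a hypothetical naming scheme; replace it with real sample names.
    The output header is an assumption; match it to the example sample sheet.
    """
    rows = csv.DictReader(io.StringIO(text), delimiter="\t")
    # dict.fromkeys keeps the first-seen order of unique plate barcodes.
    plates = list(dict.fromkeys(r["CELL3PLATE"] for r in rows))
    out = io.StringIO()
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    writer.writerow(["SAMPLE", "CELL3PLATE"])
    for bc in plates:
        writer.writerow([name_for(bc), bc])
    return out.getvalue()


# Illustrative input: two wells from plate GCAGA, one from a made-up plate ACTGT.
SHEET = (
    "SAMPLE\tCELL3PLATE\tCELL1ROW\tCELL2COLUMN\n"
    "sample1_A1\tGCAGA\tTAAGC\tGTTCA\n"
    "sample2_B1\tGCAGA\tTGCAC\tGTTCA\n"
    "sample97_A1\tACTGT\tTAAGC\tGTTCA\n"
)

print(per_plate_sheet(SHEET), end="")
```

Passing a different name_for, for example a dict lookup from barcode to the real sample name, gives each plate a meaningful output prefix.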

The command should be:

mixcr analyze han-et-al-2014 \
--sample-sheet han-sample-table.txt \
--species hsa \
input_R1.fastq.gz \
input_R2.fastq.gz \
output

han-sample-table.txt

That way you will get separate output files for each sample. If you have multiple plates per sample, just use the same barcode twice in the sample sheet.

Sincerely, Mark