milaboratory / mixcr

MiXCR is an ultimate software platform for analysis of Next-Generation Sequencing (NGS) data for immune profiling.
https://mixcr.com

Pacbio full-length TCR sequence #638

Closed: pigraul closed this issue 2 years ago

pigraul commented 3 years ago

Hello,

Is MiXCR suitable for PacBio data analysis? Can I skip the assembly step and use it only for clonotyping?

PoslavskySV commented 3 years ago

@pigraul Hi,

Sorry for the delayed reply. This is not trivial: we need to check and probably adjust some MiXCR parameters for this kind of data. Could you please share your data with us? You can contact us directly via support@milaboratory.com

y1zhou commented 1 year ago

Sorry for bumping this old issue, but are there any follow-ups to the question that can be shared publicly? We are also trying to use MiXCR to analyze PacBio data. Thanks!

mehdiborji commented 1 year ago

I have successfully used both PacBio and Nanopore data with MiXCR; it needs little to no adjustment and works amazingly!

CCS PacBio data needs no correction at all!

For high-indel Nanopore data, you may need to increase the mutation probability to about 0.05, which will cluster the reads together. Even with this adjustment, I have been able to call valid somatic hypermutations from plasma cells using MiXCR and Nanopore data!

PoslavskySV commented 1 year ago

FYI: we have added a dedicated preset for ONT in v4.2:

https://docs.milaboratories.com/mixcr/reference/overview-built-in-presets/#oxford-nanopore-technologies

This preset can also be used for generic PacBio data.

ShaowenJ commented 1 year ago

@PoslavskySV Hi MiXCR developers,

We are also interested in running ONT data through MiXCR. Is it possible to see the detailed parameters used in the preset? I checked the code on GitHub, but most of the preset is inherited from other presets. Knowing the parameters would definitely help us interpret the results.

Thanks!

PoslavskySV commented 1 year ago

Hi @Lesdormis

you can run

mixcr exportPreset --preset-name ont-rna-seq-vdj-full-length --species hsa

and see all the parameters in detail.

ShaowenJ commented 1 year ago

Thanks @PoslavskySV. You guys are awesome!

ShaowenJ commented 1 year ago

Hi @PoslavskySV, sorry for asking one more question about the ONT preset: where do R1 and R2 come from for full-length Nanopore sequences? I assume we only get one long sequence per read.

mizraelson commented 1 year ago

Hi, you are right. That is a typo in the documentation. You should only provide one read. I will fix the docs soon.

ShaowenJ commented 1 year ago

Hi, could sequencing depth be a factor in the alignment process? Previously, 15% of my ONT reads aligned successfully, but after some overlapping subsetting (keeping 50% of the previous reads) I now get zero aligned reads. Sorry for continuing to post questions here.

mizraelson commented 1 year ago

In general, sequencing depth may affect the analysis. What kind of subsetting did you perform?

ShaowenJ commented 1 year ago

I filtered out reads that lack barcode/UMI information or have incorrect barcode/UMI information. I also have another question: can I modify the parameters of a preset and run MiXCR with the modified settings? For example, export the preset parameters to a config file, edit the settings, and run the analysis with the new config as input.

mizraelson commented 1 year ago

If you have UMIs in your data, you can use MiXCR to perform UMI-based correction.

And yes, you can create a preset file with:

mixcr exportPreset --preset-name ont-rna-seq-vdj-full-length --species hsa myPreset.yaml

Note that I used --species hsa for the human reference, but you may need to specify another species.

Then you can modify myPreset.yaml and use it to run MiXCR. (The file should be placed in the directory from which you run mixcr, or in ~/.mixcr/presets/.)

Then you can use it like this:

mixcr analyze local:myPreset \
     --species hsa \
     sample.fastq.gz \
     sample_result

ShaowenJ commented 1 year ago

Amazing, that's exactly what I need! Thanks!

ShaowenJ commented 1 year ago

Hi @mizraelson, I tried the local preset, but it did not run successfully. I am not sure where the YAML file should go, since I am working on an HPC cluster and MiXCR is installed in a location where I don't have write permission.

ShaowenJ commented 1 year ago

I got it working! Thanks!

mizraelson commented 1 year ago

Always welcome! Let me know if you need any help modifying the preset to use UMIs directly by MiXCR.

ShaowenJ commented 1 year ago

Hi @mizraelson, it would be awesome if you could show me how to do that. It could save me a lot of effort in the demultiplexing steps.

mizraelson commented 1 year ago

Sure! Can you share the structure of the library, maybe as a scheme? Where is the UMI located, and are there any adapter sequences, etc.?

mehdiborji commented 1 year ago

@Lesdormis, may I ask whether you are interested in using UMIs in Nanopore data? If so, note that, unlike in Illumina data, barcodes/UMIs in Nanopore reads are not at fixed positions. Moreover, the reads do not have a fixed orientation (5' to 3'), so if your UMIs are at one end of the reads, you first need to figure out which end carries them. If you share some reads and the read structure, I can write some bioinformatics functions to process your Nanopore data and convert it to a paired-like format similar to Illumina. I could potentially incorporate this as a feature of my Nanopore data analysis pipeline: https://github.com/mehdiborji/nanoranger
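A minimal Python sketch of the orientation and UMI-extraction step described above, assuming a hypothetical library in which a known adapter sits immediately upstream of a fixed-length UMI at one end of each read. The adapter sequence, UMI length, search window and mismatch tolerance are illustrative placeholders, not values from any real protocol, and this is not the nanoranger implementation. The sketch tries the read as given, then its reverse complement, and splits an oriented read into a UMI part and an insert part, roughly the paired-like layout mentioned above.

```python
from typing import Optional, Tuple

# Hypothetical placeholder values; replace them with your library's actual structure.
ADAPTER = "CTACACGACGCTCTTCCGATCT"  # assumed adapter immediately upstream of the UMI
UMI_LEN = 12                          # assumed UMI length
SEARCH_WINDOW = 100                   # only search this many bases at the 5' end
MAX_MISMATCHES = 2                    # tolerated substitutions (indels are ignored)

_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def revcomp(seq: str) -> str:
    """Reverse-complement an uppercase DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def find_adapter(seq: str, adapter: str, window: int, max_mm: int) -> int:
    """Return the start of the lowest-Hamming-distance match of `adapter`
    within the first `window` bases of `seq`, or -1 if no match has <= max_mm mismatches."""
    best_pos, best_mm = -1, max_mm + 1
    limit = min(window, len(seq)) - len(adapter) + 1
    for i in range(max(limit, 0)):
        mm = sum(a != b for a, b in zip(seq[i:i + len(adapter)], adapter))
        if mm < best_mm:
            best_pos, best_mm = i, mm
    return best_pos


def orient_and_split(read: str) -> Optional[Tuple[str, str]]:
    """Try the read as given, then its reverse complement; in the first orientation
    where the adapter is found near the 5' end, return (UMI, insert). Otherwise None."""
    for candidate in (read, revcomp(read)):
        pos = find_adapter(candidate, ADAPTER, SEARCH_WINDOW, MAX_MISMATCHES)
        if pos >= 0:
            umi_start = pos + len(ADAPTER)
            umi = candidate[umi_start:umi_start + UMI_LEN]
            insert = candidate[umi_start + UMI_LEN:]
            if len(umi) == UMI_LEN:
                return umi, insert
    return None


if __name__ == "__main__":
    read = ADAPTER + "AAAACCCCGGGG" + "TTTTTTTTTT"
    print(orient_and_split(read))           # read in forward orientation
    print(orient_and_split(revcomp(read)))  # same read, reverse orientation
```

The Hamming search only tolerates substitutions; since Nanopore reads carry many indels, a real pipeline would more likely use an edit-distance aligner for the adapter search.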

pipiha666 commented 1 year ago

@PoslavskySV Hi MiXCR developers: does MiXCR have a protocol suitable for single-cell PacBio MAS-seq data? I only found a Nanopore protocol here: https://mixcr.com/mixcr/reference/overview-built-in-presets/ With the development of third-generation sequencing, I think this is very necessary.

pipiha666 commented 1 year ago

Hi @mehdiborji, is it suitable for single-cell PacBio data?