liulab-dfci / TRUST4

TCR and BCR assembly from RNA-seq data
MIT License

I would like to know how TRUST4 can directly analyze paired-end .fastq format data from the 10X Genomics platform for single-cell analysis #271

Open fight2021 opened 4 months ago

fight2021 commented 4 months ago

I would like to ask how TRUST4 can directly analyze paired-end FASTQ data from the 10X Genomics platform for single-cell analysis, instead of BAM data. Can you provide support for this analysis? My current analysis is too slow.

run-trust4 -t 25 -b /home/zxsys/data6/bam/SRR22007527_genome_bam.bam -f /home/zxsys/data6/hg38_bcrtcr.fa --ref /home/zxsys/data6/human_IMGT+C.fa --barcode CB

Is it possible to directly use FASTQ format for paired-end single-cell data analysis without using BAM files, while still ensuring that Trust4 operates normally?

mourisl commented 4 months ago

Here is the reply from the discussion just in case you missed it: Yes, you can. It would be something like running fastq files from this section: https://github.com/liulab-dfci/TRUST4?tab=readme-ov-file#10x-genomics-data-and-barcode-based-single-cell-data . For the running speed, which version of TRUST4 are you using? Which step do you find is too slow?

fight2021 commented 4 months ago

I currently run Cell Ranger on the upstream FASTQ files of my 10X single-cell transcriptome data to obtain a BAM file. I then run TRUST4 on that BAM file to get the single-cell immune repertoire:

run-trust4 -t 25 -b /home/zxsys/data6/bam/SRR22007527_genome_bam.bam -f /home/zxsys/data6/hg38_bcrtcr.fa --ref /home/zxsys/data6/human_IMGT+C.fa --barcode CB

This workflow is too slow for me to finish the analysis quickly. I would like to know how to use TRUST4 to analyze the single-cell transcriptome FASTQ data directly and obtain the immune repertoire, without first running Cell Ranger to generate the BAM file. Currently, when I analyze the single-cell transcriptome data with the following command, it fails with errors and the analysis cannot complete:

run-trust4 -f hg38_bcrtcr.fa --ref human_IMGT+C.fa -u path_to_10X_fastqs/R2.fastq.gz --barcode path_to_10X_fastqs/R1.fastq.gz --readFormat bc:0:15 --barcodeWhitelist cellranger_folder/cellranger-cs/VERSION/lib/python/cellranger/barcodes/737K-august-2016.txt [other options]

fight2021 commented 4 months ago

First of all, thank you for your reply.

mourisl commented 4 months ago

What error message did you get? Is your data 10X gene expression data or 10X vdj-kit data? Which version of TRUST4 are you using? Your command looks right to me. (Let's use this issue instead of the Discussion).

fight2021 commented 4 months ago

Hello, I am currently using the following command, which only supports single-end data. Could you provide a command for analyzing paired-end data? I am a beginner, so there is still a lot I need to learn.

run-trust4 -f hg38_bcrtcr.fa --ref human_IMGT+C.fa -u path_to_10X_fastqs/R2.fastq.gz --barcode path_to_10X_fastqs/R1.fastq.gz --readFormat bc:0:15 --barcodeWhitelist cellranger_folder/cellranger-cs/VERSION/lib/python/cellranger/barcodes/737K-august-2016.txt [other options]
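Whether paired-end mode is useful depends on what R1 actually contains. One quick check is the read length in each file: if R1 is only as long as the barcode plus UMI (for example 26bp), it likely carries no cDNA, and single-end mode on R2 is the natural choice; if R1 is longer, the extra bases may be cDNA. The POSIX shell sketch below prints the length of the first read in each gzipped FASTQ given on the command line. It assumes standard four-line FASTQ records and inspects only the first record, so it is a quick sanity check, not a validation of the whole file.

```shell
#!/bin/sh
# Sketch: print the length of the first read in each gzipped FASTQ file given
# as an argument. Assumes standard 4-line FASTQ records (line 2 = sequence).

first_read_length() {
  # awk exits after the first sequence line, so gzip stops early
  # instead of decompressing the whole file.
  gzip -dc "$1" | awk 'NR == 2 { print length($0); exit }'
}

for fq in "$@"; do
  printf '%s\t%s\n' "$fq" "$(first_read_length "$fq")"
done
```

For example, saved under a hypothetical name such as read_lengths.sh, it could be run as `sh read_lengths.sh path_to_10X_fastqs/R1.fastq.gz path_to_10X_fastqs/R2.fastq.gz`.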

mourisl commented 4 months ago

This depends on your read structure. For example, if the read sequence is in both R1 and R2, and the barcode and UMI are also in the first 26bp of R1 (16bp barcode + 10bp UMI), you can use "-1 R1 -2 R2 --barcode R1 --readFormat bc:0:15,r1:26:-1" for this.
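To make the coordinates in that `--readFormat` string concrete, the POSIX shell sketch below slices a made-up R1 sequence the way the fields appear to read. The convention is inferred from this reply, not taken from TRUST4's source: positions are 0-based, the end is inclusive (so bc:0:15 is the 16bp barcode), and -1 means "to the end of the read" (so r1:26:-1 is everything after the 26bp barcode + UMI). The helper `slice_inclusive` is hypothetical and not part of TRUST4.

```shell
#!/bin/sh
# Sketch: show which bases of an R1 read the fields in
# "--readFormat bc:0:15,r1:26:-1" would select, ASSUMING start:end are
# 0-based, the end is inclusive, and -1 means "to the end of the read".

# Hypothetical helper (not part of TRUST4): print bases start..end of a sequence.
slice_inclusive() {
  # $1 = sequence, $2 = 0-based start, $3 = 0-based inclusive end, or -1
  if [ "$3" -eq -1 ]; then
    printf '%s\n' "$1" | cut -c "$(($2 + 1))-"
  else
    printf '%s\n' "$1" | cut -c "$(($2 + 1))-$(($3 + 1))"
  fi
}

# Hypothetical R1: 16bp barcode + 10bp UMI + cDNA (made-up bases).
r1="AAACCCAAGAAACACTCGTACGTACGACGTTGCA"

echo "bc:0:15  -> $(slice_inclusive "$r1" 0 15)"   # AAACCCAAGAAACACT
echo "16-25    -> $(slice_inclusive "$r1" 16 25)"  # CGTACGTACG (UMI, not selected by this readFormat)
echo "r1:26:-1 -> $(slice_inclusive "$r1" 26 -1)"  # ACGTTGCA
```

If this layout matches your R1, the options from this reply could be combined with the reference files from the earlier commands, for example: run-trust4 -f hg38_bcrtcr.fa --ref human_IMGT+C.fa -1 path_to_10X_fastqs/R1.fastq.gz -2 path_to_10X_fastqs/R2.fastq.gz --barcode path_to_10X_fastqs/R1.fastq.gz --readFormat bc:0:15,r1:26:-1 --barcodeWhitelist cellranger_folder/cellranger-cs/VERSION/lib/python/cellranger/barcodes/737K-august-2016.txt [other options]. If your barcode and UMI occupy a different number of bases, the r1 start offset would need to change accordingly.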