liulab-dfci / TRUST4

TCR and BCR assembly from RNA-seq data
MIT License

TRUST4 config for 10X Genomics #272

Open khoidnyds opened 1 month ago

khoidnyds commented 1 month ago

Hello, I have two 10X Genomics datasets (BCR and GEX). What are the appropriate parameter settings for TRUST4? For BCR I'm using:

f"run-trust4 -1 {f1} -2 {f2} --barcode {f1} --UMI {f1} --readFormat bc:0:15,um:16:25,r1:26:-1 --barcodeWhitelist {whitelist_barcodes} --od {out_dir} -f {hg38_bcrtcr_path} --ref {human_IMGT_path} -t 24 --repseq"

and for GEX:

f"run-trust4 -1 {f1} -2 {f2} --barcode {f1} --UMI {f1} --readFormat bc:0:15,um:16:25,r1:26:-1 --barcodeWhitelist {whitelist_barcodes} --od {out_dir} -f {hg38_bcrtcr_path} --ref {human_IMGT_path} -t 24"

with whitelist_barcodes = cellranger-8.0.0/lib/python/cellranger/barcodes/3M-5pgex-jan-2023.txt

However, the results contain far fewer VDJs than Cell Ranger reports. I added the first few lines of {f1} and {f2} and index {f1} here too. Thank you.

[Screenshot 2024-05-14 at 11 31 06 AM]

mourisl commented 1 month ago

Which version of TRUST4 are you using? If it is v1.1.1, you can run TRUST4 on the BCR-seq portion without the "--repseq" option. How many reads in the toassemble_bc.fq file have the "missing_barcode" value? This is a useful check of whether the barcode portion is being extracted correctly. Thank you.
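The check suggested above can be scripted. The following is a minimal sketch, not part of TRUST4: it assumes the barcode file is FASTA-like, with a ">" header line followed by a single line holding either the barcode or the literal string missing_barcode. The function name missing_barcode_fraction is hypothetical, and the record layout should be confirmed against the actual file before relying on the counts.

```python
from typing import Iterable, Tuple


def missing_barcode_fraction(lines: Iterable[str]) -> Tuple[int, int]:
    """Return (missing, total) for FASTA-like barcode records.

    Assumption: each record is a '>' header followed by one line that
    is either a barcode or the literal string 'missing_barcode'.
    """
    missing = 0
    total = 0
    for line in lines:
        line = line.strip()
        if not line or line.startswith(">"):
            continue  # skip blank lines and headers
        total += 1
        if line == "missing_barcode":
            missing += 1
    return missing, total


# Toy records standing in for the real file contents.
example = [
    ">read1", "AAACCTGAGAAACCAT",
    ">read2", "missing_barcode",
    ">read3", "missing_barcode",
]
missing, total = missing_barcode_fraction(example)
print(f"{missing} of {total} reads have no barcode")
```

For a real run, pass an open file handle instead of the toy list. A high missing fraction would suggest that the barcode coordinates or the whitelist do not match the data.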

khoidnyds commented 1 month ago

I'm using TRUST4 v1.1.1-r505. I checked toassemble_bc.fa, and most of the reads have missing_barcode values. Shouldn't the barcodes always be the first 16 bases of R1? My datasets have mixed read lengths: one set has a 26 bp R1 and a 91 bp R2; another has R1 = R2 = 151 bp. What do you recommend for the --readFormat parameter? Thank you.
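To see what the --readFormat string from the original command would select on reads of each length, here is a small sketch. It assumes the intervals are 0-based and inclusive and that an end of -1 means "to the end of the read". That is a reading of the spec as written in this thread, not something checked against the TRUST4 source. The helpers parse_read_format and slice_field are hypothetical names.

```python
from typing import Dict, Tuple


def parse_read_format(spec: str) -> Dict[str, Tuple[int, int]]:
    """Parse 'bc:0:15,um:16:25,r1:26:-1' into {name: (start, end)}."""
    fields = {}
    for part in spec.split(","):
        name, start, end = part.split(":")[:3]
        fields[name] = (int(start), int(end))
    return fields


def slice_field(seq: str, start: int, end: int) -> str:
    """Slice with an inclusive end; -1 is assumed to mean end of read."""
    stop = len(seq) if end == -1 else end + 1
    return seq[start:stop]


spec = parse_read_format("bc:0:15,um:16:25,r1:26:-1")

# Synthetic reads: 16 bp barcode, 10 bp UMI, then any remaining insert.
r1_short = "A" * 16 + "C" * 10            # 26 bp R1
r1_long = "A" * 16 + "C" * 10 + "G" * 125  # 151 bp R1

for label, read in (("26 bp", r1_short), ("151 bp", r1_long)):
    lengths = {name: len(slice_field(read, *iv)) for name, iv in spec.items()}
    print(label, lengths)
```

Under these assumptions, the 26 bp R1 leaves nothing for the r1 field once the barcode and UMI are removed, whereas the 151 bp R1 keeps 125 bases. The two datasets may therefore need different --readFormat values.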

khoidnyds commented 1 month ago

And should I include UMI information when running trust4 or just ignore that?

mourisl commented 1 month ago

The UMI information has little effect on the final results, so you can include it. Do you see the same fraction of "missing_barcode" reads on the GEX side? If you have some of the Cell Ranger VDJ results, you can also check whether those barcodes occupy the first 16 bp of R1.
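One way to run that comparison is sketched below. It assumes that Cell Ranger barcodes carry a GEM-well suffix such as "-1" that needs stripping, and that R1 is a standard four-line-per-record FASTQ, optionally gzipped. All function names are hypothetical, and the demo uses in-memory toy reads rather than real files.

```python
import gzip
from typing import Iterable, Iterator, Set


def read_sequences(fastq_path: str) -> Iterator[str]:
    """Yield the sequence line of each 4-line FASTQ record."""
    opener = gzip.open if fastq_path.endswith(".gz") else open
    with opener(fastq_path, "rt") as handle:
        for i, line in enumerate(handle):
            if i % 4 == 1:
                yield line.strip()


def strip_gem_suffix(barcode: str) -> str:
    """Drop a trailing '-1' style suffix from a Cell Ranger barcode."""
    return barcode.split("-")[0]


def fraction_with_known_prefix(
    seqs: Iterable[str], known: Set[str], length: int = 16
) -> float:
    """Fraction of reads whose first `length` bases are a known barcode."""
    total = 0
    hits = 0
    for seq in seqs:
        total += 1
        if seq[:length] in known:
            hits += 1
    return hits / total if total else 0.0


# Toy data: one read starts with a known barcode, one does not.
known = {strip_gem_suffix("AAACCTGAGAAACCAT-1")}
reads = ["AAACCTGAGAAACCAT" + "T" * 10, "G" * 26]
print(fraction_with_known_prefix(reads, known))
```

With real data, `reads` would come from `read_sequences(...)` on the R1 file and `known` from the barcode column of the Cell Ranger output. A fraction near zero would suggest the barcode is not where the --readFormat string says it is.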

khoidnyds commented 1 month ago

Hi mourisl, thanks for helping. I checked the cell_id values in _barcode_airr.tsv. Some are located in the first 16 bp of R1 (as expected), but some don't exist in R1.fastq.gz at all. I can't explain what happened here. Could you give me some hints? Thanks

mourisl commented 1 month ago

Those could be from error-corrected barcodes.
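To illustrate why a corrected barcode may never appear verbatim in R1, here is a generic sketch of whitelist-based correction that accepts a unique whitelist entry at Hamming distance 1. This is a common textbook approach and is only an assumption for illustration. It is not necessarily the algorithm TRUST4 uses, and correct_barcode is a hypothetical name.

```python
from typing import Optional, Set


def correct_barcode(observed: str, whitelist: Set[str]) -> Optional[str]:
    """Return the observed barcode if whitelisted, else the unique
    whitelist entry one substitution away, else None."""
    if observed in whitelist:
        return observed
    candidates = set()
    for i, base in enumerate(observed):
        for alt in "ACGT":
            if alt == base:
                continue
            candidate = observed[:i] + alt + observed[i + 1:]
            if candidate in whitelist:
                candidates.add(candidate)
    # Only accept an unambiguous correction.
    return candidates.pop() if len(candidates) == 1 else None


whitelist = {"AAACCTGAGAAACCAT"}
observed = "AAACCTGAGAAACCAA"  # sequencing error in the last base
corrected = correct_barcode(observed, whitelist)
print(observed, "->", corrected)
```

In this toy case the reported cell barcode is the whitelist entry, while the read itself contains the uncorrected string. Searching R1.fastq.gz for the reported barcode would therefore find no exact match for that read.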