Closed lishashali closed 3 years ago
Does R1 contain read information or just barcode+UMI sequences? If it contains only barcode+UMI information, the command line should be: run-trust4 -u XXX_5_S7_L003_R2_001.fastq.gz -f hg38_bcrtcr.fa --ref human_IMGT+C.fa --barcode XXX_5_S7_L003_R1_001.fastq.gz --barcodeRange 0 15 +
Otherwise, you need to preprocess the R1 file before running TRUST4.
OK, I get it. Thank you for your reply.
Hi,
/data/lishasha/TRUST4/TRUST4/run-trust4 -u /data/lishasha/TRUST4/liweizhong/fq2/lng/fq2/liweizhong_5_S7_L003_R2_001.fastq.gz -f /data/lishasha/TRUST4/TRUST4/hg38_bcrtcr.fa --ref /data/lishasha/TRUST4/TRUST4/human_IMGT+C.fa --barcodeRange 0 15 + --barcode /data/lishasha/TRUST4/liweizhong/fq2/lng/fq2/liweizhong_5_S7_L003_R1_001.fastq.gz
[Tue Jun 30 02:10:44 2020] TRUST4 begins.
[Tue Jun 30 02:10:44 2020] SYSTEM CALL: /data/lishasha/TRUST4/TRUST4/fastq-extractor -u /data/lishasha/TRUST4/liweizhong/fq2/lng/fq2/liweizhong_5_S7_L003_R2_001.fastq.gz -t 1 -f /data/lishasha/TRUST4/TRUST4/hg38_bcrtcr.fa -o TRUST_liweizhong_5_S7_L003_R2_001_toassemble --barcodeStart 0 --barcodeEnd 15 --barcode /data/lishasha/TRUST4/liweizhong/fq2/lng/fq2/liweizhong_5_S7_L003_R1_001.fastq.gz
[Tue Jun 30 02:10:44 2020] Start to extract candidate reads from read files.
Read file is empty.
system /data/lishasha/TRUST4/TRUST4/fastq-extractor -u /data/lishasha/TRUST4/liweizhong/fq2/lng/fq2/liweizhong_5_S7_L003_R2_001.fastq.gz -t 1 -f /data/lishasha/TRUST4/TRUST4/hg38_bcrtcr.fa -o TRUST_liweizhong_5_S7_L003_R2_001_toassemble --barcodeStart 0 --barcodeEnd 15 --barcode /data/lishasha/TRUST4/liweizhong/fq2/lng/fq2/liweizhong_5_S7_L003_R1_001.fastq.gz failed: 256 at /data/lishasha/TRUST4/TRUST4/run-trust4 line 37.
R1 contains just barcode+UMI sequences, and the command is: run-trust4 -u XXX_5_S7_L003_R2_001.fastq.gz -f hg38_bcrtcr.fa --ref human_IMGT+C.fa --barcodeRange 0 15 + --barcode XXX_5_S7_L003_R1_001.fastq.gz
Could you please share the first few reads in liweizhong_5_S7_L003_R2_001.fastq.gz and liweizhong_5_S7_L003_R1_001.fastq.gz respectively?
The first one is R1, the second one is R2.
The command and the reads look fine to me. This error usually happens when the path is not accessible. It could also be due to the downloaded binary files being incompatible with your system. Can you try the Singularity image in the release package?
Here is some more information about Singularity if you want to mount your own data folder: #13
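To rule out the path-accessibility cause before switching binaries, a small check like the one below may help. It is only a sketch in Python and is not part of TRUST4: the function name `first_fastq_record` is made up for illustration, and it assumes the input is a gzipped FASTQ file like the ones in the command above.

```python
import gzip
import os


def first_fastq_record(path):
    """Return the first 4-line FASTQ record of a gzipped file, or None if it is empty.

    Hypothetical helper for illustration; not a TRUST4 function.
    """
    if not os.path.isfile(path):
        raise FileNotFoundError(f"not a regular file: {path}")
    if not os.access(path, os.R_OK):
        raise PermissionError(f"not readable by the current user: {path}")
    # gzip.open in text mode decompresses on the fly.
    with gzip.open(path, "rt") as handle:
        lines = [handle.readline().rstrip("\n") for _ in range(4)]
    if not lines[0]:
        return None  # the file decompresses to nothing
    return lines
```

Running it on both the `-u` file and the `--barcode` file, from the same shell and user that runs run-trust4, should show whether either file is missing, unreadable, or empty from that environment.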
OK, I will try it. Thank you.
We have released the source code of TRUST4. Can you compile TRUST4 from source code and give it a try? Thank you.
Dear developers,
I have the 10x R1, R2 and I1 with the barcodes. Would it be possible to read the barcodes directly from the I1 fastq file?
Yes, you can directly run TRUST4 with the option "--barcode XX_I1.fastq --barcodeRange 0 15 +", assuming the first 16 nt of each sequence are the barcode. Since the barcode is not in the read file, you don't need to preprocess any files in this situation.
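To make the range arithmetic concrete: "0 15" is a 0-based, inclusive range, so it covers 16 nt (this matches the --barcodeStart 0 --barcodeEnd 15 seen in the fastq-extractor call earlier in this thread). The Python sketch below only illustrates that reading and is not TRUST4's code. The name `extract_barcode` is made up, treating an end of -1 as "last base" follows my reading of the usage text, and treating "-" as reverse complement is an assumption.

```python
# Illustration only: hypothetical helper, not TRUST4's implementation.
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def extract_barcode(seq, start, end, strand="+"):
    """Return seq[start..end] with 0-based, inclusive coordinates.

    end == -1 is taken to mean "up to the last base" (assumption).
    strand "-" is taken to mean reverse complement (assumption).
    """
    if end == -1:
        end = len(seq) - 1
    if start < 0 or end < start or end >= len(seq):
        raise ValueError(f"range {start}..{end} does not fit a {len(seq)}-nt sequence")
    barcode = seq[start:end + 1]  # +1 because the end coordinate is inclusive
    if strand == "-":
        barcode = barcode.translate(_COMPLEMENT)[::-1]
    elif strand != "+":
        raise ValueError("strand must be '+' or '-'")
    return barcode
```

For example, `extract_barcode(seq, 0, 15, "+")` keeps the first 16 bases of `seq`, while a range of 0 and -1 keeps the whole sequence.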
Dear developers,
[scRNAseq]
I successfully recovered the clonotypes from my samples and they do match the TCR-seq CDR3s, but the cell-specific barcodes don't match, and I sometimes get more than one cell barcode from TRUST4 per single barcode in the scRNA-seq. Any idea what the cause could be?
For example, the barcode from cellranger is ATTGGTGGTGTCGCTG and TRUST4 is giving back AGCGAAAG_16027 and TTTCTGTC_7167
Thank you. Edgar
It seems the barcode file may truncate some of the sequences. What was the command you used, and did you see short barcode sequences like "AGCGAAAG" or "TTTCTGTC" in the I1 file? From their length, those barcodes seem to be the sample index in 10X data instead of the cell barcode.
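A quick way to check this is to tabulate the sequence lengths in the barcode file: 8-nt sequences would point to a sample index, 16-nt sequences to a cell barcode. The following Python sketch assumes standard 4-line FASTQ records; the function name `sequence_length_counts` is made up for illustration.

```python
from collections import Counter


def sequence_length_counts(fastq_lines, max_records=1000):
    """Count sequence lengths over the first max_records FASTQ records.

    fastq_lines is any iterable of lines, e.g. an open (gzip) text handle.
    Assumes 4 lines per record with the sequence on the second line.
    """
    counts = Counter()
    for i, line in enumerate(fastq_lines):
        if i // 4 >= max_records:
            break
        if i % 4 == 1:  # second line of each record is the sequence
            counts[len(line.strip())] += 1
    return counts
```

It can be fed an open handle, for instance `sequence_length_counts(gzip.open(path, "rt"))` after importing `gzip`, so only the first records of a large file are read.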
I used the following command: run-trust4 --barcode file1.I.fastq.gz -1 file1.R1.fastq.gz -2 file1.R2.fastq.gz -f /*/human_IMGT+C.fa -o file1.out
Maybe the headers are informative:
file I
@NS500672:615:HHWM5BGXB:3:11401:24464:1018 1:N:0:TCTNAAAG
TCTNAAAG
+
AAA#AEEE
@NS500672:615:HHWM5BGXB:3:11401:14791:1019 1:N:0:TCTNAAAG
TCTNAAAG
+
AAA#AAEE
@NS500672:615:HHWM5BGXB:3:11401:23247:1020 1:N:0:TCTTAAAG
TCTTAAAG
+
file R1
@NS500672:615:HHWM5BGXB:3:11401:24464:1018 1:N:0:TCTNAAAG
NCTTCCACAGGCAGTAAAAGAGCGCGTTTCTTATATGGGAAACAGAATGGCTTTTTGGCTGAGAAGGCTGGGTCTACATTTCAGGCCACATTTGGGGAGACGAATGGAGTCATTCCTGGGAGGTGTTTTGCTGATTTTGTGGCTTCAAGT
+
@NS500672:615:HHWM5BGXB:3:11401:14791:1019 1:N:0:TCTNAAAG
NTTGGCTCAATAGCAATCCATGTTACTTTCTTATATGGGGGATAAAAATGTGATAAACTTAGTATTGTTTTGAATTTTGTTTTTAATACCCAGGACTAGATTAGAATAGAATTCACAGATCGGAAGAGCACACGTCTGAACTCCAGTCAC
+
@NS500672:615:HHWM5BGXB:3:11401:23247:1020 1:N:0:TCTTAAAG
NGATGTATCGTAGGAGTAGGTAGAGCTTTCTTATATGGGAAGACCCTAAACTACCAGTGGATAAAATCTTACCCCCACCATCTCCCTGGCCCAAGAGCTCCATCTTTGATGCTGATGAAGAAAAGTCCAAGCTTCTGACAAGGCTTCTAA
+
It seems the I1 file holds the sample index, so the barcode information should be in the read fastq file. Based on the 10X document: https://assets.ctfassets.net/an68im79xiti/1CnKSfa7taoQwIEe0WaA4m/8635b2c9ee86c022e731b6fb2e13fed2/CG000080_10x_Technical_Note_Base_Composition_SC3_v2_RevB.pdf the barcode should be in the last portion of read1, and you may need to preprocess the read first depending on the library you are using. However, I manually looked up those sequences in the 10X barcode whitelist and could not find them, suggesting that either the barcode is in a different part of the read or there were many sequencing errors in that region. I'm currently working on TRUST4 so that it can better handle fastq inputs from 10X Genomics data.
Currently, the more robust way to process 10X data is to run 10X Genomics cellranger first to generate the bam file, which automatically handles all the barcode issues.
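The manual whitelist lookup described above can be automated to test the "barcode is in a different part of the read" possibility. The Python sketch below slides a fixed-length window along each read and reports, per offset, the fraction of reads whose window is in the whitelist. Everything here is an assumption for illustration: the function name `hit_rate_by_offset` is made up, the whitelist is whatever set of barcodes you load yourself, and exact matching is used, so sequencing errors will lower the rate.

```python
def hit_rate_by_offset(reads, whitelist, barcode_len=16):
    """Map each 0-based offset to the fraction of reads whose
    barcode_len-mer at that offset is in the whitelist (exact match).

    Hypothetical helper for illustration; not part of TRUST4 or cellranger.
    """
    allowed = set(whitelist)
    reads = list(reads)
    rates = {}
    if not reads:
        return rates
    # Only scan offsets that fit inside the shortest read.
    max_offset = min(len(r) for r in reads) - barcode_len
    for offset in range(max_offset + 1):
        hits = sum(1 for r in reads if r[offset:offset + barcode_len] in allowed)
        rates[offset] = hits / len(reads)
    return rates
```

An offset whose rate stands well above the rest would be a candidate barcode position; if no offset stands out, the barcode is probably not in that read in this orientation, or errors are too frequent for exact matching.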
That makes a lot of sense. Thank you so much.
Hi again,
I re-ran cellranger and attempted to use TRUST4 directly on the bam file, however, I got this error:
TRUST4 begins.
SYSTEM CALL: bam-extractor -b possorted_genome_bam.bam -t 1 -f human_IMGT+C.fa -o file.outs.possorted_genome_bam.out_toassemble --barcode CB
Start to extract candidate reads from bam file.
Unknown genome name: GGTACAACTGGAACGAC
failed: 256 at run-trust4 line 44.
I have no idea why it did not recognize the genome. Any suggestions?
For bam input, you need to use the options "-f hg38_bcrtcr.fa --ref human_IMGT+C.fa". The hg38_bcrtcr.fa file contains the coordinate information for the BCR/TCR genes.
ah, my bad. Thank you.
Dear Developers, I have a question about the option '--barcodeRange INT INT CHAR'. R1 of single-cell 5′ data contains the barcode (positions 1-16) and UMI sequences. When I analyzed my single-cell 5′ data, this was my command: run-trust4 -1 XXX_5_S7_L003_R1_001.fastq.gz -2 XXX_5_S7_L003_R2_001.fastq.gz -f hg38_bcrtcr.fa --ref human_IMGT+C.fa --barcodeRange 0 -16 + Is this right?