Open WangKang-Leo opened 1 year ago
Could you please elaborate more on the issues? For example, for the inconsistency with GEP, do you mean the V gene or C gene expression from gene expression data is different from TRUST4's prediction? For strandedness, do you mean your RNA-seq is stranded RNA-seq?
I did not look at V/C gene, but using some typical immune biomarkers (CD3,CD8, CD20) in mRNA or protein level, or GEP immune signature score for T cell, B cell.
Our RNA-seq is reverseStranded, and I don't know whether it has influence or not on TRUST4.
The strand-specific RNA-seq probably does not affect TRUST4 much, it is rare to have a read whose reverse-complement matches better to VDJ genes than its true orientation.
For the GEP analysis, do you mean when other data show no or few T or B cells, but TRUST4 still assembles many TCRs and BCRs?
Yes, not well correlated between GEP-based T/B cell and those from TRUST4.....
How did you calculate the abundance from TRUST4? Could you please show me a few high abundant TRBs and IGHs from the airr output? I can check to see whether those are misassemblies. Thank you.
I used code from RIMA (https://github.com/liulab-dfci/RIMA_pipeline), the results were shown as following: Also, I attached the sample with highest T cell infiltration (0.000025). The cohort includes 200 samples. UF-3016-1226-0_airr_align.txt
`module load bioinfo-tools module load TRUST4/1.0.8 module load R/4.1.1 module load R_packages/4.1.1 cd /home/kangwang/TRUST4/result
for ((i=2; i<=200; i++)); do Sample=$(awk -v i=$i 'NR==i{print $1}' /home/kangwang/TRUST4/data/samplesheet_rna_Baseline.tsv) fastq1=$(awk -v i=$i 'NR==i{print $2}' /home/kangwang/TRUST4/data/samplesheet_rna_Baseline.tsv) fastq2=$(awk -v i=$i 'NR==i{print $3}' /home/kangwang/TRUST4/data/samplesheet_rna_Baseline.tsv) cd /home/kangwang/TRUST4/result/Baseline mkdir $Sample cd /home/kangwang/TRUST4/result/Baseline/$Sample run-trust4 -t 64 \ -f /home/kangwang/TRUST4/reference/TRUST4_ref/hg38_bcrtcr.fa \ --ref /home/kangwang/TRUST4/reference/TRUST4_ref/human_IMGT+C.fa \ -1 $fastq1 \ -2 $fastq2 \ -o $Sample sed '1,1s/#count/count/g' ${Sample}_report.tsv > ${Sample}_report.processed.tsv Rscript /home/kangwang/TRUST4/src/trust4_bcr_process.R --cdr3 ${Sample}_report.processed.tsv \ --sampleid $Sample \ --stat /home/kangwang/TRUST4/data/RNA_markdup.sorted.bam.stats/${Sample}.markdup.sorted.bam.stats.txt \ --outdir /home/kangwang/TRUST4/result/Baseline/$Sample/$Sample Rscript /home/kangwang/TRUST4/src/trust4_tcr_process.R --cdr3 ${Sample}_report.processed.tsv \ --sampleid $Sample \ --stat /home/kangwang/TRUST4/data/RNA_markdup.sorted.bam.stats/${Sample}.markdup.sorted.bam.stats.txt \ --outdir /home/kangwang/TRUST4/result/Baseline/$Sample/$Sample done`
Could you please share the UF-3016-1226-0_airr.tsv file, which is the full AIRR output? Thank you.
UF-3016-1226-0_airr.txt Thanks so much, here you are!
Hello again, any news or findings from my AIRR output?
Those assemblies look authentic to me. I guess this sample is sequenced pretty deep, so even though the T or B cell infiltration level is low, it still captures a lot. Also, the abundance is affected by receptor gene expression. For example, the top IGH/IGK CDR3s may come from plasma B cells, which may have super high expression but the actual cell number could be much fewer. Hope this helps.
Thanks for your prompt feedback. Do you have any suggestions to adjust or filter the AIRR results? Because we hope to report coherent results from different tools.
I usually filter the non-productive chains in the results. For BCRs, you may cluster the similar CDR3s as they might be come from the same lineage during somatic hypermutation (which is another definition of clonotype), but this only changes the number of distinct clonotypes and won't affect the total abundances.
Hello, I have run the TRUST4 based on our bulk RNAseq cohort (PE150bp, ribo-zero), where fastq is input, but the results are not consistent with other immune metrics (GEP based). I want to optimize the analyses by adding some parameters, such as strandedness. Our data have significant batch effect, but it should be fine for TRUST4. Or do you have any ideas?
Thanks in advance