Closed jolespin closed 2 years ago
We have provided the IMGT database sequence in the repository already (human_IMGT+C.fa), so you can start with it.
Thank you. I'll use this to start.
A few follow up questions:
hg38_bcrtcr.fa
instead of human_IMGT+C.fa
?[Thu Dec 23 09:31:50 2021] SYSTEM CALL: perl /local/ifs2_devel/ANNOTATION/jespinoz/anaconda3/envs/trust4_env/bin/trust-airr.pl test_cdr3.out test_annot.fa > test_airr.tsv
Argument "assemble3" isn't numeric in int at /local/ifs2_devel/ANNOTATION/jespinoz/anaconda3/envs/trust4_env/bin/trust-airr.pl line 144, <FP> line 1.
- For BAM file input, you have to use hg38_bcrtcr.fa as the -f input. For the fastq input, the results from using hg38_bcrtcr.fa or human_IMGT+C.fa are almost identical. Since hg38_bcrtcr.fa has fewer redundant sequences and could improve the running time, I usually use hg38_bcrtcr.fa file for -f option.
For the BAM file, is it the reads mapped to what was used with the --ref
or -f
flag?
- Yes. If you want to use the same file on both options, you can actually just use "-f human_IMGT+C.fa" without --ref for short.
Got it so if nothing is used for --ref
then it automatically propagates the argument from -f
(where -f
is mandatory)?
- Is your TRUST4 version the most recent github version? I think in the current version, the trust-airr.pl command takes test_report.tsv as input instead of test_cdr3.out.
Here's my version:
(trust4_env) -bash-4.2$ conda list | grep "trust4"
# packages in environment at /usr/local/devel/ANNOTATION/jespinoz/anaconda3/envs/trust4_env:
trust4 1.0.6 h2e03b76_0 bioconda
Also, here was my full command for a few more details. First the main script couldn't find trust-simplerep.pl
in the path so I copied it over to bin.
(trust4_env) -bash-4.2$ run-trust4 -1 trimmed_repaired_1.fastq.gz -2 trimmed_repaired_2.fastq.gz -t 4 -f ../TRUST4/human_IMGT+C.fa --ref ../TRUST4/human_IMGT+C.fa -o test
[Thu Dec 23 09:25:39 2021] TRUST4 begins.
[Thu Dec 23 09:25:39 2021] SYSTEM CALL: /local/ifs2_devel/ANNOTATION/jespinoz/anaconda3/envs/trust4_env/bin/fastq-extractor -t 4 -f ../TRUST4/human_IMGT+C.fa -o test_toassemble -1 trimmed_repaired_1.fastq.gz -2 trimmed_repaired_2.fastq.gz
[Thu Dec 23 09:25:39 2021] Start to extract candidate reads from read files.
[Thu Dec 23 09:26:21 2021] Finish extracting reads.
[Thu Dec 23 09:26:21 2021] SYSTEM CALL: /local/ifs2_devel/ANNOTATION/jespinoz/anaconda3/envs/trust4_env/bin/trust4 -t 4 -f ../TRUST4/human_IMGT+C.fa -o test -1 test_toassemble_1.fq -2 test_toassemble_2.fq
[Thu Dec 23 09:26:21 2021] Found 2088 reads.
[Thu Dec 23 09:26:21 2021] Finish sorting the reads.
[Thu Dec 23 09:26:21 2021] Finish rough annotations.
[Thu Dec 23 09:26:21 2021] Assembled 78 reads.
[Thu Dec 23 09:26:21 2021] Try to rescue 0 reads for assembly.
[Thu Dec 23 09:26:21 2021] Rescued 0 reads.
[Thu Dec 23 09:26:21 2021] Extend assemblies by mate pair information.
[Thu Dec 23 09:26:21 2021] Remove redundant assemblies.
[Thu Dec 23 09:26:21 2021] Finish assembly.
[Thu Dec 23 09:26:21 2021] SYSTEM CALL: /local/ifs2_devel/ANNOTATION/jespinoz/anaconda3/envs/trust4_env/bin/annotator -f ../TRUST4/human_IMGT+C.fa -a test_final.out -t 4 -o test -r test_assembled_reads.fa > test_annot.fa
[Thu Dec 23 09:26:21 2021] Start to annotate assemblies.
[Thu Dec 23 09:26:21 2021] Start to realign reads for CDR3 analysis.
[Thu Dec 23 09:26:21 2021] Compute CDR3 abundance.
[Thu Dec 23 09:26:21 2021] Finish annotation.
[Thu Dec 23 09:26:21 2021] SYSTEM CALL: perl /local/ifs2_devel/ANNOTATION/jespinoz/anaconda3/envs/trust4_env/bin/trust-simplerep.pl test_cdr3.out > test_report.tsv
[Thu Dec 23 09:26:22 2021] SYSTEM CALL: perl /local/ifs2_devel/ANNOTATION/jespinoz/anaconda3/envs/trust4_env/bin/trust-airr.pl test_cdr3.out test_annot.fa > test_airr.tsv
Can't open perl script "/local/ifs2_devel/ANNOTATION/jespinoz/anaconda3/envs/trust4_env/bin/trust-airr.pl": No such file or directory
system perl /local/ifs2_devel/ANNOTATION/jespinoz/anaconda3/envs/trust4_env/bin/trust-airr.pl test_cdr3.out test_annot.fa > test_airr.tsv failed: 512 at /usr/local/devel/ANNOTATION/jespinoz/anaconda3/envs/trust4_env/bin/run-trust4 line 48.
# Checking if file exists
(trust4_env) -bash-4.2$ less /local/ifs2_devel/ANNOTATION/jespinoz/anaconda3/envs/trust4_env/bin/trust-airr.pl
/local/ifs2_devel/ANNOTATION/jespinoz/anaconda3/envs/trust4_env/bin/trust-airr.pl: No such file or directory
# Copying it over
(trust4_env) -bash-4.2$ cp trust-airr.pl /local/ifs2_devel/ANNOTATION/jespinoz/anaconda3/envs/trust4_env/bin/trust-airr.pl
# Rerun it
(trust4_env) -bash-4.2$ run-trust4 -1 trimmed_repaired_1.fastq.gz -2 trimmed_repaired_2.fastq.gz -t 4 -f ../TRUST4/human_IMGT+C.fa --ref ../TRUST4/human_IMGT+C.fa -o test
[Thu Dec 23 09:31:08 2021] TRUST4 begins.
[Thu Dec 23 09:31:08 2021] SYSTEM CALL: /local/ifs2_devel/ANNOTATION/jespinoz/anaconda3/envs/trust4_env/bin/fastq-extractor -t 4 -f ../TRUST4/human_IMGT+C.fa -o test_toassemble -1 trimmed_repaired_1.fastq.gz -2 trimmed_repaired_2.fastq.gz
[Thu Dec 23 09:31:08 2021] Start to extract candidate reads from read files.
[Thu Dec 23 09:31:49 2021] Finish extracting reads.
[Thu Dec 23 09:31:49 2021] SYSTEM CALL: /local/ifs2_devel/ANNOTATION/jespinoz/anaconda3/envs/trust4_env/bin/trust4 -t 4 -f ../TRUST4/human_IMGT+C.fa -o test -1 test_toassemble_1.fq -2 test_toassemble_2.fq
[Thu Dec 23 09:31:49 2021] Found 2088 reads.
[Thu Dec 23 09:31:49 2021] Finish sorting the reads.
[Thu Dec 23 09:31:49 2021] Finish rough annotations.
[Thu Dec 23 09:31:49 2021] Assembled 78 reads.
[Thu Dec 23 09:31:49 2021] Try to rescue 0 reads for assembly.
[Thu Dec 23 09:31:49 2021] Rescued 0 reads.
[Thu Dec 23 09:31:50 2021] Extend assemblies by mate pair information.
[Thu Dec 23 09:31:50 2021] Remove redundant assemblies.
[Thu Dec 23 09:31:50 2021] Finish assembly.
[Thu Dec 23 09:31:50 2021] SYSTEM CALL: /local/ifs2_devel/ANNOTATION/jespinoz/anaconda3/envs/trust4_env/bin/annotator -f ../TRUST4/human_IMGT+C.fa -a test_final.out -t 4 -o test -r test_assembled_reads.fa > test_annot.fa
[Thu Dec 23 09:31:50 2021] Start to annotate assemblies.
[Thu Dec 23 09:31:50 2021] Start to realign reads for CDR3 analysis.
[Thu Dec 23 09:31:50 2021] Compute CDR3 abundance.
[Thu Dec 23 09:31:50 2021] Finish annotation.
[Thu Dec 23 09:31:50 2021] SYSTEM CALL: perl /local/ifs2_devel/ANNOTATION/jespinoz/anaconda3/envs/trust4_env/bin/trust-simplerep.pl test_cdr3.out > test_report.tsv
[Thu Dec 23 09:31:50 2021] SYSTEM CALL: perl /local/ifs2_devel/ANNOTATION/jespinoz/anaconda3/envs/trust4_env/bin/trust-airr.pl test_cdr3.out test_annot.fa > test_airr.tsv
Argument "assemble3" isn't numeric in int at /local/ifs2_devel/ANNOTATION/jespinoz/anaconda3/envs/trust4_env/bin/trust-airr.pl line 144, <FP> line 1.
[Thu Dec 23 09:31:50 2021] TRUST4 finishes.
Hope this helps!
Thank you. I just ran make
right now. Which files should be copied to bin in my conda environment? (i.e., /usr/local/devel/ANNOTATION/jespinoz/anaconda3/envs/trust4_env/bin/
)
(base) -bash-4.2$ tree .
.
├── AlignAlgo.hpp
├── alignments.hpp
├── annotator
├── Annotator.cpp
├── Annotator.o
├── bam-extractor
├── BamExtractor.cpp
├── BamExtractor.o
├── BarcodeCorrector.hpp
├── BuildDatabaseFa.pl
├── BuildImgtAnnot.pl
├── BuildImgtVquestAnnot.pl
├── Cgene.list
├── defs.h
├── example
│ ├── example_1.fq
│ ├── example_2.fq
│ ├── example.bam
│ ├── TRUST_example_annot.fa
│ ├── TRUST_example_cdr3.out
│ └── TRUST_example_report.tsv
├── fastq-extractor
├── FastqExtractor.cpp
├── FastqExtractor.o
├── FilterAnnotatedAssembly.pl
├── hg19_bcrtcr.fa
├── hg38_bcrtcr.fa
├── human_IMGT+C.fa
├── human_vdjc.list
├── KmerCode.hpp
├── KmerCount.hpp
├── KmerIndex.hpp
├── kseq.h
├── LICENSE.txt
├── main.cpp
├── main.o
├── Makefile
├── mouse
│ ├── GRCm38_bcrtcr.fa
│ └── mouse_IMGT+C.fa
├── ReadFiles.hpp
├── README.md
├── run-trust4
├── samtools-0.1.19
│ ├── AUTHORS
│ ├── bam2bcf.c
│ ├── bam2bcf.h
│ ├── bam2bcf_indel.c
│ ├── bam2bcf_indel.o
│ ├── bam2bcf.o
│ ├── bam2depth.c
│ ├── bam2depth.o
│ ├── bam_aux.c
│ ├── bam_aux.o
│ ├── bam.c
│ ├── bam_cat.c
│ ├── bam_cat.o
│ ├── bam_color.c
│ ├── bam_color.o
│ ├── bam_endian.h
│ ├── bam.h
│ ├── bam_import.c
│ ├── bam_import.o
│ ├── bam_index.c
│ ├── bam_index.o
│ ├── bam_lpileup.c
│ ├── bam_lpileup.o
│ ├── bam_mate.c
│ ├── bam_mate.o
│ ├── bam_md.c
│ ├── bam_md.o
│ ├── bam.o
│ ├── bam_pileup.c
│ ├── bam_pileup.o
│ ├── bam_plcmd.c
│ ├── bam_plcmd.o
│ ├── bam_reheader.c
│ ├── bam_reheader.o
│ ├── bam_rmdup.c
│ ├── bam_rmdup.o
│ ├── bam_rmdupse.c
│ ├── bam_rmdupse.o
│ ├── bamshuf.c
│ ├── bamshuf.o
│ ├── bam_sort.c
│ ├── bam_sort.o
│ ├── bam_stat.c
│ ├── bam_stat.o
│ ├── bamtk.c
│ ├── bamtk.o
│ ├── bam_tview.c
│ ├── bam_tview_curses.c
│ ├── bam_tview_curses.o
│ ├── bam_tview.h
│ ├── bam_tview_html.c
│ ├── bam_tview_html.o
│ ├── bam_tview.o
│ ├── bcftools
│ │ ├── bcf2qcall.c
│ │ ├── bcf2qcall.o
│ │ ├── bcf.c
│ │ ├── bcf.h
│ │ ├── bcf.o
│ │ ├── bcf.tex
│ │ ├── bcftools
│ │ ├── bcfutils.c
│ │ ├── bcfutils.o
│ │ ├── call1.c
│ │ ├── call1.o
│ │ ├── em.c
│ │ ├── em.o
│ │ ├── fet.c
│ │ ├── fet.o
│ │ ├── index.c
│ │ ├── index.o
│ │ ├── kfunc.c
│ │ ├── kfunc.o
│ │ ├── kmin.c
│ │ ├── kmin.h
│ │ ├── kmin.o
│ │ ├── libbcf.a
│ │ ├── main.c
│ │ ├── main.o
│ │ ├── Makefile
│ │ ├── mut.c
│ │ ├── mut.o
│ │ ├── prob1.c
│ │ ├── prob1.h
│ │ ├── prob1.o
│ │ ├── README
│ │ ├── vcf.c
│ │ ├── vcf.o
│ │ └── vcfutils.pl
│ ├── bedcov.c
│ ├── bedcov.o
│ ├── bedidx.c
│ ├── bedidx.o
│ ├── bgzf.c
│ ├── bgzf.h
│ ├── bgzf.o
│ ├── bgzip.c
│ ├── ChangeLog.old
│ ├── COPYING
│ ├── cut_target.c
│ ├── cut_target.o
│ ├── errmod.c
│ ├── errmod.h
│ ├── errmod.o
│ ├── examples
│ │ ├── 00README.txt
│ │ ├── bam2bed.c
│ │ ├── calDepth.c
│ │ ├── chk_indel.c
│ │ ├── ex1.fa
│ │ ├── ex1.sam.gz
│ │ ├── Makefile
│ │ ├── toy.fa
│ │ └── toy.sam
│ ├── faidx.c
│ ├── faidx.h
│ ├── faidx.o
│ ├── INSTALL
│ ├── kaln.c
│ ├── kaln.h
│ ├── kaln.o
│ ├── khash.h
│ ├── klist.h
│ ├── knetfile.c
│ ├── knetfile.h
│ ├── knetfile.o
│ ├── kprobaln.c
│ ├── kprobaln.h
│ ├── kprobaln.o
│ ├── kseq.h
│ ├── ksort.h
│ ├── kstring.c
│ ├── kstring.h
│ ├── kstring.o
│ ├── libbam.a
│ ├── Makefile
│ ├── Makefile.mingw
│ ├── misc
│ │ ├── ace2sam
│ │ ├── ace2sam.c
│ │ ├── ace2sam.o
│ │ ├── bamcheck
│ │ ├── bamcheck.c
│ │ ├── bamcheck.o
│ │ ├── blast2sam.pl
│ │ ├── bowtie2sam.pl
│ │ ├── export2sam.pl
│ │ ├── HmmGlocal.java
│ │ ├── interpolate_sam.pl
│ │ ├── Makefile
│ │ ├── maq2sam.c
│ │ ├── maq2sam-long
│ │ ├── maq2sam-short
│ │ ├── md5.c
│ │ ├── md5fa
│ │ ├── md5fa.c
│ │ ├── md5fa.o
│ │ ├── md5.h
│ │ ├── md5.o
│ │ ├── md5sum-lite
│ │ ├── md5sum-lite.o
│ │ ├── novo2sam.pl
│ │ ├── plot-bamcheck
│ │ ├── psl2sam.pl
│ │ ├── r2plot.lua
│ │ ├── sam2vcf.pl
│ │ ├── samtools.pl
│ │ ├── soap2sam.pl
│ │ ├── varfilter.py
│ │ ├── vcfutils.lua
│ │ ├── wgsim
│ │ ├── wgsim.c
│ │ ├── wgsim_eval.pl
│ │ ├── wgsim.o
│ │ └── zoom2sam.pl
│ ├── NEWS
│ ├── padding.c
│ ├── padding.o
│ ├── phase.c
│ ├── phase.o
│ ├── razf.c
│ ├── razf.h
│ ├── razf.o
│ ├── razip.c
│ ├── sam.c
│ ├── sam.h
│ ├── sam_header.c
│ ├── sam_header.h
│ ├── sam_header.o
│ ├── sam.o
│ ├── sample.c
│ ├── sample.h
│ ├── sample.o
│ ├── samtools
│ ├── samtools.1
│ ├── sam_view.c
│ ├── sam_view.o
│ └── win32
│ ├── xcurses.h
│ ├── zconf.h
│ └── zlib.h
├── scripts
│ ├── AddSequenceToCDR3File.pl
│ ├── barcoderep-filter.py
│ ├── GetFullLengthAssembly.pl
│ ├── trust-barcoderep-to-10X.pl
│ ├── trust-cluster.py
│ └── trust-stats.py
├── SeqSet.hpp
├── SimpleVector.hpp
├── trust4
├── trust-airr.pl
├── trust-barcoderep.pl
├── trust-simplerep.pl
└── trust-smartseq.pl
8 directories, 251 files
I think you can just replace the trust-airr.pl, the other code should not affect the running results.
I'll need to do a deep dive into this. I had trouble with some of the backend libraries after I ran make
:
(trust4_env) -bash-4.2$ ./annotator
./annotator: /lib64/libstdc++.so.6: version `GLIBCXX_3.4.21' not found (required by ./annotator)
./annotator: /lib64/libstdc++.so.6: version `CXXABI_1.3.8' not found (required by ./annotator)
./annotator: /lib64/libstdc++.so.6: version `CXXABI_1.3.9' not found (required by ./annotator)
(trust4_env) -bash-4.2$ conda list
# packages in environment at /usr/local/devel/ANNOTATION/jespinoz/anaconda3/envs/trust4_env:
#
# Name Version Build Channel
_libgcc_mutex 0.1 conda_forge conda-forge
_openmp_mutex 4.5 1_gnu conda-forge
bzip2 1.0.8 h7f98852_4 conda-forge
c-ares 1.18.1 h7f98852_0 conda-forge
ca-certificates 2021.10.8 ha878542_0 conda-forge
htslib 1.14 h9093b5e_0 bioconda
krb5 1.19.2 hcc1bbae_3 conda-forge
libcurl 7.80.0 h2574ce0_0 conda-forge
libdeflate 1.7 h7f98852_5 conda-forge
libedit 3.1.20191231 he28a2e2_2 conda-forge
libev 4.33 h516909a_1 conda-forge
libgcc 7.2.0 h69d50b8_2 conda-forge
libgcc-ng 11.2.0 h1d223b6_11 conda-forge
libgomp 11.2.0 h1d223b6_11 conda-forge
libnghttp2 1.43.0 h812cca2_1 conda-forge
libssh2 1.10.0 ha56f1ee_2 conda-forge
libstdcxx-ng 11.2.0 he4da1e4_11 conda-forge
libzlib 1.2.11 h36c2ea0_1013 conda-forge
ncurses 6.2 h58526e2_4 conda-forge
openssl 1.1.1l h7f98852_0 conda-forge
perl 5.32.1 1_h7f98852_perl5 conda-forge
samtools 1.14 hb421002_0 bioconda
tk 8.6.11 h27826a3_1 conda-forge
xz 5.2.5 h516909a_1 conda-forge
zlib 1.2.11 h36c2ea0_1013 conda-forge
Got it to work.
What I did was:
conda create -n trust4_env -c bioconda trust4 -y
to create a new conda environment with all the dependencies$ ls trust4-1.0.6-h2e03b76_0/bin
BuildDatabaseFa.pl fastq-extractor trust-smartseq.pl
BuildImgtAnnot.pl run-trust4 trust4
annotator trust-barcoderep.pl
bam-extractor trust-simplerep.pl
So I symlinked the missing files from the proper release:
# Download v1.0.6
wget https://github.com/liulab-dfci/TRUST4/archive/refs/tags/v1.0.6.zip
# Unzip it
unzip v1.0.6.zip
# Symlink files
ln -sf /usr/local/devel/ANNOTATION/jespinoz/Packages/TRUST4/TRUST4-1.0.6/*.pl /usr/local/devel/ANNOTATION/jespinoz/anaconda3/envs/trust4_env/bin/
ln -sf /usr/local/devel/ANNOTATION/jespinoz/Packages/TRUST4/TRUST4-1.0.6/scripts/* /usr/local/devel/ANNOTATION/jespinoz/anaconda3/envs/trust4_env/bin/
Thank you! I just realized the bioconda receipt might miss some scripts introduced in the later versions. I will update it later.
Thanks for working through that with me! I hope it was mutually helpful.
Do you recommend any post processing pipelines or interpreting the output? Apologies in advance if this out of scope.
In the scripts folder, we have trust-stats.py to compute some basic diversity statistics for each sample. You can also use tools like VDJTools, Immunarch for downstream analysis.
Maybe a good idea to re-open this?
I got the same error ./annotator: /lib64/libstdc++.so.6: version 'GLIBCXX_3.4.21' not found (required by ./annotator)
and got it to work after following the instructions from here: https://github.com/liulab-dfci/TRUST4/issues/108#issuecomment-1000614163.
I don't know if it is a good idea to add the instructions above to the "how to install" or to automate it (not sure if possible).
PS: thanks for making and publishing this tool!
@PedroMilanezAlmeida This seems more of the conda environment issue. Let me see whether I can update bioconda's receipt to resolve this.
I'm taking my first dive into single-celled human data with BCR/TCR sequence construction. Is there a database that you would recommend to start out with? You mentioned IMGT but there are several versions. Can you link which one you would recommend to start?