Closed jvivian-atreca closed 4 years ago
Thanks for sharing the results.
Does the file output/PC-1-6/PC-1-6_annot.fa contain any assemblies annotations?
Hi @mourisl, thank you for your reply. No, for all samples where it fails at the annotation step, the _annot.fa file is empty.
-rw-rw-r-- 1 ubuntu ubuntu 647831 Jul 13 21:37 DOR-0-3_annot.fa
-rw-rw-r-- 1 ubuntu ubuntu 447623 Jul 13 21:45 DOR-0-4_annot.fa
-rw-rw-r-- 1 ubuntu ubuntu 369792 Jul 13 21:52 DOR-0-7_annot.fa
-rw-rw-r-- 1 ubuntu ubuntu 0 Jul 13 21:59 NC-1-2_annot.fa
-rw-rw-r-- 1 ubuntu ubuntu 319042 Jul 13 22:05 NC-1-4_annot.fa
-rw-rw-r-- 1 ubuntu ubuntu 295966 Jul 13 22:13 NC-1-5_annot.fa
-rw-rw-r-- 1 ubuntu ubuntu 573115 Jul 13 22:20 NC-1-6_annot.fa
-rw-rw-r-- 1 ubuntu ubuntu 0 Jul 13 22:29 NC-2-1_annot.fa
-rw-rw-r-- 1 ubuntu ubuntu 537733 Jul 13 22:37 NC-2-7_annot.fa
-rw-rw-r-- 1 ubuntu ubuntu 819282 Jul 13 22:44 PC-1-1_annot.fa
-rw-rw-r-- 1 ubuntu ubuntu 0 Jul 13 22:53 PC-1-6_annot.fa
-rw-rw-r-- 1 ubuntu ubuntu 0 Jul 13 23:01 PC-2-3_annot.fa
-rw-rw-r-- 1 ubuntu ubuntu 887110 Jul 13 23:09 PC-2-5_annot.fa
-rw-rw-r-- 1 ubuntu ubuntu 0 Jul 13 23:16 XP-1-4_annot.fa
-rw-rw-r-- 1 ubuntu ubuntu 762502 Jul 13 23:23 XP-1-5_annot.fa
-rw-rw-r-- 1 ubuntu ubuntu 0 Jul 13 23:31 XP-1-7_annot.fa
-rw-rw-r-- 1 ubuntu ubuntu 842778 Jul 13 23:39 XP-1-8_annot.fa
-rw-rw-r-- 1 ubuntu ubuntu 0 Jul 13 23:48 XP-2-1_annot.fa
-rw-rw-r-- 1 ubuntu ubuntu 0 Jul 13 23:58 XP-2-3_annot.fa
-rw-rw-r-- 1 ubuntu ubuntu 0 Jul 14 01:43 XP-2-5_annot.fa
Can you share one of the _final.out files that failed to generate the _annot.fa file? Thank you.
Hi @mourisl ,
Thank you — I'm checking with my supervisor about sharing the sequences, so will let you know shortly.
I noticed while looking at the files that the _final.out file of every failed sample is >= 4.4MB:
-rw-rw-r-- 1 ubuntu ubuntu 2.9M Jul 13 21:37 DOR-0-3_final.out
-rw-rw-r-- 1 ubuntu ubuntu 2.0M Jul 13 21:45 DOR-0-4_final.out
-rw-rw-r-- 1 ubuntu ubuntu 1.7M Jul 13 21:52 DOR-0-7_final.out
-rw-rw-r-- 1 ubuntu ubuntu 4.8M Jul 13 21:59 NC-1-2_final.out <=======
-rw-rw-r-- 1 ubuntu ubuntu 1.4M Jul 13 22:05 NC-1-4_final.out
-rw-rw-r-- 1 ubuntu ubuntu 1.4M Jul 13 22:13 NC-1-5_final.out
-rw-rw-r-- 1 ubuntu ubuntu 2.6M Jul 13 22:20 NC-1-6_final.out
-rw-rw-r-- 1 ubuntu ubuntu 6.2M Jul 13 22:29 NC-2-1_final.out <=======
-rw-rw-r-- 1 ubuntu ubuntu 2.5M Jul 13 22:37 NC-2-7_final.out
-rw-rw-r-- 1 ubuntu ubuntu 3.7M Jul 13 22:44 PC-1-1_final.out
-rw-rw-r-- 1 ubuntu ubuntu 7.2M Jul 13 22:53 PC-1-6_final.out <=======
-rw-rw-r-- 1 ubuntu ubuntu 8.4M Jul 13 23:01 PC-2-3_final.out <=======
-rw-rw-r-- 1 ubuntu ubuntu 4.0M Jul 13 23:09 PC-2-5_final.out
-rw-rw-r-- 1 ubuntu ubuntu 4.4M Jul 13 23:16 XP-1-4_final.out <=======
-rw-rw-r-- 1 ubuntu ubuntu 3.4M Jul 13 23:23 XP-1-5_final.out
-rw-rw-r-- 1 ubuntu ubuntu 5.1M Jul 13 23:31 XP-1-7_final.out <=======
-rw-rw-r-- 1 ubuntu ubuntu 3.8M Jul 13 23:39 XP-1-8_final.out
-rw-rw-r-- 1 ubuntu ubuntu 7.4M Jul 13 23:48 XP-2-1_final.out <=======
-rw-rw-r-- 1 ubuntu ubuntu 8.5M Jul 13 23:58 XP-2-3_final.out <=======
-rw-rw-r-- 1 ubuntu ubuntu 6.6M Jul 14 01:43 XP-2-5_final.out <=======
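The size comparison above can be produced in one pass instead of comparing two listings by eye. The following is a minimal sketch, assuming the output/<sample>/<sample>_final.out and <sample>_annot.fa layout seen in this thread; the function name report_annotation_status is hypothetical and is not part of TRUST4.

```shell
# report_annotation_status DIR
# For every *_final.out under DIR, print three tab-separated fields:
#   sample name, size of the _final.out file in bytes, and whether the
#   matching _annot.fa is OK (non-empty), EMPTY, or MISSING.
# Assumption: annotation output sits next to the _final.out file and
# shares its prefix, as in the listings above.
report_annotation_status() {
    find "$1" -name '*_final.out' | sort | while IFS= read -r final; do
        prefix="${final%_final.out}"
        bytes=$(wc -c < "$final" | tr -d ' ')
        if [ ! -e "${prefix}_annot.fa" ]; then
            status="MISSING"
        elif [ -s "${prefix}_annot.fa" ]; then
            status="OK"
        else
            status="EMPTY"
        fi
        printf '%s\t%s\t%s\n' "$(basename "$prefix")" "$bytes" "$status"
    done
}
```

Calling `report_annotation_status output | sort -t "$(printf '\t')" -k2,2n` orders the samples by _final.out size, which makes a size threshold like the one noted above easy to spot.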
I then took one of the failed samples and cut it in half:
(base) ubuntu@ip-10-200-0-18:~$ wc -l output/NC-1-2/NC-1-2_final.out
12900 output/NC-1-2/NC-1-2_final.out
head -n 6450 output/NC-1-2/NC-1-2_final.out >output/NC-1-2/foo_final.out
(base) ubuntu@ip-10-200-0-18:~$ ./TRUST4/annotator -f TRUST4/mouse/mouse_IMGT+C.fa -a output/NC-1-2/foo_final.out -t 16 -o output/NC-1-2/foo -r output/NC-1-2/NC-1-2_assembled_reads.fa > output/NC-1-2/foo_annot.fa
[Tue Jul 14 20:33:18 2020] Start to annotate assemblies.
[Tue Jul 14 20:33:18 2020] Start to realign reads for CDR3 analysis.
[Tue Jul 14 20:33:19 2020] Compute CDR3 abundance.
[Tue Jul 14 20:33:19 2020] Finish annotation.
This also worked if I tailed the second half of the file. This machine has 32G of RAM, so it's not an issue with memory... Any thoughts?
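The halving experiment above can be repeated on any failing file with a small helper. This is only a sketch: the function name split_in_half and the .first/.second suffixes are hypothetical, and it cuts purely by line count, so if a record in _final.out spans several lines the cut may land in the middle of one.

```shell
# split_in_half FILE PREFIX
# Writes the first half of FILE's lines to PREFIX.first and the
# remaining lines to PREFIX.second, mirroring the head/tail experiment
# described in the thread.
split_in_half() {
    total=$(wc -l < "$1")
    half=$((total / 2))
    head -n "$half" "$1" > "$2.first"
    tail -n +"$((half + 1))" "$1" > "$2.second"
}
```

For the 12900-line NC-1-2_final.out this yields two files of 6450 lines each, and each half can then be passed to the annotator with -a exactly as in the command shown above.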
Interesting. The annotation shouldn't take much memory, so memory should not be the issue. Can you try running it with a single thread (-t 1)?
Hi @mourisl — I should have included that in the previous comment, but I tried that next and it still failed. I don't know Perl or I would try to take a look, but is it possible there's a fixed-size data structure that is being overflowed?
(base) ubuntu@ip-10-200-0-18:~$ ./TRUST4/annotator -f TRUST4/mouse/mouse_IMGT+C.fa -a output/NC-1-2/NC-1-2_final.out -t 1 -o output/NC-1-2/foo -r output/NC-1-2/NC-1-2_assembled_reads.fa > output/NC-1-2/foo_annot.fa
[Tue Jul 14 20:40:33 2020] Start to annotate assemblies.
Segmentation fault (core dumped)
Hi @mourisl — I got permission to share one of the files in case it is helpful for debugging: NC-1-2_final.out
Thank you! I'll check this right away!
It finished successfully on my computer. Can you try to run it without option "-r"? If this also fails, I would guess the executable I uploaded might not be fully compatible with your system. You can try the Singularity image (similar to docker but does not require root permission) in the release. There is a brief introduction to Singularity in the README. I created this image on Ubuntu, and it's fairly straightforward to use Singularity.
Hi @mourisl, thank you for taking the time to run that file. I'm on an Ubuntu EC2 machine, but I will try both removing the -r option and the Singularity container, and will report back.
Hi @mourisl ,
I'm running into the same issue using the singularity container:
(base) ubuntu@ip-10-200-0-18:~$ singularity exec trust4-singularity.sif /TRUST4/run-trust4 -1 samples/NC-1-2_1.fq -2 samples/NC-1-2_2.fq -f TRUST4/mouse/mouse_IMGT+C.fa --ref TRUST4/mouse/mouse_IMGT+C.fa -o output/NC-1-2/NC-1-2 -t 16
perl: warning: Setting locale failed.
perl: warning: Please check that your locale settings:
LANGUAGE = (unset),
LC_ALL = (unset),
LANG = "en_US.UTF-8"
are supported and installed on your system.
perl: warning: Falling back to the standard locale ("C").
[Tue Jul 14 21:41:14 2020] TRUST4 begins.
[Tue Jul 14 21:41:14 2020] SYSTEM CALL: /TRUST4/fastq-extractor -1 samples/NC-1-2_1.fq -2 samples/NC-1-2_2.fq -t 16 -f TRUST4/mouse/mouse_IMGT+C.fa -o output/NC-1-2/NC-1-2_toassemble
[Tue Jul 14 21:41:14 2020] Start to extract candidate reads from read files.
[Tue Jul 14 21:47:01 2020] Finish extracting reads.
[Tue Jul 14 21:47:01 2020] SYSTEM CALL: /TRUST4/trust4 -f TRUST4/mouse/mouse_IMGT+C.fa -o output/NC-1-2/NC-1-2 -t 16 -1 output/NC-1-2/NC-1-2_toassemble_1.fq -2 output/NC-1-2/NC-1-2_toassemble_2.fq
[Tue Jul 14 21:47:02 2020] Read in and count kmers for 100000 reads.
[Tue Jul 14 21:47:05 2020] Read in and count kmers for 200000 reads.
[Tue Jul 14 21:47:07 2020] Read in and count kmers for 300000 reads.
[Tue Jul 14 21:47:10 2020] Read in and count kmers for 400000 reads.
[Tue Jul 14 21:47:22 2020] Found 453867 reads.
[Tue Jul 14 21:47:25 2020] Finish sorting the reads.
[Tue Jul 14 21:47:30 2020] Finish rough annotations.
[Tue Jul 14 21:47:30 2020] Processed 100000 reads (25984 are used for assembly).
[Tue Jul 14 21:47:33 2020] Processed 200000 reads (47874 are used for assembly).
[Tue Jul 14 21:47:41 2020] Processed 300000 reads (59554 are used for assembly).
[Tue Jul 14 21:47:56 2020] Processed 400000 reads (69360 are used for assembly).
[Tue Jul 14 21:48:03 2020] Assembled 71191 reads.
[Tue Jul 14 21:48:03 2020] Try to rescue 9180 reads for assembly.
[Tue Jul 14 21:48:08 2020] Rescued 177 reads.
[Tue Jul 14 21:48:09 2020] Extend assemblies by mate pair information.
[Tue Jul 14 21:48:10 2020] Remove redundant assemblies.
[Tue Jul 14 21:48:11 2020] Finish assembly.
[Tue Jul 14 21:48:11 2020] SYSTEM CALL: /TRUST4/annotator -f TRUST4/mouse/mouse_IMGT+C.fa -a output/NC-1-2/NC-1-2_final.out -t 16 -o output/NC-1-2/NC-1-2 -r output/NC-1-2/NC-1-2_assembled_reads.fa > output/NC-1-2/NC-1-2_annot.fa
[Tue Jul 14 21:48:11 2020] Start to annotate assemblies.
Segmentation fault (core dumped)
system /TRUST4/annotator -f TRUST4/mouse/mouse_IMGT+C.fa -a output/NC-1-2/NC-1-2_final.out -t 16 -o output/NC-1-2/NC-1-2 -r output/NC-1-2/NC-1-2_assembled_reads.fa > output/NC-1-2/NC-1-2_annot.fa failed: 35584 at /TRUST4/run-trust4 line 37.
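The number 35584 in the last log line is consistent with the segmentation fault reported just above it. Assuming it is the raw wait status that Perl stores in $? after system(), the high byte is the exit code and the low 7 bits are the signal that killed the direct child. A minimal sketch of the decoding follows; the function name decode_wait_status is hypothetical.

```shell
# decode_wait_status STATUS
# STATUS is a raw 16-bit wait status, such as Perl's $? after system().
# High byte: exit code of the direct child.
# Low 7 bits: signal that killed the direct child, if any.
decode_wait_status() {
    status=$1
    exit_code=$((status >> 8))
    direct_signal=$((status & 127))
    if [ "$direct_signal" -ne 0 ]; then
        echo "killed by signal $direct_signal"
    elif [ "$exit_code" -gt 128 ]; then
        # Shell convention: exit code 128 + N means a command run by
        # the shell was terminated by signal N.
        echo "exit code $exit_code (128 + signal $((exit_code - 128)))"
    else
        echo "exit code $exit_code"
    fi
}

decode_wait_status 35584   # prints: exit code 139 (128 + signal 11)
```

Signal 11 is SIGSEGV on Linux. The exit code has the 128 + N form rather than a direct signal because the command contains a redirection (>), so it was presumably run through a shell, and the shell reported the annotator's crash as exit code 139.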
Trying the singularity container with minimum commands:
(base) ubuntu@ip-10-200-0-18:~$ singularity exec trust4-singularity.sif /TRUST4/annotator -f /TRUST4/mouse/mouse_IMGT+C.fa -a output/NC-1-2/NC-1-2_final.out
[Tue Jul 14 22:27:22 2020] Start to annotate assemblies.
Segmentation fault (core dumped)
Does the container version work for you?
I think I've figured out the bug. Can you pull the GitHub repo again and give it a try? Thanks.
Everything ran without issue — thank you for such a quick patch!
Thanks for sharing the file! Helped A LOT in the debugging.
First, thank you so much for TRUST4. It's a great tool.
I've experienced a similar issue. I'm trying to run TRUST4 on 124 RepSeq (RNA) samples, starting from paired-end fastq files. Most samples ran successfully, but others fail during the annotation step:
(python3.6) yoav@zelda:~$ /home/zel/yoav/TRUST4/annotator -f /home/zel/yoav/TRUST4/mouse/mouse_IMGT+C.fa -a /home/zel/yoav/GBM_data/TCR_seq/TRUST4_all/7_final.out -t 1 -o /home/zel/yoav/GBM_data/TCR_seq/TRUST4_all/7 -r /home/zel/yoav/GBM_data/TCR_seq/TRUST4_all/7_assembled_reads.fa > /home/zel/yoav/GBM_data/TCR_seq/TRUST4_all/7_annot.fa
[Sun Aug 23 12:59:25 2020] Start to annotate assemblies.
[Sun Aug 23 13:12:14 2020] Start to realign reads for CDR3 analysis.
[Sun Aug 23 13:17:00 2020] Realigned 100000 reads.
[Sun Aug 23 13:20:44 2020] Realigned 200000 reads.
[Sun Aug 23 13:24:16 2020] Realigned 300000 reads.
[Sun Aug 23 13:27:29 2020] Realigned 400000 reads.
[Sun Aug 23 13:34:11 2020] Realigned 500000 reads.
[Sun Aug 23 13:35:15 2020] Compute CDR3 abundance.
double free or corruption (!prev)
Aborted (core dumped)
Here's a link to the final file in case it helps: 7_final.out
Thanks in advance for your assistance,
Yoav
Hi, thank you for the work on this really interesting tool!
I have a set of RNA-seq samples that I'm iterating over to call TRUST4 on. What is odd is that while most samples run end-to-end without any issue, several fail during the annotation step with a segmentation fault. That is a rather opaque failure state, so I don't have much insight into why it is failing. Here is an example of the log output for two samples, where one succeeds followed by one that fails during the annotation step:
Succeeds
Fails
TRUST4 Script
These samples are all processed the exact same way, so the intermittent failure is somewhat puzzling to me. I will post an update if I figure out how to get the annotation step to run successfully.
Thank you for your time, John