xunchen85 / ERVcaller

ERVcaller is a tool designed to accurately detect and genotype non-reference unfixed endogenous retroviruses (ERVs) and other transposable elements (TEs) in the human genome using next-generation sequencing (NGS) data. We evaluated the tool using both simulated and real benchmark whole-genome sequencing (WGS) datasets. ERVcaller can accurately detect TE insertions of any length, particularly ERVs. It allows the use of a TE reference library regardless of sequence complexity, such as the entire RepBase database. It is easy to install and to use from the command line.
http://www.uvm.edu/genomics/software/ERVcaller.html

Error of Step 3 #7

Closed zhutao1009 closed 4 years ago

zhutao1009 commented 5 years ago

This is the command I used:

perl /vol6/home/quluj/zt/software/ERVcaller_v1.4/ERVcaller_v1.4.pl \
  -i MD4 \
  -f .fastq.gz \
  -H /vol6/home/quluj/zt/pkduck_ref/PK_ref.fa \
  -T /vol6/home/quluj/zt/DUCK/teseq/LTR.fasta \
  -t 12 -S 20 -G

But it fails at Step 3 every time, and all I get is an empty VCF file.

Step 3: Validation...

Converting SAM to BAM file, and then Sort and index the BAM file......

[bam_sort_core] merging from 11 files and 11 in-memory blocks...
[bwa_index] Pack FASTA...
[bns_fasta2bntseq] Failed to allocate 0 bytes at bntseq.c line 303: Success
[E::bwa_idx_load_from_disk] fail to locate the index files
[E::bwa_idx_load_from_disk] fail to locate the index files

xunchen85 commented 5 years ago

Please check your indexed human and TE reference files. Also check whether the aligned BAM file is correctly sorted and indexed; if it is not, check your installed SAMtools version, which should be higher than v1.5.
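The two BAM-related checks suggested above can be sketched in shell. This is a minimal sketch, not part of ERVcaller: `version_gt` and `header_is_coordinate_sorted` are hypothetical helper names, and `sort -V` is assumed to be available (GNU coreutils).

```shell
#!/usr/bin/env bash
# Hypothetical helpers for illustration; they are not part of ERVcaller.

# Succeeds when version $1 is strictly greater than version $2.
# Assumes a `sort` that supports -V (version sort), as in GNU coreutils.
version_gt() {
    [ "$1" != "$2" ] &&
        [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | tail -n 1)" = "$1" ]
}

# Reads SAM header text on stdin and succeeds when the @HD line
# declares coordinate sorting (SO:coordinate).
header_is_coordinate_sorted() {
    grep -q '^@HD.*SO:coordinate'
}
```

With SAMtools on the PATH, the helpers could be used as `version_gt "$(samtools --version | head -n 1 | awk '{print $2}')" 1.5` and `samtools view -H TE_seq.bam | header_is_coordinate_sorted`; a BAM index would normally sit next to the BAM file as `TE_seq.bam.bai`.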

zhutao1009 commented 5 years ago

I ran ERVcaller with your test data on different servers; it reported the same error and generated an empty VCF file. Maybe you should check the file '${input_sampleID}_ERV.output', which was not generated as expected. This is my script:

#!/bin/bash

conda activate R3  # activate the R3.3.2 environment
perl /home/software/ERVcaller/ERVcaller_v1.4.pl \
  -i TE_seq \
  -I /home/software/ERVcaller/test/BWA/ \
  -f .bam \
  -H human.fa \
  -T /home/software/ERVcaller/Database/HERVK.fa \
  -t 2 -S 20 -G -BWA_MEM \
  -l 500 \
  -L 100

xunchen85 commented 5 years ago

It works correctly on my servers and on those of many other users. In my experience, this error is usually caused by incorrect inputs. If it is easier for you, could you share screenshots of: 1) the list of intermediate files produced (using the command ls -lh); 2) your input paired-end reads (using the command cat); and 3) the list of index files generated for both human.fa and HERVK.fa (using the command ls -lh)?

I would suggest using full paths and as few parameters as you can for now, following the ERVcaller manual. For example:

perl /home/software/ERVcaller/ERVcaller_v1.4.pl \
  -i TE_seq \
  -I /home/software/ERVcaller/test/BWA/ \
  -f .bam \
  -H full paths/human.fa \
  -T /home/software/ERVcaller/Database/HERVK.fa \
  -BWA_MEM

zhutao1009 commented 5 years ago

(base) root@zhu-PC:/media/zhu/A64E22B94E228263/clean/humman# ll
total 6.9G
-rwxrwxrwx 1 zhu zhu 445 Sep 4 10:52 ervcaller.sh
-rwxrwxrwx 1 zhu zhu 3.1G Sep 2 17:10 human.fa
-rwxrwxrwx 1 zhu zhu 22K Sep 4 00:20 human.fa.amb
-rwxrwxrwx 1 zhu zhu 83K Sep 4 00:20 human.fa.ann
-rwxrwxrwx 1 zhu zhu 3.1G Sep 4 00:19 human.fa.bwt
-rwxrwxrwx 1 zhu zhu 781M Sep 4 00:20 human.fa.pac
-rwxrwxrwx 1 zhu zhu 11K Sep 4 00:20 nohup.out
drwxrwxrwx 1 zhu zhu 48 Sep 4 12:29 TE_seq_subgenome
drwxrwxrwx 1 zhu zhu 408 Sep 4 12:29 TE_seq_temp
-rwxrwxrwx 1 zhu zhu 0 Sep 4 12:29 TE_seq.vcf
###############################################
(base) root@zhu-PC:/media/zhu/A64E22B94E228263/clean/humman/TE_seq_subgenome# ll
total 0
############################################################
(base) root@zhu-PC:/media/zhu/A64E22B94E228263/clean/humman/TE_seq_temp# ll
total 4.5K
-rwxrwxrwx 1 zhu zhu 0 Sep 4 10:54 TE_seq_ERV_1.1fuq
-rwxrwxrwx 1 zhu zhu 0 Sep 4 10:54 TE_seq_ERV1.bian
-rwxrwxrwx 1 zhu zhu 0 Sep 4 10:54 TE_seq_ERV_1sf.fuq
-rwxrwxrwx 1 zhu zhu 0 Sep 4 10:54 TE_seq_ERV_2.1fuq
-rwxrwxrwx 1 zhu zhu 0 Sep 4 10:54 TE_seq_ERV.fine_mapped
-rwxrwxrwx 1 zhu zhu 0 Sep 4 10:54 TE_seq_ERV.hf
-rwxrwxrwx 1 zhu zhu 0 Sep 4 10:54 TE_seq_ERV.output
-rwxrwxrwx 1 zhu zhu 925 Sep 4 12:28 TE_seq_ERV.output2
-rwxrwxrwx 1 zhu zhu 0 Sep 4 12:29 TE_seq_ERV.output2.1
-rwxrwxrwx 1 zhu zhu 0 Sep 4 10:54 TE_seq_ERV.TE_f
-rwxrwxrwx 1 zhu zhu 0 Sep 4 10:54 TE_seq_ERV.TE_f2
-rwxrwxrwx 1 zhu zhu 0 Sep 4 10:54 TE_seq_ERV.visualization
-rwxrwxrwx 1 zhu zhu 32 Sep 4 10:54 TE_seq.type.gz
############################################
(base) root@zhu-PC:/home/software/ERVcaller/Database# ll
total 976K
-rw-r--r-- 1 root root 814K Sep 1 16:37 ERV_library.fa
-rw-r--r-- 1 root root 8.2K Sep 1 16:37 HERVK.fa
-rw-r--r-- 1 root root 9 Sep 2 17:20 HERVK.fa.amb
-rw-r--r-- 1 root root 42 Sep 2 17:20 HERVK.fa.ann
-rw-r--r-- 1 root root 8.1K Sep 2 17:20 HERVK.fa.bwt
-rw-r--r-- 1 root root 2.0K Sep 2 17:20 HERVK.fa.pac
-rw-r--r-- 1 root root 4.1K Sep 2 17:20 HERVK.fa.sa
-rw-r--r-- 1 root root 115K Sep 1 16:37 Human_TE_library.fa
#####################################################
(base) root@zhu-PC:/home/software/ERVcaller/test/BWA# ll
total 15M
-rw-r--r-- 1 root root 13M Sep 1 16:37 TE_seq.bam
-rw-r--r-- 1 root root 1.2M Sep 1 16:37 TE_seq.bam.bai

xunchen85 commented 5 years ago

It looks like it stopped very early due to the alignment or format conversion steps, which mainly use BWA and SAMtools.
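In the listing above, most files under TE_seq_temp are 0 bytes, which fits an early stop. One way to spot such files quickly is sketched below; `list_empty_files` is a hypothetical helper name, not an ERVcaller command.

```shell
#!/usr/bin/env bash
# Hypothetical helper for illustration; it is not part of ERVcaller.

# Prints every zero-byte regular file under the directory given as $1,
# one path per line, in sorted order.
list_empty_files() {
    find "$1" -type f -size 0 | sort
}
```

For the run in this thread it could be called as `list_empty_files TE_seq_temp`; the earliest empty file in the pipeline order is usually the most informative one to look at.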

Can you specify the full path for your indexed human reference genome and try again? I guess it should be under /media/zhu/A64E22B94E228263/clean/humman/ on your PC.

Let me know if it still does not work, and attach the log file and similar screenshots.
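A quick way to confirm that a reference is fully indexed is to look for the five files that `bwa index` writes next to the FASTA (.amb, .ann, .bwt, .pac, .sa). In the directory listing posted above, human.fa has .amb, .ann, .bwt and .pac but no human.fa.sa, while HERVK.fa has all five; the thread does not confirm whether this caused the error. The sketch below uses a hypothetical helper name, `check_bwa_index`.

```shell
#!/usr/bin/env bash
# Hypothetical helper for illustration; it is not part of ERVcaller.

# Succeeds only when all five BWA index files exist and are non-empty
# next to the FASTA file given as $1. Missing files are reported on stderr.
check_bwa_index() {
    local fasta="$1" ext missing=0
    for ext in amb ann bwt pac sa; do
        if [ ! -s "${fasta}.${ext}" ]; then
            echo "missing or empty: ${fasta}.${ext}" >&2
            missing=1
        fi
    done
    return "$missing"
}
```

It could be run on both references with full paths, for example `check_bwa_index /media/zhu/A64E22B94E228263/clean/humman/human.fa` and `check_bwa_index /home/software/ERVcaller/Database/HERVK.fa`.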

zhutao1009 commented 5 years ago

It worked with your test data and with part of mine. Here is the output; the TSD sequence is too long: MDM1.txt

xunchen85 commented 5 years ago

It is good to hear that ERVcaller works. ERVcaller outputs all results, and users can filter them based on the reported genotype quality and likelihood. This is very useful when working on a population. As for the TSD sequence, long TSDs have been reported in the literature. We currently use 500 bp to keep sensitivity and accuracy high, and we may improve this in the next version.
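Filtering by genotype quality can be sketched with awk. This is only a sketch under stated assumptions: it assumes a single-sample VCF whose FORMAT column (column 9) contains a `GQ` key, which should be checked against the actual ERVcaller output; `filter_vcf_by_gq` is a hypothetical helper name and the threshold is illustrative.

```shell
#!/usr/bin/env bash
# Hypothetical helper for illustration; it is not part of ERVcaller.
# Assumption: single-sample VCF with a GQ key in the FORMAT column.

# Usage: filter_vcf_by_gq MIN_GQ < in.vcf > out.vcf
# Keeps all header lines and only those records whose GQ is >= MIN_GQ.
filter_vcf_by_gq() {
    awk -F '\t' -v min="$1" '
        /^#/ { print; next }                 # header lines pass through
        {
            n = split($9, keys, ":")         # FORMAT keys, e.g. GT:GQ
            split($10, vals, ":")            # values for the first sample
            for (i = 1; i <= n; i++) {
                if (keys[i] == "GQ" && vals[i] != "." && vals[i] + 0 >= min) {
                    print
                    break
                }
            }
        }
    '
}
```

For example, `filter_vcf_by_gq 20 < TE_seq.vcf > TE_seq.gq20.vcf` would keep records with GQ of at least 20; records without a GQ value are dropped by this sketch.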

zhutao1009 commented 5 years ago

Thank you for your work, you helped me a lot.