xunchen85 / ERVcaller

ERVcaller is a tool designed to accurately detect and genotype non-reference unfixed endogenous retroviruses (ERVs) and other transposable elements (TEs) in the human genome using next-generation sequencing (NGS) data. We evaluated the tool using both simulated and real benchmark whole-genome sequencing (WGS) datasets. ERVcaller can accurately detect TE insertions of any length, particularly ERVs. It accepts a TE reference library of any sequence complexity, such as the entire RepBase database. It is easy to install and run from the command line.
http://www.uvm.edu/genomics/software/ERVcaller.html

Illegal division by zero #5

zhutao1009 closed this issue 4 years ago

zhutao1009 commented 5 years ago

When @insert_size = 0, the script reports an error: Illegal division by zero at /vol6/home/quluj/zt/software/ERVcaller_v1.4/ERVcaller_v1.4.pl line 1623

xunchen85 commented 5 years ago

An insert size of 0 is not allowed. I will state this more clearly in the updated manual.

Also note that "insert size" as we use it here includes the read length, so it cannot be 0 even when the two paired-end reads overlap. For example, if 100 bp paired-end reads overlap by 30 bp, the insert size is 170 bp (100 + 100 - 30).
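A minimal Python sketch of the arithmetic above, assuming the insert size is the span from the first base of one mate to the last base of the other, so it always includes both read lengths. The function names are hypothetical and are not part of ERVcaller; the point is that both ways of computing the span give the same 170 bp for the example, and neither can reach 0.

```python
def insert_size_from_overlap(read1_len: int, read2_len: int, overlap: int) -> int:
    """Insert size including both reads: total read bases minus the overlapping bases."""
    if overlap < 0 or overlap > min(read1_len, read2_len):
        raise ValueError("overlap must be between 0 and the shorter read length")
    return read1_len + read2_len - overlap


def insert_size_from_coords(start1: int, end1: int, start2: int, end2: int) -> int:
    """Span from the leftmost to the rightmost aligned base of the two mates
    (1-based, inclusive coordinates)."""
    return max(end1, end2) - min(start1, start2) + 1


# 100 bp mates overlapping by 30 bp: read 1 covers 1-100, read 2 covers 71-170.
print(insert_size_from_overlap(100, 100, 30))       # 170
print(insert_size_from_coords(1, 100, 71, 170))     # 170
```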

zhutao1009 commented 5 years ago

Thanks! Which option should I use: -l (length_insertsize) or -L (L_std_insertsize)?

xunchen85 commented 5 years ago

Ideally, you should provide both: -l is the mean insert size and -L is its standard deviation. You can calculate them with many other tools, such as GATK.

If you don't provide them, ERVcaller can estimate them with its built-in tool. Let me know if the built-in tool cannot calculate them accurately.
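To illustrate what the -l and -L values represent, the hypothetical Python function below estimates the mean and sample standard deviation of insert size from SAM records. It assumes the SAM TLEN field (column 9) is an adequate insert-size measure, keeps only properly paired records (flag 0x2) with a positive TLEN so each pair is counted once, and drops values above an arbitrary cutoff as outliers. This is not ERVcaller's built-in tool; for real data a dedicated tool such as GATK's CollectInsertSizeMetrics is the safer choice.

```python
import statistics


def insert_size_stats(sam_lines, max_insert=10_000):
    """Return (mean, sample standard deviation) of insert sizes from SAM text lines.

    Illustrative assumptions:
    - TLEN (column 9) spans leftmost to rightmost mapped base, so it includes read length.
    - Only properly paired records (flag 0x2) with positive TLEN are used (one per pair).
    - TLEN values above max_insert are treated as outliers and ignored.
    """
    sizes = []
    for line in sam_lines:
        if not line.strip() or line.startswith("@"):
            continue  # skip blank lines and header lines
        fields = line.rstrip("\n").split("\t")
        flag = int(fields[1])
        tlen = int(fields[8])
        if flag & 0x2 and 0 < tlen <= max_insert:
            sizes.append(tlen)
    if len(sizes) < 2:
        raise ValueError("need at least two read pairs to estimate a standard deviation")
    return statistics.mean(sizes), statistics.stdev(sizes)


sam = [
    "@HD\tVN:1.6",
    "r1\t99\tchr1\t1\t60\t100M\t=\t71\t170\t*\t*",
    "r1\t147\tchr1\t71\t60\t100M\t=\t1\t-170\t*\t*",
    "r2\t99\tchr1\t500\t60\t100M\t=\t600\t200\t*\t*",
    "r2\t147\tchr1\t600\t60\t100M\t=\t500\t-200\t*\t*",
]
mean, sd = insert_size_stats(sam)
print(mean, round(sd, 2))  # 185 21.21
```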

zhutao1009 commented 5 years ago

I have a FASTA file that contains 90 TE sequences. Do you mean that I should calculate the average and standard deviation of the TE lengths before running ERVcaller?

xunchen85 commented 5 years ago

The insert size refers to the NGS read data, not to the 90 TE reference sequences in FASTA format.

zhutao1009 commented 5 years ago

Sorry, I don't understand. : )

xunchen85 commented 5 years ago

You can read this page about insert size: http://software.broadinstitute.org/software/igv/interpreting_insert_size. You can find more by searching for keywords such as "NGS insert size".

zhutao1009 commented 5 years ago

Thanks!