xunchen85 / ERVcaller

ERVcaller is a tool designed to accurately detect and genotype non-reference unfixed endogenous retroviruses (ERVs) and other transposable elements (TEs) in the human genome using next-generation sequencing (NGS) data. We evaluated the tool using both simulated and real benchmark whole-genome sequencing (WGS) datasets. ERVcaller can accurately detect TE insertions of any length, particularly ERVs. It allows the use of a TE reference library regardless of sequence complexity, such as the entire RepBase database. It is easy to install and run from the command line.
http://www.uvm.edu/genomics/software/ERVcaller.html

About required programs #11

Closed oghzzang closed 4 years ago

oghzzang commented 4 years ago

Dear Xun Chen,

Hi. I'm Oh.

I have three questions.

1) In your programs, there is no step that uses "Hydra". Is that right?

2) Where can I find the "TE_consensus.fa" file mentioned in your user manual?

3) In my ERVcaller-1.4/Database folder, there are three fasta files (ERV_library.fa, HERVK.fa, Human_TE_library.fa).

I want to use all of the FASTA files in the Database folder (ERV_library.fa, HERVK.fa, Human_TE_library.fa).

Can I use a single merged and indexed FASTA file built from these three FASTA files?

Like this,

$ cat ERV_library.fa HERVK.fa Human_TE_library.fa > All.fa
$ bwa index All.fa
$ perl ERVcaller_v1.4.pl -i TE_seq -f .bam -H hg38.fa -T All.fa

Thanks.

xunchen85 commented 4 years ago

Hi Oh,

  1. Yes. In the latest version, I have replaced it with Samtools. I have removed the corresponding comments from the README file.

  2. I have uploaded it to the Database account.

  3. It depends on your research question. If you merge all of them, there will be some redundancy in your database, especially for the HERVK sequences.
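
One way to reduce the redundancy mentioned above before indexing is to drop repeated records while merging. The following is a minimal Python sketch, not part of ERVcaller: the function names `read_fasta` and `merge_fasta` are hypothetical, and it only removes records whose name or exact sequence has already been seen. Near-identical sequences, which is likely the kind of redundancy among HERVK entries, would need a clustering tool instead.

```python
from pathlib import Path
from typing import Iterable, Iterator, Tuple


def read_fasta(path: Path) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs from a FASTA file."""
    header, chunks = None, []
    with open(path) as handle:
        for line in handle:
            line = line.strip()
            if not line:
                continue
            if line.startswith(">"):
                # A new record starts; emit the previous one first.
                if header is not None:
                    yield header, "".join(chunks)
                header, chunks = line[1:], []
            elif header is not None:
                chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def merge_fasta(inputs: Iterable[str], output: str, width: int = 60) -> Tuple[int, int]:
    """Merge FASTA files into `output`, skipping repeated records.

    A record is skipped when its name (first word of the header) or its
    exact sequence (case-insensitive) was already written.
    Returns (kept, skipped).
    """
    seen_names, seen_seqs = set(), set()
    kept = skipped = 0
    with open(output, "w") as out:
        for path in inputs:
            for header, seq in read_fasta(Path(path)):
                parts = header.split()
                name = parts[0] if parts else ""
                key = seq.upper()
                if name in seen_names or key in seen_seqs:
                    skipped += 1
                    continue
                seen_names.add(name)
                seen_seqs.add(key)
                out.write(f">{header}\n")
                # Wrap the sequence at a fixed line width.
                for i in range(0, len(seq), width):
                    out.write(seq[i:i + width] + "\n")
                kept += 1
    return kept, skipped
```

For example, `merge_fasta(["ERV_library.fa", "HERVK.fa", "Human_TE_library.fa"], "All.fa")` would write the merged library, which can then be indexed with `bwa index All.fa` as in the question. Input files are processed in order, so the first occurrence of a name or sequence is the one that is kept.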

oghzzang commented 4 years ago

Hi Xun.

Thanks for your reply.

Your reply answered Q1 and Q2.

I have four remaining questions about Q3.

  1. I have four reference files from your program: ERV_library.fa, HERVK.fa, Human_TE_library.fa, and TE_consensus.fa.

I want to detect and genotype "polymorphic endogenous retrovirus and other transposable element insertions", as in your paper.

Specifically, I want to find four types of polymorphic mobile elements in the human genome (Alu, LINE, SVA, ERV).

In this case, which reference is best for me?

(Perhaps TE_consensus.fa, Human_TE_library.fa, or all of them?)

  2. If you recommend using two reference files (TE_consensus.fa, Human_TE_library.fa), can I merge and use them with the following command lines?

$ cat TE_consensus.fa Human_TE_library.fa > All.fa
$ bwa index All.fa
$ perl ERVcaller_v1.4.pl -i TE_seq -f .bam -H hg38.fa -T All.fa

  3. Is TE_consensus.fa derived from human sequences only, or is it a consensus across all species, including human?

  4. In your ERVcaller paper (10.1093/bioinformatics/btz205), did you use only TE_consensus.fa as the TE reference?

Many thanks

Best,

Oh.

xunchen85 commented 4 years ago
  1. If you want to detect polymorphic Alu, LINE1, SVA, and ERV insertions, I suggest using the consensus FASTA sequences, as most tools do.
  2. However, if you want, you can always merge references like what you listed.
  3. Yes, it is derived from human beings.
  4. I ran ERVcaller with various TE references, not only TE_consensus.fa.

Best, Xun
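
The merge, index, and run steps quoted in this thread can be generated for any set of TE libraries. The following is a minimal Python sketch under stated assumptions: the helper name `format_steps` is hypothetical, the default file names and the ERVcaller options (`-i`, `-f`, `-H`, `-T`) are copied from the commands in the question rather than from the manual, and nothing is executed.

```python
import shlex
from typing import List, Sequence


def format_steps(
    te_fastas: Sequence[str],
    merged: str = "All.fa",             # merged library name used in the thread
    sample: str = "TE_seq",             # value passed to -i in the thread
    genome: str = "hg38.fa",            # value passed to -H in the thread
    script: str = "ERVcaller_v1.4.pl",  # script name used in the thread
) -> List[str]:
    """Return the three shell commands as strings, with file names quoted."""
    if not te_fastas:
        raise ValueError("at least one TE FASTA file is required")
    return [
        # 1. Concatenate the chosen TE libraries into one FASTA file.
        shlex.join(["cat", *te_fastas]) + " > " + shlex.quote(merged),
        # 2. Build the BWA index for the merged library.
        shlex.join(["bwa", "index", merged]),
        # 3. Run ERVcaller with the merged library as the TE reference.
        shlex.join(["perl", script, "-i", sample, "-f", ".bam",
                    "-H", genome, "-T", merged]),
    ]


if __name__ == "__main__":
    for command in format_steps(["TE_consensus.fa", "Human_TE_library.fa"]):
        print(command)
```

Printing the commands instead of running them makes it easy to review them first, and `shlex.quote` keeps file names containing spaces intact.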