xunchen85 / ERVcaller

ERVcaller is a tool designed to accurately detect and genotype non-reference unfixed endogenous retroviruses (ERVs) and other transposable elements (TEs) in the human genome using next-generation sequencing (NGS) data. We evaluated the tool using both simulated and real benchmark whole-genome sequencing (WGS) datasets. ERVcaller can accurately detect TE insertions of any length, particularly ERVs. It allows the use of a TE reference library regardless of sequence complexity, such as the entire RepBase database. It is easy to install and use from the command line.
http://www.uvm.edu/genomics/software/ERVcaller.html

[E::bwa_idx_load_from_disk] fail to locate the index files #21

Open xxYaaoo opened 1 year ago

xxYaaoo commented 1 year ago

Hello, Dr. Chen! I ran into a problem when I tried to run the test data. The .err log showed the details below: [image] And here was my command line: [image] How can I solve these problems? Thank you for your help!

xunchen85 commented 1 year ago

Hi,

If you correctly indexed the human genome and TE consensus sequences, the error from the very early bwa alignment step may be because no chimeric/split reads were identified.

May I get a list of the intermediate output files, from running "ls -lh" in your output folder?
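A minimal sketch of how that listing could be gathered, with zero-byte files called out separately since an empty intermediate file is often the first visible symptom. The `OUTDIR` variable and the helper name `list_empty_files` are hypothetical, not part of ERVcaller.

```shell
# List every regular file under a directory that has size zero.
list_empty_files() {
    find "$1" -type f -size 0 | sort
}

# OUTDIR is a hypothetical variable; point it at the ERVcaller output folder.
outdir="${OUTDIR:-.}"

# Human-readable sizes, as requested in the thread.
ls -lh "$outdir"

# Files that exist but contain nothing.
list_empty_files "$outdir"
```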

Best, Xun

xxYaaoo commented 1 year ago

Thank you for your reply! My output folder: [image] Also, I am wondering what you mean by "correctly indexed the human genome and TE consensus sequences". Is there any step that needs to be done on the human genome and TE consensus sequences before running ERVcaller_v1.4.pl?

xunchen85 commented 1 year ago

Yes. When preparing the reference files before running ERVcaller.pl, you need to run the "bwa index" command on both the human genome and the TE consensus sequences. Try that first, and then let me know if you still have the same issue.
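As a rough sketch of how to check this before a run: `bwa index ref.fa` writes five files next to the FASTA (`.amb`, `.ann`, `.bwt`, `.pac`, `.sa`), so a missing or empty one suggests the index step was skipped or interrupted. The helper name `missing_bwa_index` and the file name `TE_consensus.fa` are assumptions for illustration; `hg38.fa` is the name used later in this thread.

```shell
# Print the BWA index files that are missing or empty for a reference FASTA.
missing_bwa_index() {
    for ext in amb ann bwt pac sa; do
        [ -s "$1.$ext" ] || echo "$1.$ext"
    done
}

# Hypothetical file names; substitute the references you pass to ERVcaller.
for ref in hg38.fa TE_consensus.fa; do
    if [ -n "$(missing_bwa_index "$ref")" ]; then
        echo "Index incomplete for $ref; run: bwa index $ref"
    fi
done
```

The loop only reports what is missing; it does not run `bwa index` itself, because indexing a human genome can take a long time and is better launched deliberately.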

Best, Xun

xxYaaoo commented 1 year ago

Hi, Dr. Chen! I think I forgot to run "bwa index hg38.fa" previously. After running this command, I ran the ERVcaller command line again. This is my output folder (the .vcf file is still empty): [image] [image] I feel there might still be some problems. Thank you for your help!

xunchen85 commented 1 year ago

Hi,

Have you also indexed your TE consensus sequences? Can you also share the log file and the file sizes under the temp folder?

Xun

xxYaaoo commented 1 year ago

Yes, I checked my notes, and I had indexed the TE consensus sequences before: [image] This is my temp folder: [image] My slurm.out file: [image] The head of my slurm.err file: [image] Thank you so much!

xunchen85 commented 1 year ago

I can't find any problem with your log and temp files.

Could you try using the BAM file or the TE_seq FASTQ files as the input? (Not TE_seq2, which may not contain simulated insertions and is used for testing separate FASTQ inputs.)

Best, Xun

xxYaaoo commented 1 year ago

Yeah, Dr. Chen! I used the BAM file and it seems to have succeeded! [image] I am also curious: is it normal for the slurm.err file to be non-empty even when the run is successful? (I will try to figure out the problems with the .fq.gz files and then run my own data.)

xunchen85 commented 1 year ago

Hi,

I am glad that it works!

I don't know what is in your slurm.err file, but sometimes it is just log output from BWA or samtools, which should be fine.
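One rough way to tell routine log output from real failures, assuming the usual BWA and samtools convention of prefixing messages with `[M::` (informational), `[W::` (warning), and `[E::` (error, as in this issue's title). The helper name `count_error_lines` and the `LOG` variable are hypothetical.

```shell
# Count the lines in a log file that start with the "[E::" error prefix.
count_error_lines() {
    grep -c '^\[E::' "$1" || true   # grep exits non-zero when the count is 0
}

# LOG is a hypothetical variable; it defaults to the file named in the thread.
log="${LOG:-slurm.err}"
if [ -f "$log" ]; then
    echo "error-level lines in $log: $(count_error_lines "$log")"
    # Show the first few error lines, if any, for a closer look.
    grep '^\[E::' "$log" | head -n 5
fi
```

A count of zero does not prove the run succeeded, since other tools in the pipeline may report problems in a different format, so the output files are still worth checking.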

Sure, let me know if you have other questions. As I suggested, the TE_seq files contain the simulated integration sites but TE_seq2 may not, which could be the issue.

Best, Xun

xxYaaoo commented 1 year ago

Dear Dr. Chen,

I have run my own data successfully using ERVcaller! You really made my day! Thank you so much, I appreciate it!

Best wishes, Yaaoo