xunchen85 / ERVcaller

ERVcaller is a tool designed to accurately detect and genotype non-reference unfixed endogenous retroviruses (ERVs) and other transposable elements (TEs) in the human genome using next-generation sequencing (NGS) data. We evaluated the tool using both simulated and real benchmark whole-genome sequencing (WGS) datasets. ERVcaller can accurately detect TE insertions of any length, particularly ERVs. It allows for the use of a TE reference library regardless of sequence complexity, such as the entire RepBase database. It is easy to install and use from the command line.
http://www.uvm.edu/genomics/software/ERVcaller.html

same sample, different result. #9

Closed Leehyeonjin93 closed 4 years ago

Leehyeonjin93 commented 4 years ago

Hi, I have human samples and two computers. I ran the same sample on both computers, but the results were different! I used the same options, the same program versions (samtools, bwa, ERVcaller), and the same reference. Additionally, I got an error message when the sample size was over 61G.

readline() on closed filehandle TAB at /home/super/Program/ERVcaller/ERVcaller_v1.4.pl line 1376.

This is my command:

perl $ERV_path/ERVcaller_v1.4.pl -i $samplename -f .sorted.RGfixed.Rmdup.bam -H $refer -T $ERV_path/Database/Human_TE_library.fa -I $input_path/ -O $output_path/$samplename/ -r 150 -t 12 -S 40 -BWA_MEM -G

xunchen85 commented 4 years ago

Hi,

The results can differ slightly because the alignment results (BWA) may not be exactly the same for each run. The command you ran was correct.

Could you provide the complete log file? If the difference is significant, I would expect that one of the two runs did not complete.

You may get the error message if your computers did not have enough memory or CPUs.

BTW, Samtools 1.5 cannot be used, as I indicated in the README file.

Best, Xun
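To judge whether two runs differ slightly or significantly, the final call files can be compared line by line. The following is a minimal sketch, not part of ERVcaller: `compare_calls` is a hypothetical helper, the file names in the usage comment are placeholders, and it assumes the output is a text file whose header lines start with `#`.

```shell
# Hypothetical helper: count records shared by, or unique to, two result files.
# Header lines starting with '#' are ignored; records are compared as whole lines.
compare_calls() {
    a_sorted=$(mktemp)
    b_sorted=$(mktemp)
    # Sort both files the same way so that comm can compare them.
    grep -v '^#' "$1" | LC_ALL=C sort -u >"$a_sorted"
    grep -v '^#' "$2" | LC_ALL=C sort -u >"$b_sorted"
    echo "shared: $(LC_ALL=C comm -12 "$a_sorted" "$b_sorted" | wc -l | tr -d ' ')"
    echo "only in first: $(LC_ALL=C comm -23 "$a_sorted" "$b_sorted" | wc -l | tr -d ' ')"
    echo "only in second: $(LC_ALL=C comm -13 "$a_sorted" "$b_sorted" | wc -l | tr -d ' ')"
    rm -f "$a_sorted" "$b_sorted"
}

# Usage (placeholder file names, one result file per computer):
#   compare_calls computer1/sample.vcf computer2/sample.vcf
```

Because whole lines are compared, two records for the same insertion that differ only in a read count are reported as different, so small "only in" counts are expected even between two complete runs.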

Leehyeonjin93 commented 4 years ago
  1. Sorry, I didn't save the log file, but some of the files' sizes are different.

image

image

I used bwa (0.7.12) and samtools (1.9).

  2. These are the computer's specifications. Aren't they enough? image

thanks.

xunchen85 commented 4 years ago

Hi,

The second run did not complete successfully and left lots of temp files. This can be because of either the low specifications or an unsuccessful compilation of ERVcaller on your second PC.

BTW, do you have R installed on your second PC?

Best, Xun

Leehyeonjin93 commented 4 years ago

Yes, R is installed on the second PC, and on the first too. I will try to compile ERVcaller again. I really want to know why super1 got the error message. I got the error message on both computers when the sample size was over 61G. These are the two computers' specifications.

image

xunchen85 commented 4 years ago

Hi,

I have checked the script, and the error message is not related to sample size, because the input at this line is tiny.

The error message says that ERVcaller cannot load the file with the suffix ".tab2", which is generated by the R script "Genotype_likelihood.R". Please check whether you can run it from the command line with R: Rscript Scripts/Genotype_likelihood.R ${input_sampleID}.tab ${input_sampleID}.tab2. I can also send you test data for it if you need.

Best, Xun
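A quick way to confirm that the R step actually produced a usable ".tab2" file is to check that the file exists and is not empty after running the command. This is only a sketch: `check_output` is a hypothetical helper, not an ERVcaller script, and `input_sampleID` must be set to your own sample name.

```shell
# Hypothetical helper: report whether a file is missing, empty, or usable.
# Returns 1 if the file is missing, 2 if it is empty, 0 otherwise.
check_output() {
    if [ ! -e "$1" ]; then
        echo "missing: $1"
        return 1
    elif [ ! -s "$1" ]; then
        echo "empty: $1"
        return 2
    fi
    lines=$(wc -l <"$1" | tr -d ' ')
    echo "ok: $1 ($lines lines)"
}

# Usage, following the Rscript command quoted above:
#   Rscript Scripts/Genotype_likelihood.R "${input_sampleID}.tab" "${input_sampleID}.tab2"
#   check_output "${input_sampleID}.tab"
#   check_output "${input_sampleID}.tab2"
```

If the ".tab" input is reported as missing or empty, the problem lies in a step before the R script rather than in R itself.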

Leehyeonjin93 commented 4 years ago

Yes, when I got the error message the first time, I tried that command. It worked, but the full Perl script did not run. So I tried to run the process after the R script step, but I can't fix the Perl script.

xunchen85 commented 4 years ago

The error message indicates that ERVcaller cannot open the ${input_sampleID}.tab file, which was not removed by ERVcaller at that time. It may be because either the sample size exceeds your computer's specifications or the result is empty. Without the log file, it is hard for me to help debug. So here are my suggestions:

  1. Run the test data to make sure ERVcaller is correctly compiled.

  2. If you can successfully run the test data without the error message, you can also try renaming the test data to match your input file in a new folder. Running it that way can rule out naming issues.

  3. If you can successfully run the test data without the error message, the issue comes from your analyzed data. It is recommended to run your own sample with more CPUs or memory when analyzing samples with higher depth.

Best, Xun
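Since the missing log file made this hard to debug, one option is to wrap each run so that everything it prints, plus its exit status, is saved to a file. The sketch below assumes nothing about ERVcaller itself: `run_logged` is a hypothetical wrapper, and the log file name in the usage comment is a placeholder.

```shell
# Hypothetical wrapper: run a command, saving stdout, stderr, and the exit
# status to a log file. Returns the wrapped command's own exit status.
run_logged() {
    log_file=$1
    shift
    {
        echo "# started: $(date)"
        echo "# command: $*"
    } >"$log_file"
    rc=0
    # Append both output streams so warnings appear in order with normal output.
    "$@" >>"$log_file" 2>&1 || rc=$?
    echo "# exit status: $rc" >>"$log_file"
    return "$rc"
}

# Usage with the command from the first post (log file name is a placeholder):
#   run_logged "$samplename.ERVcaller.log" \
#       perl "$ERV_path/ERVcaller_v1.4.pl" -i "$samplename" \
#       -f .sorted.RGfixed.Rmdup.bam -H "$refer" \
#       -T "$ERV_path/Database/Human_TE_library.fa" \
#       -I "$input_path/" -O "$output_path/$samplename/" \
#       -r 150 -t 12 -S 40 -BWA_MEM -G
```

Keeping one log per computer makes it possible to see at which step the two runs start to diverge.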

Leehyeonjin93 commented 4 years ago

Solutions 1 and 2 were OK. I will give up the -G option to run ERVcaller. Thanks...