xunchen85 / VIcaller

A software to detect virome-wide integrations

The seq.output of test differ depending on input. #1

Closed byeonggill closed 5 years ago

byeonggill commented 5 years ago

ss.xlsx

This xlsx file consists of three seq.output results. Row 2 is your seq.output. Row 3 is the output I got with the seq_WGS.bam file as input. Row 4 is the output I got with the seq_1.fastq.gz and seq_2.fastq.gz files as input. Why do the three outputs differ?

my cmd:
perl VIcaller.pl detect -d WGS -i test/seq_WGS -f .bam -s paired-end

my config:
export PERL5LIB=/etc/perl:/usr/local/lib/x86_64-linux-gnu/perl/5.22.1:/usr/local/share/perl/5.22.1:/usr/lib/x86_64-linux-gnu/perl5/5.22:/usr/share/perl5:/usr/lib/x86_64-linux-gnu/perl/5.22:/usr/share/perl/5.22:/usr/local/lib/site_perl:/usr/lib/x86_64-linux-gnu/perl-base
export PATH=$PATH:/data/program/bowtie2/bowtie2-2.2.9/
human_genome = /data/program/VIcaller_v1.1/Database/Human/hg38.fa
human_genome_tophat = /data/program/VIcaller_v1.1/Database/Human/hg38.fa
virus_genome = /data/program/VIcaller_v1.1/Database/Virus/virus_db_090217.fa
virus_taxonomy = /data/program/VIcaller_v1.1/Database/Virus/virus_db_090217.taxonomy
virus_list = /data/program/VIcaller_v1.1/Database/Virus/virus_db_090217.virus_list
vector_db = /gpfs2/dli5lab/CAVirus/Database/Vector/Vector.fa
cell_line = /data/program/VIcaller_v1.1/Database/cell_line.list
# bowtie_d = /data/program/bowtie2/bowtie2-2.2.9/
# tophat_d = /data/program/tophat/tophat-2.1.0.Linux_x86_64/
# bwa_d = /data/program/bwa/bwa/
# samtools_d = /data/program/samtools/samtools-1.9/
# repeatmasker_d = /data/program/RepeatMasker/
# meme_d = /data/program/meme/
# NGSQCToolkit_d = /data/program/NGS_QC_Toolkit/
# fastuniq_d = /data/program/FastUniq/
# SE_MEI_d = /data/program/VIcaller_v1.1/Tools/SE-MEI/
# hydra_d = /data/program/Hydra-Version-0.5.3/
# blat_d = /home/hikim/bin/x86_64/
# blastn_d = /data/program/blast/ncbi-blast-2.5.0+

xunchen85 commented 5 years ago

It is mainly because of the different human reference genome builds.

The test output file (Row 2 in the Excel file) and the input file in BAM format (test_WGS.bam) (Row 3 in the Excel file) are based on the hg19 reference sequence, whereas your config file points to hg38.fa.

I hope this helps.
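One way to check which build a BAM file was aligned to is to look at the chr1 length in its header. The following is a minimal sketch, not part of VIcaller: `guess_build` is a hypothetical helper, and it assumes the header text was obtained separately (for example with `samtools view -H test/seq_WGS.bam`). It relies only on the known chr1 lengths of hg19 (249250621) and hg38 (248956422).

```python
# Known chr1 lengths for the two human reference builds discussed here.
CHR1_LENGTHS = {249250621: "hg19", 248956422: "hg38"}


def guess_build(sam_header_text):
    """Guess the reference build from the chr1 @SQ line of a SAM/BAM header.

    Hypothetical helper for illustration; returns "hg19", "hg38", or "unknown".
    """
    for line in sam_header_text.splitlines():
        if not line.startswith("@SQ"):
            continue
        # Header fields are tab-separated TAG:VALUE pairs, e.g. SN:chr1 and LN:249250621.
        fields = dict(f.split(":", 1) for f in line.split("\t")[1:] if ":" in f)
        if fields.get("SN") in ("chr1", "1"):
            return CHR1_LENGTHS.get(int(fields.get("LN", "0")), "unknown")
    return "unknown"


if __name__ == "__main__":
    example_header = "@HD\tVN:1.0\tSO:coordinate\n@SQ\tSN:chr1\tLN:249250621\n"
    print(guess_build(example_header))
```

If the build reported for the BAM file does not match the reference named in the config file, differences in the reported coordinates are to be expected.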

byeonggill commented 5 years ago

Thanks for the answer!

xunchen85 commented 5 years ago

The error comes from a mistake in the $vector_db line of your config file.

There should be a space between # and vector_db.

Xun
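A quick check for this kind of mistake can be scripted. The sketch below is not part of VIcaller: `find_missing_space` is a hypothetical helper, and it assumes, based only on the remark above, that setting lines in the config file are written as `# key = value` with a space after the `#`.

```python
import re

# Assumption: a setting line should look like "# key = value".
# This pattern flags lines where "#" is directly followed by a non-space character.
MISSING_SPACE = re.compile(r"^#(?=\S)")


def find_missing_space(config_text):
    """Return (line_number, line) pairs for lines such as "#vector_db = ..."."""
    problems = []
    for number, line in enumerate(config_text.splitlines(), start=1):
        stripped = line.strip()
        if MISSING_SPACE.match(stripped):
            problems.append((number, stripped))
    return problems


if __name__ == "__main__":
    # Placeholder paths, for illustration only.
    sample = "# cell_line = /path/to/cell_line.list\n#vector_db = /path/to/Vector.fa\n"
    for number, line in find_missing_space(sample):
        print(f"line {number}: missing space after '#': {line}")
```

Running it on the text of a config file lists every line that would need a space inserted after the `#`.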

From: byeonggill
Sent: Wednesday, April 10, 2019 3:47 AM
To: xunchen85/VIcaller
Cc: Xun Chen; Comment
Subject: Re: [xunchen85/VIcaller] The seq.output of test differ depending on input. (#1)

Thank you for the answer. I have another question about running the tool. I think line 396 of VIcaller.pl does not run on my system, but the same command runs normally if I type it manually. This is the VIcaller.pl code.

These are my log files.

Line 472: fail to locate the index files. However, I do have Vector.fa and its bwa index files. This is my config.

This is Vector.fa


byeonggill commented 5 years ago

I've just seen the answer. I revised my config ($vector_db). It was my mistake.

But I have another problem.

Please see #2.

Thank you!