ndaniel / fusioncatcher

Finder of Somatic Fusion Genes in RNA-seq data
GNU General Public License v3.0

Why didn't fusioncatcher work until the second run? #208

Open windtalker6 opened 1 year ago

windtalker6 commented 1 year ago

I tested fusioncatcher using the test FASTQ files:

http://sourceforge.net/projects/fusioncatcher/files/test/reads_1.fq.gz
http://sourceforge.net/projects/fusioncatcher/files/test/reads_2.fq.gz

Because of a glibc error, I can only run it through a Singularity image:

singularity exec \
  -B /mnt/lustre/user/wubin/01.Program/Scripts/02.software/Fusioncatcher/fusioncatcher:/mnt,/mnt/lustre/user/wubin/01.Program/Scripts/02.software/Fusioncatcher/fusioncatcher/test/fq_dir:/tmp,/mnt/lustre/user/wubin/01.Program/Scripts/02.software/Fusioncatcher/fusioncatcher/test/output:/opt \
  /mnt/lustre/user/wubin/01.Program/Scripts/02.software/Fusioncatcher/centos7_yum2.simg \
  /mnt/bin/fusioncatcher -d /mnt/data/current -i /tmp -o /opt

Then I got this error:

WARNING: Cannot restart automatically because the previous log file '/opt/fusioncatcher.log' cannot be found! The workflow will be restarted from the beginning with step 1!
....................
ERROR: The version of the data build does not match the version of this pipeline! Please, run again the 'fusioncatcher-build.py' in order to fix this!
....................

I'm sure I used the data version that matches the pipeline, because I installed the data following the manual:

git clone https://github.com/ndaniel/fusioncatcher
cd fusioncatcher/tools/
./install_tools.sh
cd ../data
./download-human-db.sh

But when I ran the same command a second time, it ran to the end:

singularity exec \
  -B /mnt/lustre/user/wubin/01.Program/Scripts/02.software/Fusioncatcher/fusioncatcher:/mnt,/mnt/lustre/user/wubin/01.Program/Scripts/02.software/Fusioncatcher/fusioncatcher/test/fq_dir:/tmp,/mnt/lustre/user/wubin/01.Program/Scripts/02.software/Fusioncatcher/fusioncatcher/test/output:/opt \
  /mnt/lustre/user/wubin/01.Program/Scripts/02.software/Fusioncatcher/centos7_yum2.simg \
  /mnt/bin/fusioncatcher -d /mnt/data/current -i /tmp -o /opt

However, it reported an error:


Reading... /mnt/data/current/genes_symbols.txt
Processing and reading... /opt/reads_filtered_transcriptome_sorted-read_no-offending-reads.map
Writing... /opt/candidate_fusion-genes_no-offending-reads.txt
Traceback (most recent call last):
  File "/mnt/bin/find_fusion_genes_map.py", line 335, in
    data=[line+[hugo[line[0]],hugo[line[1]]] for line in data]
KeyError: 'ENSG09000000014'

My questions are:

  1. Since I followed the manual to install the data, why does fusioncatcher tell me the versions don't match?
  2. Why does it run to the end if I simply try a second time?
  3. If I delete all the files in the output directory before the second run, it reports "ERROR: The version of the data build does not match the version of this pipeline!" again, just like the first run.
  4. What does KeyError: 'ENSG09000000014' mean? Does it indicate that the fusioncatcher run failed?
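For question 4: the failing line in the traceback looks up both gene IDs of each candidate pair in `hugo`, which appears to be a dictionary mapping Ensembl gene IDs to gene symbols read from genes_symbols.txt. A Python `KeyError` means the ID 'ENSG09000000014' was not among that dictionary's keys, so the script stopped at that point. The sketch below reproduces the lookup with a made-up two-gene mapping; the `load_symbols` helper, the tab-separated file format, and the tolerant variant are all assumptions for illustration, not fusioncatcher's actual code.

```python
# Hypothetical reproduction of the lookup at find_fusion_genes_map.py line 335.
# Assumptions: genes_symbols.txt is "gene_id<TAB>symbol" per line, and the
# helpers below are illustrative only (not fusioncatcher's implementation).


def load_symbols(lines):
    """Build a gene-ID -> symbol dict from 'ID<TAB>SYMBOL' lines (assumed format)."""
    hugo = {}
    for raw in lines:
        parts = raw.rstrip("\n").split("\t")
        if len(parts) >= 2:
            hugo[parts[0]] = parts[1]
    return hugo


def annotate(data, hugo):
    """Same expression as in the traceback: raises KeyError if an ID is missing."""
    return [line + [hugo[line[0]], hugo[line[1]]] for line in data]


def annotate_tolerant(data, hugo, missing="unknown"):
    """Hypothetical variant that substitutes a placeholder for unknown IDs."""
    return [line + [hugo.get(line[0], missing), hugo.get(line[1], missing)]
            for line in data]


if __name__ == "__main__":
    # Illustrative mapping: TP53 and BRAF only.
    hugo = load_symbols(["ENSG00000141510\tTP53\n", "ENSG00000157764\tBRAF\n"])
    data = [["ENSG00000141510", "ENSG09000000014"]]
    try:
        annotate(data, hugo)
    except KeyError as err:
        print("KeyError:", err)  # prints: KeyError: 'ENSG09000000014'
    print(annotate_tolerant(data, hugo))
```

The tolerant variant only hides the symptom: if an ID produced earlier in the run is absent from the symbol file, the data directory and the rest of the run plausibly disagree, so the result should not be trusted as-is.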
windtalker6 commented 1 year ago

I figured it out.

If you install it like this:

git clone https://github.com/ndaniel/fusioncatcher
cd fusioncatcher/tools/
./install_tools.sh
cd ../data
./download-human-db.sh

then the version of fusioncatcher is 1.35, but the version of fusioncatcher-build is 1.33.

fusioncatcher.py checks this and stops running.
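The check described above amounts to comparing two version strings and refusing to continue when they differ. Below is a minimal sketch of such a guard, assuming both versions are plain dotted numbers like "1.35" and "1.33" and that an exact match is required; the function names and the matching rule are assumptions, since the thread does not show how fusioncatcher.py reads or compares the versions.

```python
# Hypothetical sketch of a pipeline/data version guard.
# Assumptions: both versions are dotted numeric strings and must match exactly;
# this is not fusioncatcher.py's actual code.
import sys


def parse_version(text):
    """Turn '1.35' (possibly with surrounding whitespace) into (1, 35)."""
    return tuple(int(part) for part in text.strip().split("."))


def check_data_version(pipeline_version, data_version):
    """Return True only when the data build matches the pipeline version."""
    return parse_version(pipeline_version) == parse_version(data_version)


if __name__ == "__main__":
    pipeline, data = "1.35", "1.33"  # versions reported in this thread
    if not check_data_version(pipeline, data):
        print("ERROR: The version of the data build does not match "
              "the version of this pipeline!", file=sys.stderr)
        sys.exit(1)
```

The error message itself points to re-running 'fusioncatcher-build.py' so that the data build matches the pipeline version.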