simoncchu / GAPPadder

GAPPadder is a tool for closing gaps on draft genomes with short sequencing data

After Collect stage, some output files not found #10

Open davidecarlson opened 3 years ago

davidecarlson commented 3 years ago

I ran the Preprocess and Collect steps according to the README with no apparent errors. However, it seems that some expected output was not produced, because when I run the Assembly step I get the following error:

First round assembly and merger...
Start merging...
Traceback (most recent call last):
  File "./main.py", line 283, in <module>
    main_func(scommand,sfconfig)
  File "./main.py", line 274, in main_func
    gap_assembler.assemble_pipeline()
  File "/home/progs/GAPPadder/assemble_gaps.py", line 339, in assemble_pipeline
    id_remain=self.pick_already_constructed(contigs_select, fa_list, sf_picked)
  File "/home/progs/GAPPadder/assemble_gaps.py", line 321, in pick_already_constructed
    m_picked=contigs_select.get_already_picked(sf_picked)
  File "/home/progs/GAPPadder/pick_contigs.py", line 576, in get_already_picked
    with open(sf_picked) as fin_picked:
IOError: [Errno 2] No such file or directory: u'/<path to results dir>/gappadder/results/merged/../picked_seqs.fa'

Here are the commands that I ran:

python ./main.py -c Preprocess -g /<my path>/gappadder/gappadder_config.json
python ./main.py -c Collect -g /<my path>/gappadder/gappadder_config.json
python ./main.py -c Assembly -g /<my path>/gappadder/gappadder_config.json

Any ideas what could be going wrong? Thanks! Dave
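
For anyone hitting the same traceback: the path in the error contains `merged/../`, so the missing file is expected directly inside the working folder rather than inside `merged`. Below is a minimal Python 3 sketch for checking that file before starting the Assembly step. The helper names `picked_seqs_path` and `has_picked_seqs` are made up for illustration and are not part of GAPPadder, and the file layout is taken only from the error message above.

```python
import os


def picked_seqs_path(working_folder):
    """Return the normalized form of the path shown in the traceback.

    Hypothetical helper: the "merged/../picked_seqs.fa" layout is taken
    from the error message, not from GAPPadder's documentation.
    """
    raw = os.path.join(working_folder, "merged", "..", "picked_seqs.fa")
    return os.path.normpath(raw)


def has_picked_seqs(working_folder):
    """True if picked_seqs.fa exists in the working folder and is non-empty."""
    path = picked_seqs_path(working_folder)
    return os.path.isfile(path) and os.path.getsize(path) > 0
```

If `has_picked_seqs` returns False for the configured `working_folder`, the problem most likely lies in an earlier step rather than in the line that raises the IOError.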

simoncchu commented 3 years ago

It looks like either an error occurred before the assembly step, or the assembly itself failed. Could you post your config.json file?

davidecarlson commented 3 years ago

Thanks for the response. Here is my config.json file:

{
    "draft_genome": {
        "fa": "/datahome/oenothera/assembly/bionano_results_axel/cur_results_1297259/canu_bionano_scaffolds_and_contigs.fasta"
    },
    "raw_reads": [
            {
                "left": "/datahome/oenothera/genomic/Illumina_PE/elata/HI.0553.002.Index_7.johst_DNA_R1.fastq",
                "right": "/datahome/oenothera/genomic/Illumina_PE/elata/HI.0553.002.Index_7.johst_DNA_R2.fastq"
            },
            {
            "left": "/datahome/oenothera/genomic/Illumina_MP-NEW/elata_MP_nxtrim_R1.mp.fastq",
                "right": "/datahome/oenothera/genomic/Illumina_MP-NEW/elata_MP_nxtrim_R2.mp.fastq"
        }
      ],
    "alignments": [
            {
                "bam": "/datahome/oenothera/assembly/bionano_results_axel/cur_results_1297259/gappadder/processed/elataMP.sorted.markdup.bam",
                "is": "8178",
                "std": "853"
            },
            {
            "bam": "/datahome/oenothera/assembly/bionano_results_axel/cur_results_1297259/gappadder/processed/elataPE.sorted.markdup.bam",
                "is": "282",
                "std": "19"
        }
      ],
    "software_path": {
        "bwa": "bwa",
        "samtools": "samtools",
        "velvet": "/home/progs/velvet",
        "kmc": "kmc",
        "TERefiner": "/home/progs/GAPPadder/TERefiner_1",
        "ContigsMerger": "/home/progs/GAPPadder/ContigsMerger"
    },
    "parameters": {
            "working_folder": "/datahome/oenothera/assembly/bionano_results_axel/cur_results_1297259/gappadder/results",
        "min_gap_size": "50",
        "flank_length": "300",
        "nthreads": "40",
        "verbose": "1"
        },
        "kmer_length": [{
                        "k": 30,
                        "k_velvet": [{
                                "k": 29
                        },
                        {
                                "k": 27
                        }]
                },
                {
                        "k": 40,
                        "k_velvet": [{
                                "k": 39
                        },
                        {
                                "k": 37
                        }]
                },
                {
                        "k": 50,
                        "k_velvet": [{
                                "k": 49
                        },
                        {
                                "k": 47
                        }]
                }]
}

Let me know if you need any additional info. Thanks! Dave
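
Since the question is whether the config is at fault, one quick check is that every input file it names actually exists. The following is a rough Python 3 sketch under that assumption. It only looks at the keys visible in the config above (`draft_genome.fa`, `raw_reads[].left`/`right`, `alignments[].bam`), and the function names are hypothetical rather than anything provided by GAPPadder.

```python
import json
import os


def load_config(path):
    """Parse the JSON config; a malformed file raises ValueError here."""
    with open(path) as handle:
        return json.load(handle)


def missing_input_files(config):
    """Return (key, path) pairs for input files that do not exist on disk."""
    candidates = [("draft_genome.fa", config["draft_genome"]["fa"])]
    for i, pair in enumerate(config.get("raw_reads", [])):
        candidates.append(("raw_reads[%d].left" % i, pair["left"]))
        candidates.append(("raw_reads[%d].right" % i, pair["right"]))
    for i, aln in enumerate(config.get("alignments", [])):
        candidates.append(("alignments[%d].bam" % i, aln["bam"]))
    return [(key, path) for key, path in candidates
            if not os.path.isfile(path)]
```

Calling `missing_input_files(load_config("gappadder_config.json"))` returns an empty list when all of those files are present. It does not validate the `software_path` entries, because some of them are bare command names resolved through `PATH`.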

simoncchu commented 3 years ago

The config looks good to me. Could you try changing the velvet and kmc entries to the absolute paths of the folders that contain them? For example:

"velvet": "/gpfs/scratchfs1/chc12015/tools/velvet-master/",
"kmc": "/gpfs/scratchfs1/chc12015/tools/kmc2.3/",

davidecarlson commented 3 years ago

Thanks, Simon. I changed the kmc path in the config file to the absolute path of the folder that contains the kmc binary (the velvet path in the config was already the absolute path to the directory containing the velvet binaries). I then reran the Preprocess and Collect steps, which once again finished without producing any error messages.

However, when I start the Assembly step it once again fails with the same error:

First round assembly and merger...
Start merging...
Traceback (most recent call last):
  File "./main.py", line 283, in <module>
    main_func(scommand,sfconfig)
  File "./main.py", line 274, in main_func
    gap_assembler.assemble_pipeline()
  File "/home/progs/GAPPadder/assemble_gaps.py", line 339, in assemble_pipeline
    id_remain=self.pick_already_constructed(contigs_select, fa_list, sf_picked)
  File "/home/progs/GAPPadder/assemble_gaps.py", line 321, in pick_already_constructed
    m_picked=contigs_select.get_already_picked(sf_picked)
  File "/home/progs/GAPPadder/pick_contigs.py", line 576, in get_already_picked
    with open(sf_picked) as fin_picked:
IOError: [Errno 2] No such file or directory: u'/datahome/oenothera/assembly/bionano_results_axel/cur_results_1297259/gappadder/results/merged/../picked_seqs.fa'

I should note that the "merged" directory in my results contains nothing but empty subdirectories:

ls -l merged
total 0
drwxrwxr-x. 1 davecarlson davecarlson 0 Dec 16 10:12 both_unmapped
drwxrwxr-x. 1 davecarlson davecarlson 0 Dec 16 13:40 empty_dir
drwxrwxr-x. 1 davecarlson davecarlson 0 Dec 16 10:12 gap_reads
drwxrwxr-x. 1 davecarlson davecarlson 0 Dec 16 13:32 gap_reads_alignment
drwxrwxr-x. 1 davecarlson davecarlson 0 Dec 16 10:12 gap_reads_for_alignment
drwxrwxr-x. 1 davecarlson davecarlson 0 Dec 16 10:12 gap_reads_high_quality
drwxrwxr-x. 1 davecarlson davecarlson 0 Dec 16 13:40 kmc_temp
drwxrwxr-x. 1 davecarlson davecarlson 0 Dec 16 13:40 kmers
drwxrwxr-x. 1 davecarlson davecarlson 0 Dec 16 13:40 temp
drwxrwxr-x. 1 davecarlson davecarlson 0 Dec 16 10:12 unmapped_reads
drwxrwxr-x. 1 davecarlson davecarlson 0 Dec 16 13:40 velvet_temp

Any other suggestions for things I should be changing? Thanks, Dave
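
The `ls -l` listing above only shows that the entries are directories; confirming that each one is really empty takes a second look inside. A small Python 3 sketch of that check follows. The function name `empty_subdirectories` is hypothetical, and the only assumption is that the folder passed in exists.

```python
import os


def empty_subdirectories(folder):
    """Return the sorted names of direct subdirectories that contain nothing."""
    empty = []
    for name in sorted(os.listdir(folder)):
        path = os.path.join(folder, name)
        if os.path.isdir(path) and not os.listdir(path):
            empty.append(name)
    return empty
```

Running it on the `merged` folder after the Collect step and again after the Assembly step would show which stage, if any, writes output there.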

simoncchu commented 3 years ago

Could you check whether /home/progs/GAPPadder/ContigsMerger and /home/progs/GAPPadder/TERefiner_1 run properly? Did you compile them yourself, or are you using the binaries bundled with GAPPadder? On some machines they need to be recompiled.

davidecarlson commented 3 years ago

Hi Simon,

I used the versions bundled with GAPPadder. It's a little hard to say if they're working properly. Here is the output for ContigsMerger:

Arrange error! 0 6

The output for TERefiner_1:

Please check parameters setting!

Is this the expected output when they are run with no input?
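
To make the "do they run at all" check repeatable, the exit code and any launch error can be captured along with the printed message. This is a minimal Python 3 sketch. `probe_binary` is a made-up helper, and it says nothing about what ContigsMerger or TERefiner_1 are supposed to print, which only the author can confirm.

```python
import os
import subprocess


def probe_binary(path, timeout=30):
    """Run a binary with no arguments and no input, and report what happened.

    Returns a dict with an "executable" flag, the exit code and the combined
    stdout/stderr text. A launch failure (OSError, for example when the file
    was built for another platform) is reported as text instead of raised.
    """
    result = {
        "executable": os.path.isfile(path) and os.access(path, os.X_OK),
        "returncode": None,
        "output": "",
    }
    if not result["executable"]:
        return result
    try:
        proc = subprocess.run(
            [path],
            stdin=subprocess.DEVNULL,   # never wait for keyboard input
            stdout=subprocess.PIPE,
            stderr=subprocess.STDOUT,   # merge stderr into the captured text
            timeout=timeout,
            universal_newlines=True,
        )
        result["returncode"] = proc.returncode
        result["output"] = proc.stdout
    except (OSError, subprocess.TimeoutExpired) as err:
        result["output"] = "could not run: %s" % err
    return result
```

A binary that starts and prints a usage complaint is at least loadable on the machine; a "could not run" result or a missing executable bit would point towards recompiling.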

frihaka commented 3 years ago

Hi Simon,

I have tried to use GAPPadder and I am getting exactly the same issues (program not finishing and outputting empty directories) as Dave reported earlier in this ticket.

Below are the details needed to trace the problem:

script

#!/bin/bash

#SBATCH --mail-type=end,fail
#SBATCH --job-name="gap"
#SBATCH --nodes=1
#SBATCH --cpus-per-task=12
#SBATCH --time=12:00:00
#SBATCH --mem=32G
#SBATCH --partition=pall
#SBATCH --output=gap_%j.out
#SBATCH --error=gap_%j.err

module add UHTS/Aligner/bwa/0.7.17
module add UHTS/Analysis/samtools/1.10
module add UHTS/Assembler/velvet/1.2.10

# Preprocess the draft genome to get the gap positions and flank regions
python main.py -c Preprocess -g configuration.json

# Collect reads for each gap
python main.py -c Collect -g configuration.json

# Construct the gap sequence and pick the best one:
python main.py -c Assembly -g configuration.json

stdout

samtools view /path/2/align.bam "draft_name" | python collect_reads_for_gaps.py /path/2/gap_positions.txt 30 /path/2/1_is300/ 300 50 250 -
samtools view /path/2/align.bam "draft_name" | python collect_discordant_low_mapq_reads.py /path/2/1_is300/ -
First round assembly and merger...
Start merging...

stderr

Traceback (most recent call last):
  File "main.py", line 283, in <module>
    main_func(scommand,sfconfig)
  File "main.py", line 257, in main_func
    drc.merge_dispatch_reads_for_gaps_v2(left_reads, right_reads)
  File "/path/2/run_multi_threads_discordant.py", line 213, in merge_dispatch_reads_for_gaps_v2
    temp_field=id_fields[0].split("/")
IndexError: list index out of range
Traceback (most recent call last):
  File "main.py", line 283, in <module>
    main_func(scommand,sfconfig)
  File "main.py", line 274, in main_func
    gap_assembler.assemble_pipeline()
  File "/path/2/assemble_gaps.py", line 339, in assemble_pipeline
    id_remain=self.pick_already_constructed(contigs_select, fa_list, sf_picked)
  File "/path/2/assemble_gaps.py", line 321, in pick_already_constructed
    m_picked=contigs_select.get_already_picked(sf_picked)
  File "/path/2/pick_contigs.py", line 576, in get_already_picked
    with open(sf_picked) as fin_picked:
IOError: [Errno 2] No such file or directory: u'/path/2/merged/../picked_seqs.fa'

configuration.json

{
    "draft_genome": {
        "fa": "/path/2/draft.fasta"
    },
    "raw_reads": [
            {
            "left": "/path/2/reads_1.fastq.gz",
        "right": "/path/2/reads_2.fastq.gz"
        }
      ],
    "alignments": [
            {
            "bam": "/path/2/align.bam",
                "is": "300",
                "std": "50"
        }
      ],
    "software_path": {
        "bwa": "bwa",
        "samtools": "samtools",
        "velvet": "velvet",
        "kmc": "/path/2/KMC/bin/",
        "TERefiner": "/path/2/TERefiner_1",
        "ContigsMerger": "/path/2/ContigsMerger"
    },
    "parameters": {
            "working_folder": "/path/2/dir",
        "min_gap_size": "2",
        "flank_length": "300",
        "nthreads": "12",
        "verbose": "1"
        },
        "kmer_length": [{
                        "k": 30,
                        "k_velvet": [{
                                "k": 29
                        }, 
                        {
                                "k": 27
                        }]
                }, 
                {
                        "k": 40,
                        "k_velvet": [{
                                "k": 39
                        }, 
                        {
                                "k": 37
                        }]
                },
                {
                        "k": 50,
                        "k_velvet": [{
                                "k": 49
                        }, 
                        {
                                "k": 47
                        }]
                }]
}

Would you have an idea of what is happening?

Best
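
One thing visible in the stderr above: the IndexError is raised from main.py line 257, apparently before the Assembly step, yet the Assembly step still runs afterwards because the batch script calls all three commands unconditionally. A hedged Python 3 sketch of a driver that stops at the first failing step follows. The command line mirrors the one used in this thread (`python main.py -c <step> -g <config>`), while `run_steps` and its `runner` parameter are illustrative names, not part of GAPPadder.

```python
import subprocess

STEPS = ["Preprocess", "Collect", "Assembly"]


def run_steps(config_path, python="python", main_script="main.py",
              runner=subprocess.call):
    """Run the three steps in order and stop at the first failure.

    Returns the name of the step that exited with a non-zero status,
    or None if every step succeeded. `runner` takes an argument list and
    returns an exit code; it can be swapped out for testing.
    """
    for step in STEPS:
        code = runner([python, main_script, "-c", step, "-g", config_path])
        if code != 0:
            return step
    return None
```

An uncaught Python exception makes the interpreter exit with a non-zero status, so `run_steps("configuration.json")` would return `"Collect"` if the traceback came from that step, and the later picked_seqs.fa error would then not appear as a separate symptom.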

anne-gcd commented 2 years ago

Hello Simon,

I have tried to use GAPPadder as well and I have the same issues (program not finishing and output directories empty) as mentioned above. I have tried to recompile ContigsMerger and TERefiner_1, but it didn't change anything.

Do you have any idea what is going wrong?

Thanks, Anne