phiweger / uv

Finding prophage regions in bacterial genomes using brute force

freeze at uv:search_protein_db #11

Open i-artjom opened 1 year ago

i-artjom commented 1 year ago

Hi :)

After installing the workflow and downloading the test data and databases, the run seems to freeze at uv:search_protein_db (it stalls for hours; I tried several times).

nextflow run main.nf --standalone --results results --uvdb db --genomes metadata.csv --annotate true
N E X T F L O W  ~  version 22.10.4
Launching `main.nf` [lethal_bartik] DSL2 - revision: af858ed795
executor >  local (8)
[51/3d6887] process > uv:minlen (2)                      [100%] 2 of 2 ✔
[2a/35bd49] process > uv:rename_contigs (2)              [100%] 2 of 2 ✔
[26/070222] process > uv:reading_frames (2)              [100%] 2 of 2 ✔
[5b/9faacf] process > uv:search_protein_db (2)           [  0%] 0 of 2
[-        ] process > uv:segment                         -
[-        ] process > uv:qc                              -
[-        ] process > uv:filter_qc                       -
[-        ] process > annotate:careful_frames            -
[-        ] process > annotate:search_hmms               -
[-        ] process > annotate:interpret_hmms            -
[-        ] process > annotate:collect_hmms              -
[-        ] process > annotate:search_protein_db_careful -

phiweger commented 1 year ago

how much RAM do you have?

phiweger commented 1 year ago

you can pass --maxram, e.g. nextflow run ... --maxram 8 ..., to limit memory use, see

https://github.com/phiweger/uv/blob/main/workflows/processes/uv.nf#L21

the protein search is the brute-force step and the most computationally intensive one; on my laptop (16 GB RAM) it completes in about 20 min for the two test genomes.
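For reference, a minimal sketch of the same invocation with the memory cap added. The flags are the ones used earlier in this thread; the `run_uv` wrapper is only a hypothetical convenience, and the unit of `--maxram` is assumed to be GB, matching the 16 GB laptop mentioned above.

```shell
# Hypothetical wrapper: rerun the workflow with a memory cap.
# Assumption: --maxram takes a number of GB (8 is the value suggested above).
run_uv() {
    maxram="${1:-8}"
    nextflow run main.nf --standalone --results results --uvdb db \
        --genomes metadata.csv --annotate true --maxram "$maxram"
}

# Usage (not executed here):
#   run_uv 8
```

Lowering the cap may slow the protein search, so it is probably worth trying the default first and only reducing it if the machine starts swapping.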

i-artjom commented 1 year ago

I let it run overnight and it finished, but I can't say exactly how long it took (maybe I can when I run it next time). I'm also running on 16 GB RAM.

phiweger commented 1 year ago

doesn't nextflow give you the time it took to run?
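One way to look this up after the fact is `nextflow log`, which reads the run history stored in the launch directory. A small sketch, assuming the run name printed at launch (`lethal_bartik` in the log above) and that it is called from the directory where the workflow was started; the `show_run_times` helper name is hypothetical.

```shell
# Hypothetical helper: print per-task timing for a past run.
# Assumption: called from the directory in which `nextflow run` was launched.
show_run_times() {
    run_name="$1"
    if ! command -v nextflow >/dev/null 2>&1; then
        echo "nextflow not found on PATH" >&2
        return 127
    fi
    # name, status, duration and realtime are standard `nextflow log` fields
    nextflow log "$run_name" -f name,status,duration,realtime
}

# Usage (not executed here):
#   nextflow log                   # list past runs with their total duration
#   show_run_times lethal_bartik   # per-task breakdown for one run
```

Whether a duration is reported for a run that ended with an error may vary, so treat missing values as expected in that case.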

i-artjom commented 1 year ago

Unfortunately I can't see it, but maybe that's because I'm still getting an error at annotate:collect_hmms (even though annotate:search_tails finishes):

Error executing process > 'annotate:collect_hmms (1)'

Caused by:
  Process `annotate:collect_hmms (1)` terminated with an error exit status (127)

Command executed:

  cat *.bed > all
  bedtools sort -i all > sorted
  /uv/bin/deduplicate_and_rename.py -i sorted -o annotation.bed --names 43b63e6b-323d-473a-8d19-d2d9238d965c.contig_names.txt

Command exit status:
  127

Command output:
  (empty)

Command error:
  .command.sh: line 3: bedtools: command not found

phiweger commented 1 year ago

https://github.com/phiweger/uv/blob/main/workflows/processes/uv.nf#L243

bedtools is missing from env.yml, my bad. thanks for spotting. can you add it and rerun?
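A rough sketch of the workaround, assuming the workflow runs inside a conda environment built from env.yml and that bedtools is installed from the bioconda channel. The `require_tool` helper is hypothetical; it returns 127, the same "command not found" status seen in the error above.

```shell
# Hypothetical helper: check that a tool is on PATH before rerunning.
require_tool() {
    if command -v "$1" >/dev/null 2>&1; then
        echo "$1: found at $(command -v "$1")"
    else
        echo "$1: not found" >&2
        return 127   # shell convention for "command not found"
    fi
}

# Usage (not executed here; assumes the workflow's conda environment is active):
#   conda install -c conda-forge -c bioconda bedtools
#   require_tool bedtools
#   nextflow run main.nf ... -resume   # -resume reuses cached results of finished tasks
```

Adding bedtools to the dependency list in env.yml would be the permanent fix; installing it by hand only patches the local environment.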

i-artjom commented 1 year ago

runs smoothly now with the test genomes and finishes in 5 min 🤌

phiweger commented 1 year ago

haha, now I still need to fix that