vpc-ccg / haslr

A fast tool for hybrid genome assembly of long and short reads
GNU General Public License v3.0
Conda version does not produce final assembly #18

Open pydupont opened 3 years ago

pydupont commented 3 years ago

Hi,

Nice work on this! I initially installed the conda version, haslr=0.8a1=py38h1c8e9b9_1 from bioconda. It seems to run fine: there are no error messages, and it even says that the long read assembly is done, but I can't find the assembly file anywhere. I then cloned the git repo and tried that version, and it works with the exact same command.

Cheers

pydupont commented 3 years ago

Actually, it works on the test E. coli data that you provide, but on my data it just generates an empty file for the final assembly. How can I debug what's happening? There are no error messages in the output.

GabeAl commented 3 years ago

Same. No final assembly, log file is totally clean:

[NOTE] number of threads: 8

[NOTE] loading contig sequences...
       processing file: /mnt/x/assembly/sr_k49_a3.contigs.nooverlap.fa... Done in 0.10 CPU seconds (0.10 real seconds)
       loaded 246679 contigs
       elapsed time 0.10 CPU seconds (0.10 real seconds)

[NOTE] calculating kmer frequency of unique contigs
       mean: 183.21
       elapsed time 0.13 CPU seconds (0.13 real seconds)

[NOTE] loading long read sequences...
       processing file: /mnt/x/assembly/lr25x.fasta... Done in 0.32 CPU seconds (0.33 real seconds)
       loaded 14623 long reads
       elapsed time 0.45 CPU seconds (0.46 real seconds)

[NOTE] loading alignment between contigs and long reads...
       processing file: /mnt/x/assembly/map_contigs_k49_a3_lr25x.paf... Done in 0.64 CPU seconds (0.64 real seconds)
       loaded 38604 alignments
       elapsed time 1.11 CPU seconds (1.11 real seconds)

[NOTE] fixing overlapping alignments...
       elapsed time 1.11 CPU seconds (1.12 real seconds)

[NOTE] building compact long reads...
       elapsed time 1.12 CPU seconds (1.13 real seconds)

[NOTE] building the backbone graph...
       elapsed time 1.14 CPU seconds (1.15 real seconds)

[NOTE] cleaning weak edges...
       removed 229 edges
       elapsed time 1.15 CPU seconds (1.16 real seconds)

[NOTE] cleaning tips...
       removed 14 tips
       elapsed time 1.16 CPU seconds (1.17 real seconds)

[NOTE] cleaning simple bubbles...
       removed 17 simple bubbles
       elapsed time 1.17 CPU seconds (1.19 real seconds)

[NOTE] cleaning super bubbles...
       removed 0 super bubbles
       elapsed time 1.18 CPU seconds (1.20 real seconds)

[NOTE] cleaning small bubbles...
       removed 0 small bubbles
       elapsed time 1.19 CPU seconds (1.20 real seconds)

[NOTE] calculating long read coordinates between anchors...
       elapsed time 1.55 CPU seconds (1.26 real seconds)

[NOTE] calling consensus sequence between anchors...

Likewise, I can confirm that building it from the git repo and literally copying the contents of bin/ back into the conda environment makes it work just fine. So if there were some way to replace the broken conda binaries with these (or compile statically?), problem solved, methinks.
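
For anyone trying to see where a run stops, here is a minimal sketch that scans a log in the format pasted above and reports the last `[NOTE]` stage and whether it was followed by an "elapsed time" line. It assumes only the layout visible in that log; the function name `last_stage` is made up for illustration and is not part of haslr.

```python
import re
from typing import Optional, Tuple

# A stage header looks like "[NOTE] cleaning tips..." in the log above.
NOTE_RE = re.compile(r"^\[NOTE\]\s*(.+?)\s*$")


def last_stage(log_text: str) -> Optional[Tuple[str, bool]]:
    """Return (stage name, finished?) for the last [NOTE] block, or None.

    A stage counts as finished if an "elapsed time" line follows its header
    before the next header (assumption based on the log layout shown above).
    """
    stage = None
    finished = False
    for line in log_text.splitlines():
        match = NOTE_RE.match(line)
        if match:
            stage = match.group(1)
            finished = False
        elif stage is not None and "elapsed time" in line:
            finished = True
    return None if stage is None else (stage, finished)
```

On the log above this would point at the consensus-calling stage as the last one started, with no "elapsed time" line after it.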

pdimens commented 3 years ago

I can confirm this problem persists in the conda installation. Here is the tree of the working directory:

├── asm_contigs_k49_a9_lr30x_b500_s2_sim0.85
│   ├── backbone.01.init.gfa
│   ├── backbone.01.init.stat
│   ├── backbone.02.weakEdge.gfa
│   ├── backbone.02.weakEdge.stat
│   ├── backbone.03.tip.gfa
│   ├── backbone.03.tip.log
│   ├── backbone.03.tip.stat
│   ├── backbone.04.simplebubble.gfa
│   ├── backbone.04.simplebubble.log
│   ├── backbone.04.simplebubble.stat
│   ├── backbone.05.superbubble.gfa
│   ├── backbone.05.superbubble.log
│   ├── backbone.05.superbubble.stat
│   ├── backbone.06.smallbubble.gfa
│   ├── backbone.06.smallbubble.log
│   ├── backbone.06.smallbubble.stat
│   ├── backbone.branching.log
│   ├── compact_uniq.txt
│   ├── index.contig
│   ├── index.longread
│   ├── log_consensus.txt
│   └── log_coordinate.txt
├── asm_contigs_k49_a9_lr30x_b500_s2_sim0.85.err
├── asm_contigs_k49_a9_lr30x_b500_s2_sim0.85.out
├── lr30x.fasta
├── map_contigs_k49_a9_lr30x.log
├── map_contigs_k49_a9_lr30x.paf
├── sr.fofn
├── sr_k49_a9.contigs.fa
├── sr_k49_a9.contigs.nooverlap.250.fa
├── sr_k49_a9.contigs.nooverlap.fa
├── sr_k49_a9.h5
├── sr_k49_a9.log
└── sr_k49_a9.unitigs.fa
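
Note that the asm_contigs_... directory in this tree contains no FASTA file at all. A quick way to check a run for a missing or empty assembly is sketched below. It assumes the final assembly would be a file ending in .fa or .fasta inside that directory (the thread does not name the file); `check_outputs` is a hypothetical helper, not something haslr ships.

```python
from pathlib import Path
from typing import Dict, Iterable, List


def check_outputs(asm_dir: Path,
                  suffixes: Iterable[str] = (".fa", ".fasta")) -> Dict[str, List[str]]:
    """List FASTA-like files directly inside asm_dir, split by empty or not.

    The suffixes are an assumption about how the assembly file is named.
    """
    wanted = tuple(suffixes)
    report: Dict[str, List[str]] = {"non_empty": [], "empty": []}
    for path in sorted(asm_dir.iterdir()):
        if path.is_file() and path.name.endswith(wanted):
            key = "non_empty" if path.stat().st_size > 0 else "empty"
            report[key].append(path.name)
    return report
```

If both lists come back empty, no assembly was written (the case shown in this tree); a name under "empty" matches the zero-size file reported earlier in the thread.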

mictadlo commented 3 years ago

Same here.

ISonets commented 3 years ago

Aforementioned way to fix it works! TODO:

  1. git clone => make
  2. conda install
  3. copy all (haslr.py also) to conda env location, overwrite files
  4. PROFIT

Still needs a proper fix though.
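
The copy step can be sketched roughly as follows. This is only an illustration of the workaround, not an official fix: `overlay_bin` and the `.conda-orig` backup suffix are invented here, and it assumes the source build leaves its files in a bin/ directory as described earlier in the thread.

```python
import shutil
from pathlib import Path
from typing import List


def overlay_bin(src_bin: Path, env_bin: Path,
                backup_suffix: str = ".conda-orig") -> List[str]:
    """Copy every regular file from src_bin into env_bin.

    Files that would be overwritten are backed up once, next to the
    original, under name + backup_suffix (a made-up convention).
    """
    copied = []
    for src in sorted(src_bin.iterdir()):
        if not src.is_file():
            continue
        dst = env_bin / src.name
        backup = env_bin / (src.name + backup_suffix)
        if dst.exists() and not backup.exists():
            shutil.copy2(dst, backup)  # keep the file conda installed
        shutil.copy2(src, dst)  # copy2 also copies permission bits
        copied.append(src.name)
    return copied
```

With the conda environment activated, the target would typically be `Path(os.environ["CONDA_PREFIX"]) / "bin"` (conda sets `CONDA_PREFIX` on activation), and the source would be the bin/ directory of the cloned and built repo, wherever you cloned it.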

salvatierra8 commented 1 year ago

Aforementioned way to fix it works! TODO: 1.git clone => make 2. conda install 3. copy all(haslr.py also) to conda env location, overwrite files 4. PROFIT Still needs a proper fix though.

Hope you still remember how to use this... I'm trying to fix this using ISonets's solution. So, by:

  1. Installing from source? If yes: check.
  2. Using: conda install -c bioconda haslr? If yes: check.
  3. By "all" I understand the whole haslr bin folder created by step 1. If yes: check.
  4. No profit :(