Open CaroleBelliardo opened 1 month ago
We're delighted that you're using our software. Since my collaborator developed the Singularity and Docker versions, I'm having him look into the issue. He'll get back to you in a day or two.
Or could you try installing it according to the following steps? If you encounter any issues, please let me know, and I'll be happy to help directly. Thank you very much!
conda create -n HyLight
conda activate HyLight
conda install -c bioconda python=3.6 scipy pandas minimap2 bfc fmlrc2 ropebwt2 racon
git clone https://github.com/kangxiongbin/HyLight.git
cd HyLight
sh install.sh
Thanks for your answer. I'm sorry, I forgot to mention that the same error message occurs when I run the same command line on a regular laptop (without Singularity), following your instructions and using your example dataset. Also, I would be interested in skipping all the previous steps and giving long reads and short reads directly as input to your tool. That would be a real improvement: it would let us quickly use your software for the last analysis of our data for a paper we are currently writing. Thanks a lot. Have a nice day.
Hi CaroleBelliardo,
I found the issue. The output folder you provided isn't a full path. Try specifying the full path, like /work/bin/HyLight/example/, instead of just the name of the output folder.
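A minimal sketch of the full-path advice, assuming a POSIX shell. `to_abs_dir` is a hypothetical helper and is not part of HyLight; how the resulting path is passed to HyLight depends on its command-line options, which are not shown in this thread.

```shell
#!/bin/sh
# Hypothetical helper (not part of HyLight): create the directory if it does
# not exist yet and print its absolute path.
to_abs_dir() {
    mkdir -p -- "$1" || return 1
    # cd inside a subshell so the caller's working directory is unchanged;
    # pwd -P prints the physical absolute path.
    (cd -- "$1" && pwd -P)
}

# Resolve the folder given as the first argument (default: current directory).
outdir=$(to_abs_dir "${1:-.}")
echo "Full path to use as the output folder: $outdir"
```

Resolving the path once, before launching the pipeline, means any later change of working directory inside the tool should no longer affect where the output folder points.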
I hope you can get HyLight running smoothly. If you have any other questions, feel free to reach out. Have a nice day!
Best, Xiongbin
Thanks a lot for your help. I have two other questions:
Hi, thank you for your help. Providing a full path fixed the issues for the first steps, but another error message appeared after the 'assemble long reads with clean graph' step:
[2024-08-11T09:08:43Z INFO fmlrc2_convert] Input parameters (required):
[2024-08-11T09:08:43Z INFO fmlrc2_convert] Input BWT: "stdin"
[2024-08-11T09:08:43Z INFO fmlrc2_convert] Output BWT: "/kwak/hub/25_cbelliardo/MISTIC/1_reads_rawdata/hybrid_assembly/salade_I__test_v3/tmp//comp_msbwt.npy"
[M::main_ropebwt2] inserted 50606576 symbols in 6.834 sec, 18.615 CPU sec
[M::main_ropebwt2] constructed FM-index in 8.485 sec, 19.040 CPU sec
[M::main_ropebwt2] symbol counts: ($, A, C, G, T, N) = (201622, 12056378, 13146208, 13154111, 23199, 12025058)
[M::main] Version: r187
[M::main] CMD: /home/tools/miniconda/bin/ropebwt2 -LR
[M::main] Real time: 11.146 sec; CPU: 20.785 sec
[2024-08-11T09:08:54Z INFO fmlrc::bwt_converter] Converted BWT with symbol counts: [201622, 12056378, 13146208, 13154111, 23199, 12025058]
[2024-08-11T09:08:54Z INFO fmlrc::bwt_converter] RLE-BWT byte length: 9031781
[2024-08-11T09:08:54Z INFO fmlrc2_convert] RLE-BWT conversion complete.
[2024-08-11T09:08:54Z INFO fmlrc2] Input parameters (required):
[2024-08-11T09:08:54Z INFO fmlrc2] BWT: "comp_msbwt.npy"
[2024-08-11T09:08:54Z INFO fmlrc2] Input reads: "/home/tools/HyLight/example/long_reads.fq"
[2024-08-11T09:08:54Z INFO fmlrc2] Output corrected reads: "fmlrc1.fasta"
[2024-08-11T09:08:54Z INFO fmlrc2] Execution Parameters:
[2024-08-11T09:08:54Z INFO fmlrc2] verbose: false
[2024-08-11T09:08:54Z INFO fmlrc2] threads: 30
[2024-08-11T09:08:54Z INFO fmlrc2] cache size: 8
[2024-08-11T09:08:54Z INFO fmlrc2] Correction Parameters:
[2024-08-11T09:08:54Z INFO fmlrc2] reads to correct: [0, 18446744073709551615)
[2024-08-11T09:08:54Z INFO fmlrc2] k-mer sizes: [21, 59]
[2024-08-11T09:08:54Z INFO fmlrc2] abs. mininimum count: 2
[2024-08-11T09:08:54Z INFO fmlrc2] dyn. minimimum fraction: 0.1
[2024-08-11T09:08:54Z INFO fmlrc2] branching factor: 4
[2024-08-11T09:08:54Z INFO fmlrc::bv_bwt] Loading BWT with 9031781 compressed values
[2024-08-11T09:08:54Z INFO fmlrc::bv_bwt] Loaded BWT with symbol counts: [201622, 12056378, 13146208, 13154111, 23199, 12025058]
[2024-08-11T09:08:54Z INFO fmlrc::bv_bwt] Allocating binary vectors...
[2024-08-11T09:08:54Z INFO fmlrc::bv_bwt] Calculating binary vectors...
[2024-08-11T09:08:54Z INFO fmlrc::bv_bwt] Constructing FM-indices...
[2024-08-11T09:08:54Z INFO fmlrc::bv_bwt] Building 8-mer cache...
[2024-08-11T09:08:54Z INFO fmlrc::bv_bwt] Finished BWT initialization.
[2024-08-11T09:08:55Z INFO fmlrc2] Starting read correction processes...
[2024-08-11T09:09:04Z INFO fmlrc2] Finished processing 2412 total reads in range [0, 18446744073709551615)
[2024-08-11T09:09:04Z INFO fmlrc2] Input parameters (required):
[2024-08-11T09:09:04Z INFO fmlrc2] BWT: "comp_msbwt.npy"
[2024-08-11T09:09:04Z INFO fmlrc2] Input reads: "fmlrc1.fasta"
[2024-08-11T09:09:04Z INFO fmlrc2] Output corrected reads: "fmlrc2.fasta"
[2024-08-11T09:09:04Z INFO fmlrc2] Execution Parameters:
[2024-08-11T09:09:04Z INFO fmlrc2] verbose: false
[2024-08-11T09:09:04Z INFO fmlrc2] threads: 30
[2024-08-11T09:09:04Z INFO fmlrc2] cache size: 8
[2024-08-11T09:09:04Z INFO fmlrc2] Correction Parameters:
[2024-08-11T09:09:04Z INFO fmlrc2] reads to correct: [0, 18446744073709551615)
[2024-08-11T09:09:04Z INFO fmlrc2] k-mer sizes: [21, 59]
[2024-08-11T09:09:04Z INFO fmlrc2] abs. mininimum count: 2
[2024-08-11T09:09:04Z INFO fmlrc2] dyn. minimimum fraction: 0.1
[2024-08-11T09:09:04Z INFO fmlrc2] branching factor: 4
[2024-08-11T09:09:04Z INFO fmlrc::bv_bwt] Loading BWT with 9031781 compressed values
[2024-08-11T09:09:04Z INFO fmlrc::bv_bwt] Loaded BWT with symbol counts: [201622, 12056378, 13146208, 13154111, 23199, 12025058]
[2024-08-11T09:09:04Z INFO fmlrc::bv_bwt] Allocating binary vectors...
[2024-08-11T09:09:04Z INFO fmlrc::bv_bwt] Calculating binary vectors...
[2024-08-11T09:09:05Z INFO fmlrc::bv_bwt] Constructing FM-indices...
[2024-08-11T09:09:05Z INFO fmlrc::bv_bwt] Building 8-mer cache...
[2024-08-11T09:09:05Z INFO fmlrc::bv_bwt] Finished BWT initialization.
[2024-08-11T09:09:05Z INFO fmlrc2] Starting read correction processes...
[2024-08-11T09:09:11Z INFO fmlrc2] Finished processing 2412 total reads in range [0, 18446744073709551615)
[2024-08-11T09:09:11Z INFO fmlrc2] Input parameters (required):
[2024-08-11T09:09:11Z INFO fmlrc2] BWT: "comp_msbwt.npy"
[2024-08-11T09:09:11Z INFO fmlrc2] Input reads: "fmlrc2.fasta"
[2024-08-11T09:09:11Z INFO fmlrc2] Output corrected reads: "fmlrc3.fasta"
[2024-08-11T09:09:11Z INFO fmlrc2] Execution Parameters:
[2024-08-11T09:09:11Z INFO fmlrc2] verbose: false
[2024-08-11T09:09:11Z INFO fmlrc2] threads: 30
[2024-08-11T09:09:11Z INFO fmlrc2] cache size: 8
[2024-08-11T09:09:11Z INFO fmlrc2] Correction Parameters:
[2024-08-11T09:09:11Z INFO fmlrc2] reads to correct: [0, 18446744073709551615)
[2024-08-11T09:09:11Z INFO fmlrc2] k-mer sizes: [21, 59]
[2024-08-11T09:09:11Z INFO fmlrc2] abs. mininimum count: 2
[2024-08-11T09:09:11Z INFO fmlrc2] dyn. minimimum fraction: 0.1
[2024-08-11T09:09:11Z INFO fmlrc2] branching factor: 4
[2024-08-11T09:09:11Z INFO fmlrc::bv_bwt] Loading BWT with 9031781 compressed values
[2024-08-11T09:09:11Z INFO fmlrc::bv_bwt] Loaded BWT with symbol counts: [201622, 12056378, 13146208, 13154111, 23199, 12025058]
[2024-08-11T09:09:11Z INFO fmlrc::bv_bwt] Allocating binary vectors...
[2024-08-11T09:09:11Z INFO fmlrc::bv_bwt] Calculating binary vectors...
[2024-08-11T09:09:11Z INFO fmlrc::bv_bwt] Constructing FM-indices...
[2024-08-11T09:09:11Z INFO fmlrc::bv_bwt] Building 8-mer cache...
[2024-08-11T09:09:11Z INFO fmlrc::bv_bwt] Finished BWT initialization.
[2024-08-11T09:09:11Z INFO fmlrc2] Starting read correction processes...
[2024-08-11T09:09:17Z INFO fmlrc2] Finished processing 2412 total reads in range [0, 18446744073709551615)
2024-08-11 10:09:17,809 - /home/tools/HyLight/script/HyLight.py[line:114] - INFO: generate overlap information among long reads
2024-08-11 10:09:18,597 - /home/tools/HyLight/script/HyLight.py[line:122] - INFO: computing all-vs-all long read overlaps and then filter wrong overlap by SNPs and sort overlap by overlap score.
2024-08-11 10:09:36,943 - /home/tools/HyLight/script/HyLight.py[line:132] - INFO: assemble long reads with clean graph
[M::main] ===> Step 1: reading read mappings <===
[M::ma_hit_read::0.009*1.40] read 3881 hits; stored 7762 hits and 1723 sequences (20628149 bp)
[M::main] ===> Step 2: 1-pass (crude) read selection <===
[M::ma_hit_sub::0.010*1.35] 1723 query sequences remain after sub
[M::ma_hit_cut::0.010*1.35] 7704 hits remain after cut
[M::ma_hit_flt::0.010*1.34] 7704 hits remain after filtering; crude coverage after filtering: 2.67
[M::main] ===> Step 3: 2-pass (fine) read selection <===
[M::ma_hit_sub::0.010*1.32] 1717 query sequences remain after sub
[M::ma_hit_cut::0.011*1.32] 7704 hits remain after cut
[M::ma_hit_contained::0.011*1.30] 578 sequences and 1360 hits remain after containment removal
[M::main] ===> Step 4: graph cleaning <===
[M::ma_sg_gen] read 1360 arcs
[M::main] ===> Step 4.1: transitive reduction <===
[M::asg_arc_del_trans] transitively reduced 474 arcs
[M::asg_arc_del_multi] removed 0 multi-arcs
[M::asg_arc_del_asymm] removed 0 asymmetric arcs
[M::main] ===> Step 4.2: initial tip cutting and bubble popping <===
[M::asg_cut_tip] cut 39 tips
[M::asg_pop_bubble] popped 0 bubbles and trimmed 0 tips
[M::main] ===> Step 4.3: cutting short overlaps (1 rounds in total) <===
[M::asg_arc_del_short] removed 0 short overlaps
[M::main] ===> Step 4.4: removing short internal sequences and bi-loops <===
[M::asg_cut_internal] cut 0 internal sequences
[M::asg_cut_biloop] cut 0 small bi-loops
[M::asg_cut_tip] cut 0 tips
[M::asg_pop_bubble] popped 0 bubbles and trimmed 0 tips
[M::main] ===> Step 4.5: aggressively cutting short overlaps <===
[M::asg_arc_del_short] removed 0 short overlaps
[M::main] ===> Step 5: generating unitigs <===
[M::main] Version: 0.3-r179
[M::main] CMD: /home/tools/HyLight/tools/miniasm/miniasm -d 10000 -n 1 -e 1 -c 1 -f /kwak/hub/25_cbelliardo/MISTIC/1_reads_rawdata/hybrid_assembly/salade_I__test_v3/1.split_fastx/s1.fa /kwak/hub/25_cbelliardo/MISTIC/1_reads_rawdata/hybrid_assembly/salade_I__test_v3/2.overlap/s1_s1.paf
[M::main] Real time: 0.071 sec; CPU: 0.071 sec
2024-08-11 10:09:37,142 - /home/tools/HyLight/script/HyLight.py[line:144] - INFO: Start assemble short reads
2024-08-11 10:09:37,142 - /home/tools/HyLight/script/HyLight.py[line:146] - INFO: generate overlap between reads and temporary reference
sh: 1: racon: not found
[M::main] ===> Step 1: reading read mappings <===
[M::ma_hit_read::0.004*0.74] read 697 hits; stored 1188 hits and 321 sequences (2539191 bp)
[M::main] ===> Step 2: 1-pass (crude) read selection <===
[M::ma_hit_sub::0.004*0.75] 321 query sequences remain after sub
[M::ma_hit_cut::0.004*0.75] 1150 hits remain after cut
[M::ma_hit_flt::0.004*0.76] 1150 hits remain after filtering; crude coverage after filtering: 2.44
[M::main] ===> Step 3: 2-pass (fine) read selection <===
[M::ma_hit_sub::0.004*0.76] 318 query sequences remain after sub
[M::ma_hit_cut::0.004*0.76] 1150 hits remain after cut
[M::ma_hit_contained::0.004*0.77] 102 sequences and 162 hits remain after containment removal
[M::main] ===> Step 4: graph cleaning <===
[M::ma_sg_gen] read 162 arcs
[M::main] ===> Step 4.1: transitive reduction <===
[M::asg_arc_del_trans] transitively reduced 42 arcs
[M::asg_arc_del_multi] removed 0 multi-arcs
[M::asg_arc_del_asymm] removed 0 asymmetric arcs
[M::main] ===> Step 4.2: initial tip cutting and bubble popping <===
[M::asg_cut_tip] cut 17 tips
[M::asg_pop_bubble] popped 0 bubbles and trimmed 0 tips
[M::main] ===> Step 4.3: cutting short overlaps (1 rounds in total) <===
[M::asg_arc_del_short] removed 0 short overlaps
[M::main] ===> Step 4.4: removing short internal sequences and bi-loops <===
[M::asg_cut_internal] cut 0 internal sequences
[M::asg_cut_biloop] cut 0 small bi-loops
[M::asg_cut_tip] cut 0 tips
[M::asg_pop_bubble] popped 0 bubbles and trimmed 0 tips
[M::main] ===> Step 4.5: aggressively cutting short overlaps <===
[M::asg_arc_del_short] removed 0 short overlaps
[M::main] ===> Step 5: generating unitigs <===
[M::main] Version: 0.3-r179
[M::main] CMD: /home/tools/HyLight/tools/miniasm/miniasm -d 10000 -n 1 -e 1 -c 1 -f /kwak/hub/25_cbelliardo/MISTIC/1_reads_rawdata/hybrid_assembly/salade_I__test_v3/1.split_fastx/s1.fa /kwak/hub/25_cbelliardo/MISTIC/1_reads_rawdata/hybrid_assembly/salade_I__test_v3/tmp//ov_long_remain.paf
[M::main] Real time: 0.046 sec; CPU: 0.045 sec
sh: 1: racon: not found
[M::main] ===> Step 1: reading read mappings <===
[M::ma_hit_read::0.003*0.67] read 146 hits; stored 250 hits and 94 sequences (736377 bp)
[M::main] ===> Step 2: 1-pass (crude) read selection <===
[M::ma_hit_sub::0.003*0.68] 94 query sequences remain after sub
[M::ma_hit_cut::0.003*0.68] 244 hits remain after cut
[M::ma_hit_flt::0.003*0.69] 244 hits remain after filtering; crude coverage after filtering: 2.11
[M::main] ===> Step 3: 2-pass (fine) read selection <===
[M::ma_hit_sub::0.003*0.69] 93 query sequences remain after sub
[M::ma_hit_cut::0.003*0.69] 244 hits remain after cut
[M::ma_hit_contained::0.003*0.69] 16 sequences and 0 hits remain after containment removal
[M::main] ===> Step 4: graph cleaning <===
[M::ma_sg_gen] read 0 arcs
[M::main] ===> Step 4.1: transitive reduction <===
[M::asg_arc_del_trans] transitively reduced 0 arcs
[M::main] ===> Step 4.2: initial tip cutting and bubble popping <===
[M::asg_cut_tip] cut 16 tips
[M::asg_arc_del_multi] removed 0 multi-arcs
[M::asg_arc_del_asymm] removed 0 asymmetric arcs
[M::asg_pop_bubble] popped 0 bubbles and trimmed 0 tips
[M::main] ===> Step 4.3: cutting short overlaps (1 rounds in total) <===
[M::asg_arc_del_short] removed 0 short overlaps
[M::main] ===> Step 4.4: removing short internal sequences and bi-loops <===
[M::asg_cut_internal] cut 0 internal sequences
[M::asg_cut_biloop] cut 0 small bi-loops
[M::asg_cut_tip] cut 0 tips
[M::asg_pop_bubble] popped 0 bubbles and trimmed 0 tips
[M::main] ===> Step 4.5: aggressively cutting short overlaps <===
[M::asg_arc_del_short] removed 0 short overlaps
[M::main] ===> Step 5: generating unitigs <===
[M::main] Version: 0.3-r179
[M::main] CMD: /home/tools/HyLight/tools/miniasm/miniasm -d 10000 -n 1 -e 1 -c 1 -f /kwak/hub/25_cbelliardo/MISTIC/1_reads_rawdata/hybrid_assembly/salade_I__test_v3/1.split_fastx/s1.fa /kwak/hub/25_cbelliardo/MISTIC/1_reads_rawdata/hybrid_assembly/salade_I__test_v3/tmp//ov_long_remain.paf
[M::main] Real time: 0.044 sec; CPU: 0.041 sec
sort: cannot read: 'sub256*_sorted_overlap.paf': No such file or directory
rm: cannot remove 'sub256*': No such file or directory
sh: 1: racon: not found
The log file contains:
```
2024-08-07 00:48:07,241 - /home/tools/HyLight/script/HyLight.py[line:100] - INFO: correcting long reads with short reads
2024-08-07 01:31:10,225 - /home/tools/HyLight/script/HyLight.py[line:114] - INFO: generate overlap information among long reads
2024-08-07 01:35:38,301 - /home/tools/HyLight/script/HyLight.py[line:122] - INFO: computing all-vs-all long read overlaps and then filter wrong overlap by SNPs and sort overlap by overlap score.
2024-08-07 01:35:45,015 - /home/tools/HyLight/script/HyLight.py[line:132] - INFO: assemble long reads with clean graph
```
Thanks again for your help,
Carole
Could you check if you have installed racon? The report indicates that the Racon software wasn't successfully installed. I recommend using conda to install Racon; it's quick and efficient.
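One way to run that check is sketched below, assuming a POSIX shell. The tool names come from the conda install line earlier in this thread; `missing_tools` is a hypothetical helper, not part of HyLight. It is worth running in the same environment that launches HyLight (for example inside the Slurm job), since the `sh: 1: racon: not found` lines come from that shell.

```shell
#!/bin/sh
# Hypothetical helper: print the name of every listed command that is not
# found on PATH, one per line.
missing_tools() {
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
    done
}

# Tool names are taken from the conda install line earlier in this thread.
missing=$(missing_tools minimap2 bfc fmlrc2 ropebwt2 racon)
if [ -n "$missing" ]; then
    # $missing is left unquoted on purpose so the names print on one line.
    echo "Not found on PATH:" $missing
    echo "For example: conda install -c bioconda racon"
else
    echo "All listed tools were found on PATH."
fi
```

If a tool is reported as missing even though it was installed, the conda environment may not be activated in that shell.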
Best, Xiongbin
Thank you again for your help. Some error messages persist:
Error executing the command: cat /kwak/hub/25_cbelliardo/MISTIC/1_reads_rawdata/hybrid_assembly/salade_I__test_v3/tmp//fq_15000/*/contigs.fasta > /kwak/hub/25_cbelliardo/MISTIC/1_reads_rawdata/hybrid_assembly/salade_I__test_v3/tmp//all.contigs_15000.fasta
cat: '/kwak/hub/25_cbelliardo/MISTIC/1_reads_rawdata/hybrid_assembly/salade_I__test_v3/tmp//fq_15000/*/contigs.fasta': No such file or directory
However, 'long_con_polished.fa' is no longer empty.
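The `cat` error above means the glob `fq_15000/*/contigs.fasta` matched no files. A small diagnostic sketch, assuming a POSIX shell, counts how many files the pattern matches. `count_contigs` is a hypothetical helper, not part of HyLight, and it only shows whether the earlier step wrote any output, not why it did not.

```shell
#!/bin/sh
# Hypothetical helper: count the files matching <dir>/*/contigs.fasta.
count_contigs() {
    n=0
    for f in "$1"/*/contigs.fasta; do
        # When nothing matches, the pattern stays literal and the -f test fails.
        if [ -f "$f" ]; then
            n=$((n + 1))
        fi
    done
    echo "$n"
}

# Pass the fq_15000 folder from the error message as the first argument.
dir=${1:-.}
echo "contigs.fasta files found under $dir: $(count_contigs "$dir")"
```

A count of 0 suggests that the step which should write the per-folder `contigs.fasta` files did not complete, so its own messages are the next place to look.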
Hi Kangxiongbin,
Thank you for developing this impressive software. Utilizing both short and long reads for metagenomic assembly is a fantastic approach! However, I encountered some issues while using HyLight on our data as well as the provided "example" dataset. Both attempts resulted in the same error message. After following the manual installation instructions and running the tool on the "example" dataset, I encountered multiple errors related to missing files and directories, culminating in a FileNotFoundError. Below are the details of the errors and the setup.
Here is the Singularity source code used to install HyLight:
Slurm script for running HyLight on the example dataset:
The error message is:
The log file contains:
Thank you so much for your help in resolving this issue. I am available to provide more information about the run. Best regards, Carole