kangxiongbin / StrainXpress

StrainXpress is a de novo assembly method based on the overlap-layout-consensus (OLC) paradigm that can quickly and accurately assemble high-complexity metagenome sequencing data at strain resolution.
GNU General Public License v3.0

No output files #2

Closed jsgounot closed 2 years ago

jsgounot commented 2 years ago

Hi,

I thought from my previous issue that the software worked well with my data, but there are actually still problems. Here is my full output:

pid 10779's current affinity mask: ffff
pid 10779's new affinity mask: ff
begin...
##################################################
the 1/1 part start...
this is the: 0 for 100w lines
this is the: 1000000 for 100w lines
this is the: 2000000 for 100w lines
this is the: 3000000 for 100w lines
this is the: 4000000 for 100w lines
this is the: 5000000 for 100w lines
this is the: 6000000 for 100w lines
this is the: 7000000 for 100w lines
this is the: 8000000 for 100w lines
this is the: 9000000 for 100w lines
this is the: 10000000 for 100w lines
this is the: 11000000 for 100w lines
this is the: 12000000 for 100w lines
this is the: 13000000 for 100w lines
this is the: 14000000 for 100w lines
this is the: 15000000 for 100w lines
this is the: 16000000 for 100w lines
this is the: 17000000 for 100w lines
this is the: 18000000 for 100w lines
this is the: 19000000 for 100w lines
this is the: 20000000 for 100w lines
this is the: 21000000 for 100w lines
the 1/1 part finished...

##################################################
[M::mm_idx_gen::1.285*1.00] collected minimizers
[M::mm_idx_gen::1.373*1.63] sorted minimizers
[M::main::1.373*1.63] loaded/built the index for 108912 target sequence(s)
[M::mm_mapopt_update::1.408*1.61] mid_occ = 50
[M::mm_idx_stat] kmer size: 21; skip: 11; is_hpc: 0; #seq: 108912
[M::mm_idx_stat::1.428*1.60] distinct minimizers: 1429398 (20.63% are singletons); average occurrences: 4.557; average spacing: 6.420; total length: 41822356
[M::worker_pipeline::10.445*2.89] mapped 108912 sequences
[M::main] Version: 2.21-r1071
[M::main] CMD: minimap2 -t 16 --sr -X -c -k 21 -w 11 -s 60 -m 30 -n 2 -r 0 -A 4 -B 2 --end-bonus=100 ../contigs_b.fastq ../contigs_b.fastq
[M::main] Real time: 10.461 sec; CPU: 30.220 sec; Peak RSS: 0.498 GB
pipeline_per_stage.py
Traceback (most recent call last):
  File "/home/ubuntu/strainxpress/StrainXpress/scripts/pipeline_per_stage.v3.py", line 666, in <module>
    sys.exit(main())
  File "/home/ubuntu/strainxpress/StrainXpress/scripts/pipeline_per_stage.v3.py", line 164, in main
    run_first_it_merge(args.fastq, args.overlaps, args.edge_threshold, args.min_overlap_perc, min_overlap_len, args.merge_contigs, first_it)
  File "/home/ubuntu/strainxpress/StrainXpress/scripts/pipeline_per_stage.v3.py", line 246, in run_first_it_merge
    "--ignore_inclusions=%s" % remove_inclusions
  File "/home/ubuntu/miniconda3/envs/strainxpress/lib/python3.6/subprocess.py", line 306, in check_call
    retcode = call(*popenargs, **kwargs)
  File "/home/ubuntu/miniconda3/envs/strainxpress/lib/python3.6/subprocess.py", line 287, in call
    with Popen(*popenargs, **kwargs) as p:
  File "/home/ubuntu/miniconda3/envs/strainxpress/lib/python3.6/subprocess.py", line 729, in __init__
    restore_signals, start_new_session)
  File "/home/ubuntu/miniconda3/envs/strainxpress/lib/python3.6/subprocess.py", line 1364, in _execute_child
    raise child_exception_type(errno_num, err_msg, err_filename)
FileNotFoundError: [Errno 2] No such file or directory: '/prj/metastrain/software/conda/envs/hap/opt/haploconduct-0.2.1/bin/ViralQuasispecies': '/prj/metastrain/software/conda/envs/hap/opt/haploconduct-0.2.1/bin/ViralQuasispecies'
Traceback (most recent call last):
  File "/home/ubuntu/strainxpress/StrainXpress/scripts/fastq2fasta.py", line 43, in <module>
    sys.exit(main())
  File "/home/ubuntu/strainxpress/StrainXpress/scripts/fastq2fasta.py", line 32, in main
    with open(infile, 'r') as f1:
FileNotFoundError: [Errno 2] No such file or directory: '/mnt/volume1/analysis/synthetic/test/strainxpress/stageb/singles.fastq'
successfully execute: split input/ecoli.interleaved.fq -l 2707768 -d -a 2 sub
successfully execute: cat cmd_overlap.sh | xargs -i -P 16 bash -c "{}";
successfully execute: for X in sub*.map; do sort  -k3 -nr < $X > sorted-$X; done;
successfully execute: sort -k3 -nr -m sorted-sub*.map > all_reads_sort.map;
successfully execute: rm *sub*;
successfully execute: python /home/ubuntu/strainxpress/StrainXpress/scripts/get_readnames.py input/ecoli.interleaved.fq readnames.txt
successfully execute: python /home/ubuntu/strainxpress/StrainXpress/scripts/bin_pointer_limited_filechunks_shortpath.py all_reads_sort.map readnames.txt 15000 strainxpress 16
successfully execute: python /home/ubuntu/strainxpress/StrainXpress/scripts/getclusters.py strainxpress_max15000_final 16
successfully execute: python /home/ubuntu/strainxpress/StrainXpress/scripts/get_fq_cluster.py strainxpress_max15000_final_clusters_grouped.json input/ecoli.interleaved.fq /mnt/volume1/analysis/synthetic/test/strainxpress/fq_15000
successfully execute: rm -rf Chunkfile*; rm strainxpress_max15000_final_clustersizes.json strainxpress_max15000_final_clusters_unchained.json strainxpress_max15000_final_clusters.json
successfully execute: cat cmd_polyte.sh | xargs -i -P 16 bash -c "{}";
successfully execute: cat /mnt/volume1/analysis/synthetic/test/strainxpress/fq_15000/*/contigs.fasta > all.contigs_15000.fasta
successfully execute: mkdir -p stageb
successfully execute: cd stageb; minimap2 -t 16 --sr -X -c -k 21 -w 11 -s 60 -m 30 -n 2 -r 0      -A 4 -B 2 --end-bonus=100 ../contigs_b.fastq ../contigs_b.fastq | python3 /home/ubuntu/strainxpress/StrainXpress/scripts/filter_trans_ovlp_inline_v3.py       -len 100 -iden 0.99 -oh 2 -sfo > sfoverlaps.out;
successfully execute: cd stageb; python3 /home/ubuntu/strainxpress/StrainXpress/scripts/sfo2overlaps.py --in sfoverlaps.out     --out sfoverlap.out.savage --num_singles 108912 --num_pairs 0; mkdir -p fastq;    cp ../contigs_b.fastq ./fastq/singles.fastq;
successfully execute: cd stageb; python3 /home/ubuntu/strainxpress/StrainXpress/scripts/pipeline_per_stage.v3.py --no_error_correction --remove_branches true      --stage b --min_overlap_len 100 --min_overlap_perc 0 --edge_threshold 1 --overlaps ./sfoverlap.out.savage      --fastq ./fastq --max_tip_len 1000 --num_threads 16; python3 /home/ubuntu/strainxpress/StrainXpress/scripts/fastq2fasta.py ./singles.fastq      ./contigs.stage_b.fasta;

I don't know exactly what the output file is supposed to be here, but I can't find contigs.stage_b.fasta in my folder. Moreover, it looks like there is a hard-coded path in the software that produces an error at one point during the process.
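In the log above, the stage b command is reported as "successfully execute" even though both of its Python steps ended in a traceback, so the missing output is easy to overlook. As a minimal sketch (not StrainXpress's actual code; `run_step` is a hypothetical helper and it assumes `bash` is available), a wrapper could report success only when every command in a `;`-chained line exits with status 0:

```python
import subprocess


def run_step(cmd):
    """Run one shell command line; report success only on exit status 0.

    Hypothetical helper for illustration, not part of StrainXpress.
    """
    # 'set -e' makes bash stop at the first failing command in a ';' chain,
    # and 'pipefail' makes a pipeline fail if any of its stages fails.
    result = subprocess.run(["bash", "-c", "set -eo pipefail; " + cmd])
    if result.returncode != 0:
        print("FAILED (exit %d): %s" % (result.returncode, cmd))
        return False
    print("successfully execute: %s" % cmd)
    return True
```

With a check like this, a `FileNotFoundError` raised inside a sub-script would show up as a failed step rather than a success message.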

kangxiongbin commented 2 years ago

Hi, StrainXpress clusters reads and then assembles the reads within each cluster. To extend short contigs, I added a global assembly step. It seems that the global assembly step encounters an error.

Could you enter the stageb folder and execute:

minimap2 -t 16 --sr -X -c -k 21 -w 11 -s 60 -m 30 -n 2 -r 0 -A 4 -B 2 --end-bonus=100 ../contigs_b.fastq ../contigs_b.fastq | python /home/ubuntu/strainxpress/StrainXpress/scripts/filter_trans_ovlp_inline_v3.py -len 100 -iden 0.99 -oh 2 -sfo > sfoverlaps.out;

python /home/ubuntu/strainxpress/StrainXpress/scripts/sfo2overlaps.py --in sfoverlaps.out --out sfoverlap.out.savage --num_singles 108912 --num_pairs 0; mkdir -p fastq; cp ../contigs_b.fastq ./fastq/singles.fastq;

python /home/ubuntu/strainxpress/StrainXpress/scripts/pipeline_per_stage.v3.py --no_error_correction --remove_branches true --stage b --min_overlap_len 100 --min_overlap_perc 0 --edge_threshold 1 --overlaps ./sfoverlap.out.savage --fastq ./fastq --max_tip_len 1000 --num_threads 16;

python /home/ubuntu/strainxpress/StrainXpress/scripts/fastq2fasta.py ./singles.fastq ./contigs.stage_b.fasta;

I made a mistake in the earlier code by calling python3; I have now changed it to python. If the above commands work, you can replace python3 with python directly in the script file strainxpress.py, or you can download it again (I have corrected it).
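Whether `python` or `python3` points at the interpreter of the active conda environment depends on the machine. One way to sidestep the question, sketched here under the assumption that the helper scripts are launched from Python (`script_command` is a hypothetical name, not a StrainXpress function), is to build the command around `sys.executable`:

```python
import sys


def script_command(script_path, *args):
    """Build an argv list that runs a helper script with the current interpreter.

    Hypothetical helper for illustration, not part of StrainXpress.
    """
    # sys.executable is the path of the interpreter running this code, so the
    # helper script runs in the same environment no matter what "python" or
    # "python3" resolve to on PATH.
    return [sys.executable, script_path] + [str(a) for a in args]


# Example: the fastq2fasta.py call from the commands above, as an argv list.
cmd = script_command("scripts/fastq2fasta.py",
                     "./singles.fastq", "./contigs.stage_b.fasta")
```

The resulting list can be passed to `subprocess.check_call` without going through a shell.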

I'm very happy that you are using my script and helping me find these small bugs.

jsgounot commented 2 years ago

Hi,

I still get an error when running one of the commands. I guess one path is hard-coded?

python /home/ubuntu/strainxpress/StrainXpress/scripts/pipeline_per_stage.v3.py --no_error_correction --remove_branches true --stage b --min_overlap_len 100 --min_overlap_perc 0 --edge_threshold 1 --overlaps ./sfoverlap.out.savage --fastq ./fastq --max_tip_len 1000 --num_threads 16;
pipeline_per_stage.py
Traceback (most recent call last):
  File "/home/ubuntu/strainxpress/StrainXpress/scripts/pipeline_per_stage.v3.py", line 666, in <module>
    sys.exit(main())
  File "/home/ubuntu/strainxpress/StrainXpress/scripts/pipeline_per_stage.v3.py", line 164, in main
    run_first_it_merge(args.fastq, args.overlaps, args.edge_threshold, args.min_overlap_perc, min_overlap_len, args.merge_contigs, first_it)
  File "/home/ubuntu/strainxpress/StrainXpress/scripts/pipeline_per_stage.v3.py", line 246, in run_first_it_merge
    "--ignore_inclusions=%s" % remove_inclusions
  File "/home/ubuntu/miniconda3/envs/strainxpress/lib/python3.6/subprocess.py", line 306, in check_call
    retcode = call(*popenargs, **kwargs)
  File "/home/ubuntu/miniconda3/envs/strainxpress/lib/python3.6/subprocess.py", line 287, in call
    with Popen(*popenargs, **kwargs) as p:
  File "/home/ubuntu/miniconda3/envs/strainxpress/lib/python3.6/subprocess.py", line 729, in __init__
    restore_signals, start_new_session)
  File "/home/ubuntu/miniconda3/envs/strainxpress/lib/python3.6/subprocess.py", line 1364, in _execute_child
    raise child_exception_type(errno_num, err_msg, err_filename)
FileNotFoundError: [Errno 2] No such file or directory: '/prj/metastrain/software/conda/envs/hap/opt/haploconduct-0.2.1/bin/ViralQuasispecies': '/prj/metastrain/software/conda/envs/hap/opt/haploconduct-0.2.1/bin/ViralQuasispecies'

jsgounot commented 2 years ago

Sorry, I did not mean to close the issue.

kangxiongbin commented 2 years ago

Hi, yes, the path is wrong. You can correct it in pipeline_per_stage.v3.py with the code below, or download the new code:

base_path3 = os.path.split(os.path.realpath(__file__))[0]
viralquasispecies = base_path3[:-7] + "tools/HaploConduct/bin/ViralQuasispecies"
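The `[:-7]` slice strips the trailing `scripts` directory name (7 characters) from the script's folder, leaving the repository root. A sketch of the same lookup with `pathlib`, assuming the layout `<repo>/scripts/<script>.py` and `<repo>/tools/HaploConduct/bin/ViralQuasispecies` implied by the fix (`find_viralquasispecies` is a hypothetical name), avoids depending on the folder name's length:

```python
from pathlib import Path


def find_viralquasispecies(script_file):
    """Return the expected ViralQuasispecies path relative to a script.

    Assumes the script lives in <repo>/scripts/ and the binary in
    <repo>/tools/HaploConduct/bin/. Hypothetical helper for illustration.
    """
    # resolve() follows symlinks like os.path.realpath; going up two levels
    # (script -> scripts/ -> repo root) replaces the [:-7] string slice.
    repo_root = Path(script_file).resolve().parent.parent
    return repo_root / "tools" / "HaploConduct" / "bin" / "ViralQuasispecies"


# Example with the script path seen in the traceback above.
path = find_viralquasispecies(
    "/home/ubuntu/strainxpress/StrainXpress/scripts/pipeline_per_stage.v3.py")
```

Inside the script itself the call would be `find_viralquasispecies(__file__)`, and checking `path.is_file()` before launching gives a clearer error than the `FileNotFoundError` raised from `subprocess`.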

I really appreciate you reporting these issues. I think that after fixing this bug, you will obtain the final assembly result.

Best, Xiongbin

jsgounot commented 2 years ago

Hi, I obtained the contigs with the last version, thanks for your help.

kangxiongbin commented 2 years ago

Great! Have a nice day!