katerinakazantseva / strainy

Graph-based assembly phasing

FASTA to GFA #84

Closed juanjo255 closed 1 month ago

juanjo255 commented 1 month ago

Hello developers!

Thanks for this great work.

Given how Strainy works, I would like to try it for haplotype assembly of the human mitochondrial genome. Do you think it might work?

In any case, when I tried to use it I ran into problems converting from FASTA to GFA. It always fails with this:

[2024-10-17 19:18:46] [Root] INFO:  Checking which sequences need to be phased
[2024-10-17 19:18:46] [Root] INFO:  0/0 unitigs will NOT be phased.
Using processor(s):  Intel(R) Xeon(R) CPU E5-2683 v4 @ 2.10GHz
[2024-10-17 19:18:46] [Root] INFO:  Starting phasing
[2024-10-17 19:18:46] [Root] INFO:  CMD: --unitig-split-length 0 -t 32 --min-unitig-coverage 20 --min-unitig-length 0.05 -m nano --gfa /home/jjpiconc/COL-HUMAN-PROJECT/chrMT.gfa --stage phase --fastq /home/jjpiconc/COL-HUMAN-PROJECT/MT/MT_20240815/reads_MT_20240815.fastq --bam /home/jjpiconc/COL-HUMAN-PROJECT/MT/MT_20240815/aln_human_CHM13_20240815.onlyMT.sorted.bam --snp /home/jjpiconc/COL-HUMAN-PROJECT/MT/MT_20240815/VariantCall/gatk_mutect2/mitochondria_20240815.filtered.vcf.gz -o /home/jjpiconc/COL-HUMAN-PROJECT/MT/MT_20240815/phasin --debug
[2024-10-17 19:18:46] [Root] INFO:  Total number of key hits and misses for consensus computation:
[2024-10-17 19:18:46] [Root] INFO:   H:0, M:0
[2024-10-17 19:18:46] [Root] INFO:  Position hit/miss
[2024-10-17 19:18:46] [Root] INFO:   H:0, M:0
[2024-10-17 19:18:46] [Root] INFO:  Alignment cache hit/miss
[2024-10-17 19:18:46] [Root] INFO:   H:0, M:0
[2024-10-17 19:18:46] [Root] INFO:  Creating phased bam
[E::hts_open_format] Failed to open file "/home/jjpiconc/COL-HUMAN-PROJECT/MT/MT_20240815/phasin/intermediate/bam/coloredSAM.sam" : No such file or directory

I tried adding a FASTA, and I tried converting with minigraph chrMT.fna MT/MT_20240815/reads_MT_20240815.fastq > chrMT.gfa, but when I check the preprocessing_data folder, it always contains an empty gfa_converted.fasta. Maybe this is why the program fails? Or can it simply not work in this context?
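One way to check whether the GFA itself is the problem is to count its segment (S) records; the "0/0 unitigs" line in the log would be consistent with a file that contains none. The following is a minimal sketch, assuming a tab-separated GFA 1 file. `count_gfa_segments` is a hypothetical helper written for illustration and is not part of Strainy or minigraph.

```python
def count_gfa_segments(path):
    """Return (total S records, S records that carry an inline sequence)."""
    total = 0
    with_sequence = 0
    with open(path) as handle:
        for line in handle:
            fields = line.rstrip("\n").split("\t")
            # In GFA 1, segment lines look like: S <name> <sequence> [tags...]
            if fields[0] != "S" or len(fields) < 3:
                continue
            total += 1
            # "*" means the sequence is not stored in the file
            if fields[2] != "*":
                with_sequence += 1
    return total, with_sequence
```

Calling `count_gfa_segments("chrMT.gfa")` and getting `(0, 0)` would suggest that the file has no segments to phase, which would point at the conversion step rather than at the phasing itself.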

I am using my own VCF file and BAM file.

Here is the command:

strainy.py \
    --unitig-split-length 0 -t 32 --min-unitig-coverage 20 --min-unitig-length 0.05 -m nano \
    --gfa ~/COL-HUMAN-PROJECT/chrMT.gfa --stage phase \
    --fastq ~/COL-HUMAN-PROJECT/MT/MT_20240815/reads_MT_20240815.fastq \
    --bam ~/COL-HUMAN-PROJECT/MT/MT_20240815/aln_human_CHM13_20240815.onlyMT.sorted.bam \
    --snp ~/COL-HUMAN-PROJECT/MT/MT_20240815/VariantCall/gatk_mutect2/mitochondria_20240815.filtered.vcf.gz \
    -o ~/COL-HUMAN-PROJECT/MT/MT_20240815/phasin --debug

Thank you very much for the help,

Juan

katerinakazantseva commented 1 month ago

Hi Juan!

Thanks for your interest in Strainy! It looks like the problem is with the GFA file. We just added support for FASTA input, so please pull the latest changes and run Strainy with the --fasta_ref parameter (a GFA is not needed).

Katya

juanjo255 commented 1 month ago

Hello @katerinakazantseva,

Thanks for the reply!

It worked! Unfortunately, I'm now facing another problem, which I think may be related to issue #75. I tried downgrading to Python 3.10, but it failed when using 10 threads.

The error:

ERROR:  Worker thread exception! Cannot save file into a non-existent directory: '/home/jjpiconc/COL-HUMAN-PROJECT/MT/MT_20240815/phasing/strainy_out/intermediate/adj_M'
Traceback (most recent call last):
  File "/home/jjpiconc/COL-HUMAN-PROJECT/MT/MT_20240815/phasing/strainy/strainy/phase.py", line 30, in _thread_fun
    cluster(i, shared_flye_consensus)
  File "/home/jjpiconc/COL-HUMAN-PROJECT/MT/MT_20240815/phasing/strainy/strainy/clustering/cluster.py", line 106, in cluster
    m.to_csv("%s/adj_M/adj_M_%s_%s_%s.csv" % (StRainyArgs().output_intermediate, edge, I, StRainyArgs().AF))
  File "/home/jjpiconc/.conda/envs/strainy/lib/python3.10/site-packages/pandas/util/_decorators.py", line 333, in wrapper
    return func(*args, **kwargs)
  File "/home/jjpiconc/.conda/envs/strainy/lib/python3.10/site-packages/pandas/core/generic.py", line 3967, in to_csv
    return DataFrameRenderer(formatter).to_csv(
  File "/home/jjpiconc/.conda/envs/strainy/lib/python3.10/site-packages/pandas/io/formats/format.py", line 1014, in to_csv
    csv_formatter.save()
  File "/home/jjpiconc/.conda/envs/strainy/lib/python3.10/site-packages/pandas/io/formats/csvs.py", line 251, in save
    with get_handle(
  File "/home/jjpiconc/.conda/envs/strainy/lib/python3.10/site-packages/pandas/io/common.py", line 749, in get_handle
    check_parent_directory(str(handle))
  File "/home/jjpiconc/.conda/envs/strainy/lib/python3.10/site-packages/pandas/io/common.py", line 616, in check_parent_directory
    raise OSError(rf"Cannot save file into a non-existent directory: '{parent}'")
OSError: Cannot save file into a non-existent directory: '/home/jjpiconc/COL-HUMAN-PROJECT/MT/MT_20240815/phasing/strainy_out/intermediate/adj_M'

Traceback (most recent call last):
  File "/home/jjpiconc/COL-HUMAN-PROJECT/MT/MT_20240815/phasing/strainy/strainy.py", line 32, in <module>
    main()
  File "/home/jjpiconc/COL-HUMAN-PROJECT/MT/MT_20240815/phasing/strainy/strainy.py", line 26, in main
    sys.exit(strainy.main.main())
  File "/home/jjpiconc/COL-HUMAN-PROJECT/MT/MT_20240815/phasing/strainy/strainy/main.py", line 127, in main
    sys.exit(phase_main(args))
  File "/home/jjpiconc/COL-HUMAN-PROJECT/MT/MT_20240815/phasing/strainy/strainy/phase.py", line 134, in phase_main
    consensus_dict = phase(StRainyArgs().edges_to_phase, args)
  File "/home/jjpiconc/COL-HUMAN-PROJECT/MT/MT_20240815/phasing/strainy/strainy/phase.py", line 57, in phase
    raise Exception("Error in worker thread, exiting")
Exception: Error in worker thread, exiting

Juan

juanjo255 commented 1 month ago

Update: Actually, it may not be a threads problem, since it also failed when I used -t 1. The cause was that it was not able to create the folder intermediate/adj_M, so I created it manually and it worked. The same happened later with intermediate/graphs/linear_phase_NC_012920.1.png. Probably a permission issue (?)
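The manual workaround above can also be expressed in code. As the traceback shows, pandas `to_csv` raises an `OSError` when the parent directory does not exist, so the directory has to be created before the file is written. A minimal sketch, assuming only the Python standard library; `ensure_parent_dir` is a hypothetical helper for illustration and is not how Strainy itself handles its output folders.

```python
from pathlib import Path


def ensure_parent_dir(file_path):
    """Create the parent directory of file_path if it is missing."""
    parent = Path(file_path).parent
    # parents=True also creates missing intermediate folders;
    # exist_ok=True keeps the call harmless when the folder already
    # exists, for example when several worker threads reach this point.
    parent.mkdir(parents=True, exist_ok=True)
    return parent
```

Calling `ensure_parent_dir(out_file)` immediately before a write such as `m.to_csv(out_file)` would avoid the "non-existent directory" error. If the directory still cannot be created, `mkdir` raises a `PermissionError`, which would help confirm or rule out the permission hypothesis.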

katerinakazantseva commented 1 month ago

Hi Juan, can you please send the command you used to run Strainy and the log file ({output_dir}/log_phase/phase_root.log)?

Thank you, Katya