marschall-lab / strand-seq-graph-phasing

Issue generating phased assemblies with verkko 2.2 #55

Open koisland opened 3 days ago

koisland commented 3 days ago

Hi @mir-cat,

I'm using graphasing and have generated a rukki paths file for my sample.

name    path    assignment
haplotype2_from_utig4-2136  <utig4-6535<utig4-3934<utig4-2135>utig4-2134[N5000N:ambig_path]>utig4-2136  HAPLOTYPE2
haplotype2_from_utig4-0 >utig4-0[N5000N:ambig_path]<utig4-3044[N6614N:ambig_bubble]<utig4-6046<utig4-3168>utig4-3167    HAPLOTYPE2
haplotype1_from_utig4-7545  <utig4-7544[N19112N:tangle]>utig4-7545<utig4-8179   HAPLOTYPE1
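For anyone reading the paths above, here is a minimal Python sketch of how one path string could be split into oriented nodes and gap entries. It is inferred only from the three rows shown (`<`/`>` orientation prefixes and `[N<length>N:<label>]` gaps) and is not taken from the rukki or Verkko code; the function names `parse_path` and `parse_row` are hypothetical.

```python
import re

# One token is either an oriented node (<name or >name) or a gap [N<len>N:<label>].
# The pattern is an assumption based on the rows shown in this issue.
TOKEN = re.compile(r"([<>])([^<>\[\]]+)|\[N(\d+)N:([^\]]+)\]")


def parse_path(path):
    """Split a path string into ('node', name, strand) and ('gap', length, label) steps."""
    steps = []
    for match in TOKEN.finditer(path):
        if match.group(1):
            strand = "+" if match.group(1) == ">" else "-"
            steps.append(("node", match.group(2), strand))
        else:
            steps.append(("gap", int(match.group(3)), match.group(4)))
    return steps


def parse_row(line):
    """Split one whitespace-separated row into (name, steps, assignment)."""
    name, path, assignment = line.split()
    return name, parse_path(path), assignment


if __name__ == "__main__":
    row = "haplotype1_from_utig4-7545  <utig4-7544[N19112N:tangle]>utig4-7545<utig4-8179   HAPLOTYPE1"
    print(parse_row(row))
```

Listing the node names this way makes it easier to check them against the graph that the paths are meant to describe.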

However, when I passed this GAF to Verkko v2.2 via --paths, Verkko ignored it in both a dry run and a real run. In both cases, Verkko copies the graphasing GAF correctly but later overwrites it.

wd/verkko_phased2/
├── 5-untip
│   ├── combined-edges-final.gfa
│   ├── combined-nodemap-final.txt
│   ├── nodelens-final.txt
│   ├── unitig-unrolled-unitig-unrolled-popped-unitig-normal-connected-tip.gfa
│   ├── unitig-unrolled-unitig-unrolled-popped-unitig-normal-connected-tip.hifi-coverage.csv
│   └── unitig-unrolled-unitig-unrolled-popped-unitig-normal-connected-tip.ont-coverage.csv
├── 6-layoutContigs
│   ├── combined-edges.gfa
│   ├── combined-nodemap.txt
│   ├── consensus_paths.txt
│   ├── hifi.alignments.gaf
│   ├── nodelens.txt
│   └── ont.alignments.gaf
├── 7-consensus
│   ├── ont_subset.fasta.gz
│   └── ont_subset.id
├── assembly.homopolymer-compressed.noseq.gfa
├── emptyfile
├── PAB16_verkko_phasing.log
├── snakemake.sh
└── verkko.yml
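One way to confirm that a copied paths file was later overwritten, rather than never copied, is to compare checksums of the original GAF and the copy in the run directory. The sketch below is a generic file comparison; `sha256_of` and `same_content` are hypothetical helper names, and the location of Verkko's copy is deliberately left as an argument because it is not stated in this thread.

```python
import hashlib


def sha256_of(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in chunks to limit memory use."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def same_content(original, copy):
    """True if both files hold byte-identical content."""
    return sha256_of(original) == sha256_of(copy)
```

Running `same_content(...)` once right after the copy step and again after the run finishes would show at which point the two files diverge.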

The relevant entry from the log:

[Tue Nov 19 11:11:30 2024]
rule generateLayoutContigsInputs:
    input: 5-untip/unitig-unrolled-unitig-unrolled-popped-unitig-normal-connected-tip.gfa, 1-buildGraph/paths.gaf, 4-processONT/alns-ont-mapqfilter.gaf, 4-processONT/gaps-ont.gaf
    output: 6-layoutContigs/combined-nodemap.txt, 6-layoutContigs/combined-edges.gfa, 6-layoutContigs/hifi.alignments.gaf, 6-layoutContigs/ont.alignments.gaf, 6-layoutContigs/consensus_paths.txt, 6-layoutContigs/nodelens.txt
    log: 6-layoutContigs/createLayoutInputs.err
    jobid: 14
    reason: Input files updated by another job: 4-processONT/gaps-ont.gaf, 5-untip/unitig-unrolled-unitig-unrolled-popped-unitig-normal-connected-tip.gfa, 4-processONT/alns-ont-mapqfilter.gaf, 1-buildGraph/paths.gaf
    resources: tmpdir=/tmp, job_id=1, n_cpus=1, mem_gb=32, time_h=24
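To see which files each rule reads and writes without scanning the log by eye, a small Python sketch like the following could pull the `input:` and `output:` lists out of rule blocks shaped like the one above. It assumes the log layout shown here (a `rule <name>:` line followed by indented `input:` and `output:` lines with comma-separated paths); `rule_files` is a hypothetical helper and not part of Snakemake or Verkko.

```python
import re

RULE_RE = re.compile(r"^rule (\S+):\s*$")
FIELD_RE = re.compile(r"^\s+(input|output):\s*(.*\S)\s*$")


def rule_files(log_text):
    """Map each rule name to its 'input' and 'output' file lists.

    If a rule appears more than once, the last occurrence wins.
    """
    rules = {}
    current = None
    for line in log_text.splitlines():
        rule_match = RULE_RE.match(line)
        if rule_match:
            current = rules.setdefault(rule_match.group(1), {"input": [], "output": []})
            continue
        field_match = FIELD_RE.match(line)
        if field_match and current is not None:
            paths = [p.strip() for p in field_match.group(2).split(",")]
            current[field_match.group(1)] = paths
    return rules
```

Searching the resulting dictionary for the name of the supplied paths file shows whether any scheduled rule lists it as an input, and which rule writes `6-layoutContigs/consensus_paths.txt`.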

This is the command for the unphased assembly.

verkko \
--snakeopts "--cores 32" \
--hifi data/PacBio_HiFi/*.fastq.gz \
--nano data/nanopore/PAB16_combined.fastq.gz \
-d wd/verkko 2> wd/verkko/PAB16_verkko.log

This is the command I ran to produce the phased assembly.

verkko \
--snakeopts "--cores 32" \
--hifi data/PacBio_HiFi/*.fastq.gz \
--nano data/nanopore/PAB16_combined.fastq.gz \
--assembly wd/verkko \
--paths strand-seq-graph-phasing/rukki/PAB16-Verkko/PAB16-Verkko_rukki_paths.gaf \
-d wd/verkko_phased 2> wd/verkko_phased/PAB16_verkko_phasing.log

I know this is more of a Verkko issue than a graphasing one, but I was curious whether you have encountered it before with Verkko and could help.

Thanks, Keith

mir-cat commented 2 days ago

Hi Keith,

Thanks for reaching out! Unfortunately, I have not run into this issue with Verkko v2.2 myself. I can ask the Verkko developers whether they have any advice, and I will let you know what they say.

koisland commented 2 days ago

Thanks! Any help would be appreciated.