katerinakazantseva / strainy

Graph-based assembly phasing

Output GFA missing unitigs in S lines #73

Open atabeerk opened 1 year ago

atabeerk commented 1 year ago

In the output GFA file (strainy_final.gfa), some unitigs in L lines do not have corresponding S lines. This may be due to attempting to remove the unitigs at some point (and removing their S lines) but forgetting to remove the L lines in which these unitigs are used.

The attached file is the output for the mock ONT dataset. Some unitigs with this issue: edge_1291_139, edge_956_40, edge_874_33, edge_3054_s1_3041692, edge_3024_11380, edge_1553_1030193, edge_2864_1000769
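A quick way to list every affected unitig, rather than spotting them by hand, is to compare the segment names declared in S lines against those referenced in L lines. The sketch below assumes GFA1 tab-separated records (segment name in column 2 of an S line, the two linked segments in columns 2 and 4 of an L line). The function name `find_dangling_links` is hypothetical and not part of strainy.

```python
def find_dangling_links(gfa_lines):
    """Return sorted segment names used in L lines but never declared in an S line.

    Assumes GFA1 layout: S <name> <seq> ... and L <from> <orient> <to> <orient> <overlap> ...
    """
    declared = set()
    referenced = set()
    for line in gfa_lines:
        fields = line.rstrip("\n").split("\t")
        if fields[0] == "S" and len(fields) >= 2:
            declared.add(fields[1])
        elif fields[0] == "L" and len(fields) >= 4:
            referenced.update((fields[1], fields[3]))
    return sorted(referenced - declared)


# Tiny in-memory example: edge_3 is linked but has no S line.
example = [
    "H\tVN:Z:1.0",
    "S\tedge_1\tACGT",
    "S\tedge_2\tGGCC",
    "L\tedge_1\t+\tedge_2\t+\t0M",
    "L\tedge_2\t+\tedge_3\t-\t0M",
]
print(find_dangling_links(example))  # ['edge_3']
```

To check a real file, pass an open file handle instead of the list, for example `find_dangling_links(open("strainy_final.gfa"))`.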

jianshu93 commented 6 days ago

@atabeerk Hi, I have the same problem: Bandage warns me that the format is not correct (see the attached warning). How should I solve it?

Thanks, Jianshu

IMG_3343
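Until the underlying bug is fixed, one possible stopgap for viewing the graph is to drop the L lines that reference undeclared segments. This is only a sketch under the same GFA1 column assumptions (S name in column 2, L endpoints in columns 2 and 4), it is not a fix from the maintainers, and it discards whatever connectivity those links were meant to express. The function name `drop_dangling_links` is hypothetical.

```python
def drop_dangling_links(gfa_lines):
    """Return the GFA lines with any L line removed whose endpoints lack an S line."""
    lines = list(gfa_lines)

    # First pass: collect every declared segment name.
    declared = set()
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if fields[0] == "S" and len(fields) >= 2:
            declared.add(fields[1])

    # Second pass: keep everything except links to undeclared segments.
    kept = []
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if fields[0] == "L" and len(fields) >= 4:
            if fields[1] not in declared or fields[3] not in declared:
                continue
        kept.append(line)
    return kept


demo = [
    "S\tedge_1\tACGT\n",
    "S\tedge_2\tGGCC\n",
    "L\tedge_1\t+\tedge_2\t+\t0M\n",
    "L\tedge_2\t+\tedge_3\t-\t0M\n",  # edge_3 has no S line, so this link is dropped
]
cleaned = drop_dangling_links(demo)
print(len(cleaned))  # 3
```

Writing `cleaned` to a new file with `writelines` leaves the original output untouched, so it can still be attached to the issue for debugging.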

atabeerk commented 6 days ago

Hi @jianshu93, thanks for reaching out. We will look into this.

Ataberk

jianshu93 commented 6 days ago

@atabeerk,

Thanks for the quick response. The code is well written; I can run it without any problems and it produces the expected output. Only the output format is off, which feels like a small bug. Let me know if you want my data to reproduce the error.

best,

Jianshu

atabeerk commented 6 days ago

@jianshu93, it would be very helpful if you could share:

  1. the input files,
  2. the command that you use to run strainy, and
  3. the strain_contigs.gfa file that produces the error you mention.

Feel free to attach the files to this issue, or send an email to ataberk@umd.edu if you prefer.

Ataberk

jianshu93 commented 6 days ago

Hi @atabeerk, I have shared the reads, the metaFlye assembly graph, and the strainy graph output with you. I followed exactly the same commands as you suggested:

    flye --pacbio-hifi m84137_240709_192956_s1.hifi_reads.bc2076--bc2076.bam.fastq.gz -o metaflye -t 30 --meta --no-alt-contigs --keep-haplotypes -I 0

    ./strainy.py --gfa_ref assembly_graph.gfa --fastq m84137_240709_192956_s1.hifi_reads.bc2076--bc2076.bam.fastq.gz --mode hifi -t 30 --output strainy_out

I've shared the input and output files with you via Google Drive; let me know if you cannot access them.

Best, Jianshu

atabeerk commented 6 days ago

Hi @jianshu93, I got the files. I will keep you updated.

Best, Ataberk