vgl-hub / gfastats

A single fast and exhaustive tool for summary statistics and simultaneous *fa* (fasta, fastq, gfa [.gz]) genome assembly file manipulation.
MIT License

Null results for statistics parameters #39

CEPHAS-01 closed 1 year ago

CEPHAS-01 commented 1 year ago

Hi,

I just ran gfastats on a gfa file produced by hifiasm in trio mode to obtain the summary assembly statistics on scaffold length, N and L statistics etc. The command is as follows:

gfastats -f $inFile -t --stats > $outFile

The output I got is 0 for most of the parameters reported.

scaffolds 0
Total scaffold length 0
Average scaffold length nan
Scaffold N50 0
Scaffold auN 0.00
Scaffold L50 0
Largest scaffold 0
Smallest scaffold 0
contigs 0
Total contig length 0
Average contig length nan
Contig N50 0
Contig auN 0.00
Contig L50 0
Largest contig 0
Smallest contig 0
gaps in scaffolds 0
Total gap length in scaffolds 0
Average gap length in scaffolds 0.00
Gap N50 in scaffolds 0
Gap auN in scaffolds 0.00
Gap L50 in scaffolds 0
Largest gap in scaffolds 0
Smallest gap in scaffolds 0
Base composition (A:C:G:T) 0:0:0:0
GC content % nan
soft-masked bases 0
segments 21136
Total segment length 6154411558
Average segment length 291181.47
gaps 0
paths 0
edges 59590
Average degree 2.82
connected components 126
Largest connected component length 1596008570
dead ends 716
disconnected components 167
Total length disconnected components 115218292
separated components 293
bubbles 1052
circular segments 2

I am using the latest release (v1.3.6) of gfastats, extracted and compiled from the "gfastats.v1.3.6.tar.gz" file.

Am I running the command correctly?

gf777 commented 1 year ago

Hi @CEPHAS-01,

Thanks for reaching out. Sorry, I realize this is a bit confusing. Conceptually, we do not consider a 'contig' to be defined in the GFA unless there is an actual path (potentially involving multiple segments) that defines its sequence. This is an attempt to distinguish contigs from segments.

In practice, just add the option --discover-paths and it will generate a path for each segment, thus generating contigs that can then be evaluated in the stats.
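To illustrate why the report above shows 21136 segments but 0 contigs, here is a minimal Python sketch of the idea. It is not gfastats' actual implementation: the function names (`parse_gfa`, `discover_paths`, `path_lengths`) and the toy GFA text are made up for illustration, and it assumes GFA 1 records where `S` lines hold segments and `P` lines hold paths.

```python
from typing import Dict, List, Tuple

def parse_gfa(text: str) -> Tuple[Dict[str, int], Dict[str, List[str]]]:
    """Collect segment lengths (S lines) and paths (P lines) from GFA 1 text."""
    segments: Dict[str, int] = {}
    paths: Dict[str, List[str]] = {}
    for line in text.splitlines():
        fields = line.split("\t")
        if fields[0] == "S":
            seq = fields[2]
            length = len(seq)
            if seq == "*":
                # Sequence omitted: fall back to the optional LN:i: tag, if present.
                length = 0
                for tag in fields[3:]:
                    if tag.startswith("LN:i:"):
                        length = int(tag[5:])
            segments[fields[1]] = length
        elif fields[0] == "P":
            # Path steps look like "utg1+,utg2-"; drop the orientation sign.
            paths[fields[1]] = [step[:-1] for step in fields[2].split(",")]
    return segments, paths

def discover_paths(segments: Dict[str, int]) -> Dict[str, List[str]]:
    """Hypothetical helper: create one single-segment path per segment."""
    return {name: [name] for name in segments}

def path_lengths(segments: Dict[str, int], paths: Dict[str, List[str]]) -> List[int]:
    """Length of each path as the sum of its segment lengths (overlaps ignored)."""
    return sorted((sum(segments[s] for s in steps) for steps in paths.values()),
                  reverse=True)

# Toy graph with two segments, one link and no P lines (made-up data).
GFA = "H\tVN:Z:1.0\nS\tutg1\tACGTACGT\nS\tutg2\tGGCC\nL\tutg1\t+\tutg2\t+\t0M\n"

segments, paths = parse_gfa(GFA)
print(len(segments), len(paths))        # segments exist, but there are no paths yet
paths = discover_paths(segments)
print(path_lengths(segments, paths))    # one length per discovered path
```

With no `P` lines the path dictionary is empty, so any path-based statistic has nothing to count. Once each segment gets its own path, the lengths become available for N50-style summaries.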

I'll add a comment in the readme as well.

CEPHAS-01 commented 1 year ago

Hi Giulio, Thanks for the prompt response. It sure works well now with the --discover-paths option. Would be nice to have that as a comment in the README as you suggested, so that other users would be rightly guided.

gf777 commented 1 year ago

done, thank you for the input!