rotary-genomics / rotary

Assembly/annotation workflow for Nanopore-based microbial genome data containing circular DNA elements
BSD 3-Clause "New" or "Revised" License

Aggregate QC Stats Across Samples #187

Open LeeBergstrand opened 3 weeks ago

LeeBergstrand commented 3 weeks ago

Add code to aggregate QC stats for the entire run.

LeeBergstrand commented 3 weeks ago

@jmtsuji Fully tested and ready for your review.

The output looks like this:

| Sample | Stage | Total Sequences | Total Bases (Mbp) | Sequence Length Min | Sequence Length Max | avg_sequence_length | median_sequence_length | %GC |
|---|---|---|---|---|---|---|---|---|
| blam | Before QC | 24000.0 | 200.3 | 121.0 | 124032.0 | 8871.063208333330 | 2499.0 | 60.0 |
| blam | After QC | 827.0 | 7.7 | 754.0 | 96881.0 | 9365.918984280530 | 4999.0 | 59.0 |
| blam | Change (%) | -96.55416666666670 | -96.1557663504743 | 523.1404958677690 | -21.89031862745100 | 5.578314169628950 | 100.04001600640300 | -1.6666666666666700 |
| cram | Before QC | 24000.0 | 200.3 | 121.0 | 124032.0 | 8871.063208333330 | 2499.0 | 60.0 |
| cram | After QC | 827.0 | 7.7 | 754.0 | 96881.0 | 9365.918984280530 | 4999.0 | 59.0 |
| cram | Change (%) | -96.55416666666670 | -96.1557663504743 | 523.1404958677690 | -21.89031862745100 | 5.578314169628950 | 100.04001600640300 | -1.6666666666666700 |

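
For anyone reading the table, the Change (%) rows are consistent with a plain relative change per column, (after - before) / before * 100. The snippet below is only a minimal sketch of that arithmetic using values from the blam rows; the function names `percent_change` and `change_row` are hypothetical and are not taken from the rotary code, and returning NaN for a zero baseline is an assumption.

```python
import math


def percent_change(before, after):
    """Relative change from `before` to `after`, expressed in percent."""
    if before == 0:
        # Assumption: an undefined change is reported as NaN rather than raising.
        return math.nan
    return (after - before) / before * 100.0


def change_row(before_stats, after_stats):
    """Build a 'Change (%)' row from two dicts that share the same column names."""
    return {
        name: percent_change(before_stats[name], after_stats[name])
        for name in before_stats
    }


# A subset of the 'blam' values from the table above.
before = {
    "Total Sequences": 24000.0,
    "Total Bases (Mbp)": 200.3,
    "Sequence Length Min": 121.0,
    "Sequence Length Max": 124032.0,
    "median_sequence_length": 2499.0,
    "%GC": 60.0,
}
after = {
    "Total Sequences": 827.0,
    "Total Bases (Mbp)": 7.7,
    "Sequence Length Min": 754.0,
    "Sequence Length Max": 96881.0,
    "median_sequence_length": 4999.0,
    "%GC": 59.0,
}

change = change_row(before, after)
for name, value in change.items():
    print(f"{name}: {value:.2f}")
```

Note that the large positive change in Sequence Length Min is expected here: QC removes the shortest reads, so the minimum rises even though the totals drop.
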
jmtsuji commented 2 weeks ago

@LeeBergstrand Very nice addition to generate a tabular summary of QC changes for all samples. It will take me another couple days to review this new code, but I'll get back to you soon on this!