sourmash-bio / sourmash

Quickly search, compare, and analyze genomic and metagenomic data sets.
http://sourmash.readthedocs.io/en/latest/

Add documentation to `sourmash signature describe` for source file origin #769

Open taylorreiter opened 4 years ago

taylorreiter commented 4 years ago

I calculated signatures like this:

sourmash compute -o {output} --merge {wildcards.sample}_mgx --scaled 2000 -k 21,31,51 --track-abundance {input.r1} {input.r2} 

and then ran `sourmash signature describe` on one of the signatures, which gave this output:

== This is sourmash version 2.2.0. ==
== Please cite Brown and Irber (2016), doi:10.21105/joss.00027. ==

loaded 1 signatures total.
---
signature filename: MSM6J2M3_mgx.un
signature: MSM6J2M3_mgx
source file: inputs/mgx/MSM6J2M3_R2.fastq.gz
md5: 310e18dc950a60a84b280f7a18959af7
k=51 molecule=DNA num=0 scaled=2000 seed=42 track_abundance=1
size: 21618
signature license: CC0

The source file line made me nervous: it suggests that only one file was used to calculate the signature. I then realized that the recorded source file is probably just the last file given to compute. This is fine, but I think we should add it to the documentation.
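
To illustrate the behavior I am guessing at, here is a toy sketch. It is not sourmash's actual code: the function `merged_record`, its record layout, and the R1 path are hypothetical, made up for this example. It only shows how a single filename slot ends up holding the last input when several files are merged into one record.

```python
import json


def merged_record(name, input_files):
    """Toy model of a merged sketch: one record built from several inputs.

    Hypothetical helper for illustration only; not part of sourmash.
    """
    record = {"name": name, "filename": None, "n_inputs": 0}
    for path in input_files:
        # There is only one 'filename' slot, and it is overwritten on
        # every input, so only the last path survives in the record.
        record["filename"] = path
        record["n_inputs"] += 1
    return record


rec = merged_record(
    "MSM6J2M3_mgx",
    [
        "inputs/mgx/MSM6J2M3_R1.fastq.gz",  # assumed R1 path, not shown in the output above
        "inputs/mgx/MSM6J2M3_R2.fastq.gz",
    ],
)
print(json.dumps(rec, indent=2))
```

In this model both files contribute to the record, but the stored filename names only the second one, which matches what the describe output above looks like.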

dkoslicki commented 1 year ago

To chime in on this issue: I spent a while being misled by this too, and thought I was using the --merge flag incorrectly. It would be nice to know (or be warned) that the source_file will get funky for sketches built with --merge.