Now that we've added #47 and attribute-based identification to sig_identify_collection(), should we drop the hash-based identification that depends on the digest package?
Pros:
Drops a dependency (digest)
Reduces the number of tests required (while hash-based identification remains an option, we need to test sig_identify_collection() in hash-only mode to make sure the reference db is always up to date).
Reduces code complexity.
Cons:
If a signature collection is stored as a csv file and read into R using read.csv, the hash-based method is theoretically capable of identifying when it matches a sigverse collection without any special attributes being added. It's unclear at this stage how important this feature will be (its importance could be mitigated by adding a parse_signature_collection() function that automatically adds the relevant attributes, e.g. collection_name = filename).
The digest package depends only on utils, so it's a VERY light package and not a major problem as a dependency.
Identifying collections based on content rather than metadata is more robust (at least now that the locale dependence of the sorting done before the md5 computation has been removed).
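To make the trade-off concrete, here is a minimal sketch of the two identification routes, not the actual sig_identify_collection() implementation. The helper names (hash_collection, identify_collection), the reference_hashes lookup, and the type/channel/fraction column layout are all assumptions made for illustration. It uses tools::md5sum() from base R as a stand-in for the digest package, so the example runs without extra dependencies.

```r
# Hypothetical helper: content hash of a collection (a named list of
# data.frames with assumed columns type, channel, fraction).
hash_collection <- function(collection) {
  rows <- unlist(lapply(names(collection), function(nm) {
    sig <- collection[[nm]]
    # Coerce to character so factor vs. string columns hash identically
    paste(nm, as.character(sig$type), as.character(sig$channel),
          sprintf("%.10g", sig$fraction), sep = "|")
  }))
  # method = "radix" sorts in the C locale, independent of the session locale
  rows <- sort(rows, method = "radix")
  tmp <- tempfile()
  on.exit(unlink(tmp))
  con <- file(tmp, "wb")  # binary mode avoids platform-specific line endings
  writeLines(rows, con)
  close(con)
  unname(tools::md5sum(tmp))
}

# Hypothetical identifier: prefer the attribute, fall back to the content hash.
identify_collection <- function(collection, reference_hashes) {
  name <- attr(collection, "collection_name")
  if (!is.null(name)) return(name)
  hit <- names(reference_hashes)[reference_hashes == hash_collection(collection)]
  if (length(hit) == 1) hit else NA_character_
}

# Usage: a made-up reference collection and a row-shuffled copy without attributes
reference <- list(SBS1 = data.frame(
  type = c("C>A", "C>T"),
  channel = c("A[C>A]A", "A[C>T]A"),
  fraction = c(0.25, 0.75)
))
reference_hashes <- c(demo_collection = hash_collection(reference))

shuffled <- list(SBS1 = reference$SBS1[2:1, ])
identify_collection(shuffled, reference_hashes)
```

The attribute route costs nothing but only works if the attribute survives import, whereas the hash route identifies the shuffled copy from its content alone. With the digest package, the file round trip could presumably be replaced by digest::digest() with algo = "md5" on the sorted rows.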