Closed louisabraham closed 6 months ago
Are you specifically thinking of SEGUID v1 (--type=seguid) here?
Yes. I thought that v1 does not validate input so I'm surprised.
I see. Good point; if we provide a SEGUID v1 implementation, we should align our behavior with that of the original implementation.
I thought about this one a little bit more. I think we could simply document that --type="seguid" is a conservative implementation of SEGUID v1. We could also clarify that our SEGUID v1 does not turn input sequences into all upper case - that is something the user needs to do manually - and our current implementation will produce an error if they don't.
The reason for suggesting this approach is that:
Because of this, I think it's better to stay conservative and use the same validation as we use for SEGUID v2. I believe our "conservative" SEGUID v1 implementation is a superset of the original SEGUID v1. We will still be able to relax this in future releases, if we find it necessary.
DECISION: Our implementation of SEGUID v1 should remain conservative. It's not a blocker, because anyone can always use --alphabet="..." to work around the validation (modulo some reserved symbols).
Help for seguid() for Python now says:
The original definition of the SEGUID v1 checksum algorithm (Babnigg & Giometti, 2006)
included transformation to uppercase before calculating the checksum.
Here, ``seguid()`` does *not* coerce the input sequence to upper case. If your input sequence
has lower-case symbols, you can use :meth:`str.upper` to achieve what the original method does.
``seguid()`` only accepts symbols as specified by the `alphabet` argument.
Thus, our implementation is more conservative, which has the benefit of
lowering the risk of passing the incorrect sequence by mistake.
I'll add the same to the R help.
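To make the behaviour in the help text concrete, here is a minimal, self-contained sketch. It is not the package's code: `strict_seguid_v1` and its default alphabet `"ACGT"` are hypothetical names chosen for illustration. It assumes the usual SEGUID v1 definition, that is, the SHA-1 digest of the sequence, Base64-encoded with the trailing `=` padding removed, and it leaves out any prefix the real function may add to its output.

```python
import base64
import hashlib


def strict_seguid_v1(seq: str, alphabet: str = "ACGT") -> str:
    """Hypothetical stand-in for a conservative SEGUID v1 (not the real seguid())."""
    # Validate first: reject every symbol that is not in the alphabet.
    unknown = set(seq) - set(alphabet)
    if unknown:
        raise ValueError(f"symbols not in alphabet: {sorted(unknown)}")
    # No upper-casing here; the sequence is hashed exactly as given.
    digest = hashlib.sha1(seq.encode("ascii")).digest()
    # Base64-encode the 20-byte digest and drop the "=" padding.
    return base64.b64encode(digest).decode("ascii").rstrip("=")


# Lower-case input is rejected under the default alphabet ...
try:
    strict_seguid_v1("acgt")
except ValueError as err:
    print("rejected:", err)

# ... unless the caller upper-cases it first, as the original method did.
print(strict_seguid_v1("acgt".upper()) == strict_seguid_v1("ACGT"))

# Widening the alphabet bypasses the validation, but the checksum then
# differs from the upper-case one because no coercion takes place.
print(strict_seguid_v1("acgt", alphabet="acgt") != strict_seguid_v1("ACGT"))
```

The point of the sketch is the order of operations: validation happens before hashing and nothing is coerced, so a mixed-case sequence fails loudly instead of silently producing the checksum of a different string.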
I see two tests