Open insomniacslk opened 7 years ago
Base64 and hex maybe sound ok, but I really think we could still implement these with a regex?
Hex can be matched with ^[a-fA-F0-9]+$ (or ^([a-fA-F0-9]{2})+$ to match only even-length strings for actual bytes).
I think ^([a-zA-Z0-9+/]{4})*([a-zA-Z0-9+/]{4}|[a-zA-Z0-9+/]{2}==|[a-zA-Z0-9+/]{3}=)$ is a valid regex for base64. It checks the alphabet, that the length is a multiple of four, and that the last block is valid.
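For reference, a minimal Python sketch of the two patterns above. The helper names `looks_like_hex` and `looks_like_base64` are made up for illustration and are not part of hashid.py. It uses `fullmatch` in place of the `^...$` anchors, since `$` in Python also matches just before a trailing newline.

```python
import re

# Even-length hex string, i.e. whole bytes (pattern taken from the comment above).
HEX_RE = re.compile(r"([a-fA-F0-9]{2})+")

# Base64: any number of full 4-character blocks, then one final block
# that is either full or padded with "=" / "==".
BASE64_RE = re.compile(
    r"([a-zA-Z0-9+/]{4})*"
    r"([a-zA-Z0-9+/]{4}|[a-zA-Z0-9+/]{2}==|[a-zA-Z0-9+/]{3}=)"
)


def looks_like_hex(candidate: str) -> bool:
    """Return True if the whole string is an even-length hex string."""
    return HEX_RE.fullmatch(candidate) is not None


def looks_like_base64(candidate: str) -> bool:
    """Return True if the whole string is well-formed, padded base64."""
    return BASE64_RE.fullmatch(candidate) is not None


if __name__ == "__main__":
    for sample in ("414243", "41424", "aGFzaGlkCg==", "aGFzaGlkCg="):
        print(sample, looks_like_hex(sample), looks_like_base64(sample))
```

Note that the base64 pattern requires at least one block, so it rejects the empty string.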
Including encodings may be something of a slippery slope. Should we include URL encoding too?
These can totally be implemented with a regex, but I was suggesting at the same time an alternative method. Maybe in a separate commit? I think it's nice to have the option to use both, e.g. with a command line switch (--prefer-regex and --prefer-callable maybe?).
On the encodings - I'd say to start humble with these two, and add the others as soon as they're needed, in separate commits.
There's really no reason to use a callable though for these encodings. The regex is equivalent.
I'll take it as a no. Code reverted to only use regexes.
Hey! Since I have almost no spare time at the moment I have added @bburky as a collaborator with push access to the repository.
any joy here?
Encodings are another interesting type of data to match. While these could be matched with a regex, a more robust approach is to attempt the actual decoding. This could be slower than a regex in some cases.
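A rough sketch of what "attempt the actual decoding" could look like with the Python standard library. The function names `decodes_as_base64` and `decodes_as_hex` are hypothetical and not taken from hashid.py. Empty input is rejected explicitly here as an assumption, because both decoders accept an empty string.

```python
import base64
import binascii


def decodes_as_base64(candidate: str) -> bool:
    """Return True if the standard library can decode the string as base64."""
    if not candidate:
        return False  # assumption: an empty string is not a useful match
    try:
        # validate=True rejects characters outside the base64 alphabet
        # instead of silently discarding them.
        base64.b64decode(candidate, validate=True)
    except ValueError:  # binascii.Error is a subclass of ValueError
        return False
    return True


def decodes_as_hex(candidate: str) -> bool:
    """Return True if the string decodes as an even-length hex string."""
    if not candidate:
        return False  # assumption: an empty string is not a useful match
    try:
        binascii.unhexlify(candidate)
    except ValueError:  # odd length, non-hex digit, or non-ASCII input
        return False
    return True


if __name__ == "__main__":
    print(decodes_as_base64("aGFzaGlkCg=="))  # output of: echo hashid | base64
    print(decodes_as_hex("414243"))           # the string 'ABC' as hex
```

`binascii.unhexlify` is used rather than `bytes.fromhex` because the latter ignores whitespace in its input, which is probably not wanted when classifying a single token.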
I've added a callable argument, and made the regex optional, but at least one is required. Examples:
$ ./hashid.py $(echo hashid | base64)
Analyzing 'aGFzaGlkCg=='
[+] Base64
$ ./hashid.py 414243 # the string 'ABC' as hex string
Analyzing '414243'
[+] CRC-24
[+] Hex string
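To illustrate the "regex optional, callable optional, at least one required" rule described above, here is a sketch under stated assumptions. The `Prototype` class, its `check` field, the `identify` helper, and the CRC-24 pattern are all invented for this example and may not match how hashid.py is actually structured.

```python
import binascii
import re
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Prototype:
    """Hypothetical matcher: a regex, a callable, or both."""
    name: str
    regex: Optional[re.Pattern] = None
    check: Optional[Callable[[str], bool]] = None

    def __post_init__(self) -> None:
        # Enforce the "at least one is required" rule.
        if self.regex is None and self.check is None:
            raise ValueError("a prototype needs a regex, a callable, or both")

    def matches(self, candidate: str) -> bool:
        # Every matcher that is present must accept the candidate.
        if self.regex is not None and self.regex.fullmatch(candidate) is None:
            return False
        if self.check is not None and not self.check(candidate):
            return False
        return True


def decodes_as_hex(candidate: str) -> bool:
    """Callable-based check: try to decode the string as hex."""
    try:
        binascii.unhexlify(candidate)
    except ValueError:
        return False
    return bool(candidate)


PROTOTYPES = [
    # Assumed pattern for illustration: six hex digits.
    Prototype("CRC-24", regex=re.compile(r"[a-fA-F0-9]{6}")),
    Prototype("Hex string", check=decodes_as_hex),
]


def identify(candidate: str) -> List[str]:
    """Return the names of all prototypes that match the candidate."""
    return [p.name for p in PROTOTYPES if p.matches(candidate)]


if __name__ == "__main__":
    print(identify("414243"))
```

With these two prototypes, `identify("414243")` returns both names, in line with the second example above.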