Open undergroundwires opened 9 months ago
LGTM!
One potential caveat to point out, though: using only the first word (segment) of the GUID increases the chance of collisions down the line, though I don't know enough to say how significant that chance would be. I also don't have a good idea how to prevent this (a bad idea would be to give the CLI tool the list of currently used IDs and make it generate a new GUID whenever the first word of a fresh one collides with an old one).
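To put a rough number on the collision concern, here is a minimal Python sketch. It assumes IDs are the first 8 hex characters of a random (version 4) GUID, which gives 16^8 possible values, and it uses the standard birthday-bound approximation. The function names and the figure of 2,000 executables are hypothetical, not taken from the project. It also shows the "regenerate on collision" idea mentioned above.

```python
import math
import uuid

ID_SPACE = 16 ** 8  # 8 hex characters -> 4,294,967,296 possible short IDs


def collision_probability(count: int, space: int = ID_SPACE) -> float:
    """Birthday-bound approximation of the chance that any two of `count` random IDs collide."""
    return 1.0 - math.exp(-count * (count - 1) / (2 * space))


def new_short_id(used_ids: set) -> str:
    """Generate a GUID, keep its first segment, and retry while that segment is already taken."""
    while True:
        candidate = str(uuid.uuid4()).split("-")[0]
        if candidate not in used_ids:
            used_ids.add(candidate)
            return candidate
```

Under these assumptions, `collision_probability(2000)` comes out at roughly 0.05%, so a retry loop like `new_short_id` would rarely have to loop at all.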
Just snowballing here - since the hierarchical structure is bad (categories are a bit whimsical, can change names, etc.) and GUIDs are a proposed solution - what if we could have a list of GUIDs but in the script itself give each feature an array of category tags? This way you're not limited to only one category. I hope that's potentially useful ;).
Can you elaborate @neube3?
Do I understand correctly that you're suggesting that we keep categories but introduce one more level of taxonomy through tags? You're right that the hierarchical categories can never be perfect and one script could often be categorized inside a few groups. So we add tag support and tag scripts/categories as another level of categorization? And assign IDs to these tags too?
I knew I was right when I put it up for discussion - I didn’t even think about IDs for tags, but that’s exactly the type of idea enhancement I expected from a discussion :)! Tag IDs sound great - I’m almost always for separating the key and label (doing multilingual work, you learn to appreciate ID-label decoupling). As a side note - I have no clue about any plans for translating the scripts, but tags could have language versions (as a property, I guess?), and since the ID would stay the same through label and language changes, the whole system should be both accessible and easily shareable.
As for the tags, we could either:
Whichever we choose, we should allow users to search by tag. There are also fun little things you can do with a tag structure that you cannot do in a hierarchy. The most basic is one script with multiple tags ("Is this script a security or a speedup measure?" - now you don’t have to guess!), a further one is a tag cloud (a pleasing visual representation of tags by their relative count), and finally the best thing ever: searching by tags, e.g. "All speedup scripts in one place".
Last, but not least, tags can include/reference other tags (non-hierarchically, but we could always choose to include only such subsets).
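As a rough illustration of the tag idea discussed above, here is a small Python sketch. All class names, field names, and sample values are hypothetical. It assumes each tag has a stable ID separate from its per-language labels, and each script carries an array of tag IDs, which is enough to support search by tag and the counts a tag cloud would need.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class Tag:
    id: str        # stable key; survives label and language changes
    labels: dict   # language code -> display label


@dataclass
class Script:
    id: str
    name: str
    tag_ids: list = field(default_factory=list)  # a script may carry several tags


def scripts_with_tag(scripts, tag_id):
    """Search by tag: return every script that carries the given tag ID."""
    return [script for script in scripts if tag_id in script.tag_ids]


def tag_counts(scripts):
    """Count how often each tag ID is used, e.g. as input for a tag cloud."""
    return Counter(tag_id for script in scripts for tag_id in script.tag_ids)
```

Because scripts reference tags by ID only, renaming or translating a label would not require touching any script.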
human readable ids are nice ribbit https://dev.to/stripe/designing-apis-for-humans-object-ids-3o5a
FYI, ID support is implemented. I'm merging the necessary refactorings as part of patches and will add the main code in the next release. In a previous refactoring (c138f74), I added a concept called Executable.
An Executable represents a Category or a Script that can be uniquely identified (a.k.a. id) within a collection such as windows/linux/macos (a.k.a. collection). So I was initially thinking of designing executables this way:
{
  "key": {
    "collection": "windows",
    "id": "27e7b119"
  }
}
However, the blog post you linked, @Kerobyte, inspired me to add type: "script" and type: "category" to the key field:
{
  "key": {
    "collection": "windows",
    "type": "script",
    "id": "27e7b119"
  }
}
I quote this:
Querying every single table to find one ID is extremely inefficient, so we need a better method. One way could be to require an additional “type” parameter.
With the previous design I had in mind, every executable in a collection would have to be iterated until the matching one is found, just to get the object and read its type. But with the type field inside the key, it can act as a secondary partition key and remove the need to iterate over executables of the other type. The only issue with this approach is that the type (category/script) must be known when querying. This is not an issue for the API (#262) or import/export (#126), and the type can be included in URLs for permalink support (#49).
The article argues that having an object like this "complicates the API with no additional gain". I do not agree: the gain is that it's much more explicit and human readable. So I will keep this kind of complex key/ID structure in the code.
However, we'll need to serialize this object to use in public API (#126) and to store selections (#59). And in that case, I guess we can use this format:
{collection}/{type_plural}/{id}
For example: windows/categories/27e7b119, macos/scripts/9d8b6dd9.
I guess this URI format is much clearer than what the article suggests with underscores ([prefix]_[random_string]). The URI format explicitly tells which part of the ID is more generic than the other, in a REST-like manner => collection > executable type > executable ID.
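A minimal Python sketch of serializing and parsing the {collection}/{type_plural}/{id} format might look like this. The function names are hypothetical, and the plural mapping (script to scripts, category to categories) is an assumption inferred from the {type_plural} placeholder.

```python
# Assumed singular -> plural mapping for the {type_plural} segment.
PLURALS = {"script": "scripts", "category": "categories"}
SINGULARS = {plural: singular for singular, plural in PLURALS.items()}


def serialize_key(collection: str, type_: str, id_: str) -> str:
    """Turn the composite key into its URI-like string form."""
    return f"{collection}/{PLURALS[type_]}/{id_}"


def parse_key(text: str) -> dict:
    """Parse the string form back into a composite key, rejecting malformed input."""
    parts = text.split("/")
    if len(parts) != 3:
        raise ValueError(f"expected collection/type/id, got: {text!r}")
    collection, type_plural, id_ = parts
    if type_plural not in SINGULARS:
        raise ValueError(f"unknown executable type: {type_plural!r}")
    return {"collection": collection, "type": SINGULARS[type_plural], "id": id_}
```

Round-tripping is a useful sanity check here: parsing a serialized key should always give back the original collection, type, and ID.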
We're making the last decisions. I'd like to hear your feedback on this.
(@Marc05 feel free to join designing this if you're around)
TLDR
Use a composite/aggregate key for executables (category/script) composed of the following properties:
{
  "key": {
    "collection": "windows",
    "type": "script",
    "id": "27e7b119"
  }
}
Serialize it in URI format like this: windows/scripts/27e7b119.
New decision: Add type (i.e. "category" or "script") as part of the composite key.
Not gonna lie - I didn't grok everything, but your solution LGTM!
Also, the article basically mentions Hungarian notation. Which is not bad, per se, but not new, either.
Semi-connected rant below, feel free to skip if you're only here for the opinion on the solution: the author thinks that having a userID as a number is bad, because if you have huge holes in your API, then some malicious actors could use their ID to guess others' IDs and then do something nefarious. Well... have they tried not having a huge hole in their APIs? "Need to know" and "Chinese walls" are basic concepts of security, and if someone posits such a strawman as an intro point in favour of their theory/solution, then I automatically grow doubly sceptical. If you have such holes in the API, then numbered userIDs are the least of your problems.
As for the "oh, I'm customer 50, your operation isn't big and therefore doesn't feel that great anymore" argument - it would be trivial to just add a random(10000,100000) to the ID and check for collisions - another strawman.
Ship it! :)
It probably goes without saying to keep in mind some safety checks around categories; it sounds like there's potential for infinite loops since they can reference each other.
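One way such a safety check could look is sketched below in Python. It assumes category references can be represented as a mapping from a category ID to the IDs it references; that representation and the function name are hypothetical. It runs a depth-first search and reports the first reference cycle it finds.

```python
def find_reference_cycle(references):
    """Return one cycle as a list of IDs (first ID repeated at the end), or None if acyclic."""
    path = []     # IDs on the current depth-first path
    done = set()  # IDs whose references were fully explored without finding a cycle

    def visit(node):
        if node in done:
            return None
        if node in path:
            # Reached an ID already on the current path: that closes a cycle.
            return path[path.index(node):] + [node]
        path.append(node)
        for child in references.get(node, []):
            cycle = visit(child)
            if cycle:
                return cycle
        path.pop()
        done.add(node)
        return None

    for node in references:
        cycle = visit(node)
        if cycle:
            return cycle
    return None
```

Running a check like this when collection files are loaded would surface a circular reference as a validation error instead of an endless loop at runtime.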
Good progress!
@neube3 I agree. It's security by obscurity, and a very weak one. Enumeration is mitigated by proper authentication and network controls, not by adding prefixes to IDs.
Thanks for the input everyone.
This will be the final design:
The support in the application will be added in upcoming patch release(s). The collection files will be modified in the feature release.
If I might suggest another piece of metadata for each executable: the privacy.sexy version/build the specific script got introduced in, or a "revision number".
In the future, people will want to load the script they previously created as a template in order to create an updated version of it. It would then be important to be able to quickly see all the newly added executables since the template script was created and add the desired ones from such a filtered overview.
It would also make sense to change this version metadata for executables if new breaking changes have been made or if new incompatibility information or other warnings were added, so that people can reconsider these executables upon rebuilding their script.
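The filtering described here could be sketched in Python as below. The field names introduced_in and revised_in, and the dotted numeric version format, are assumptions made for illustration. The function lists executables that were added or revised after the version a template script was built with.

```python
def parse_version(text):
    """Turn a dotted version such as '0.12.3' into a tuple that compares numerically."""
    return tuple(int(part) for part in text.split("."))


def changed_since(executables, template_version):
    """Return IDs of executables added or revised after `template_version`."""
    baseline = parse_version(template_version)
    changed = []
    for executable in executables:
        # Fall back to the introduction version when no revision is recorded.
        latest = executable.get("revised_in", executable["introduced_in"])
        if parse_version(latest) > baseline:
            changed.append(executable["id"])
    return changed
```

Comparing parsed tuples instead of raw strings avoids ordering mistakes such as "0.9.0" sorting after "0.12.0".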
TL;DR: Please check the Proposed solution; I'm looking for community feedback on the proposed ID format and the overall approach.
Problem description
The absence of IDs for scripts and categories is blocking:
To be able to implement these, we need to assign IDs for each script and category.
Requirements for IDs
The IDs should be:
Proposed solution
Adopt the approach of generating a GUID (e.g., 27e7b119-6fdb-447f-91e1-99ecf94d9f34) and extracting the first segment (prior to the first dash, e.g., 27e7b119).
So every script and category will have an ID in the format of 27e7b119.
Alternatives considered
TODO