undergroundwires / privacy.sexy

Open-source tool to enforce privacy & security best-practices on Windows, macOS and Linux, because privacy is sexy
https://privacy.sexy
GNU Affero General Public License v3.0

Introduce IDs for each script/category #262

Open undergroundwires opened 9 months ago

undergroundwires commented 9 months ago

TL;DR: Please check the proposed solution. I'm looking for community feedback on the proposed ID format and the overall approach.

Problem description

The absence of IDs for scripts and categories is blocking:

To be able to implement these, we need to assign IDs for each script and category.

Requirements for IDs

The IDs should be:

Proposed solution

Adopt the approach of generating a GUID (e.g., 27e7b119-6fdb-447f-91e1-99ecf94d9f34) and extracting the first segment (prior to the first dash, e.g., 27e7b119).

So every script and category will have an ID in the format 27e7b119.
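The extraction step described above can be sketched as follows. This is a minimal illustration, not the project's implementation; the function name `generate_short_id` is an assumption made for this example.

```python
import uuid


def generate_short_id() -> str:
    """Return the first segment of a random GUID (8 hex characters).

    Hypothetical helper: the name and the use of uuid4 are assumptions.
    """
    guid = str(uuid.uuid4())      # e.g. "27e7b119-6fdb-447f-91e1-99ecf94d9f34"
    return guid.split("-", 1)[0]  # text before the first dash, e.g. "27e7b119"


if __name__ == "__main__":
    print(generate_short_id())
```

The resulting ID carries 32 bits of randomness, which is what makes it short enough to read but also finite enough that collisions are possible.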

Alternatives considered

TODO

neube3 commented 9 months ago

LGTM!

One potential caveat to point out, though: using only the first segment increases the chance of collisions down the line. I don't know enough to say how significant that chance would be, and I don't have a good idea how to prevent it (a bad idea would be to update the CLI tool with the list of currently used IDs and make the tool generate a new GUID whenever the first segment of a fresh one collides with an old one).
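The size of that collision risk can be estimated with the standard birthday-bound approximation, and the regenerate-on-collision idea is only a few lines. The sketch below assumes the 8 hex characters are uniformly random; `collision_probability` and `generate_unique_id` are hypothetical names used for illustration only.

```python
import math
import uuid

ID_SPACE = 16 ** 8  # 8 hex characters give 2**32 possible IDs


def collision_probability(item_count: int, id_space: int = ID_SPACE) -> float:
    """Approximate chance that at least two of item_count random IDs collide."""
    pairs = item_count * (item_count - 1) / 2
    # Birthday bound: p is approximately 1 - exp(-pairs / id_space)
    return -math.expm1(-pairs / id_space)


def generate_unique_id(used_ids: set) -> str:
    """Regenerate until the short ID is unused, then record it (hypothetical helper)."""
    while True:
        candidate = str(uuid.uuid4()).split("-", 1)[0]
        if candidate not in used_ids:
            used_ids.add(candidate)
            return candidate


if __name__ == "__main__":
    for count in (1_000, 5_000, 50_000):
        print(f"{count:>6} IDs: {collision_probability(count):.4%} chance of a collision")
```

Under this approximation, a thousand IDs give a collision chance of roughly 0.01%, so a simple uniqueness check at generation or validation time is likely enough.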

neube3 commented 5 months ago

Just snowballing here - since the hierarchical structure is bad (categories are a bit whimsical, can change names, etc.) and GUIDs are a proposed solution - what if we could have a list of GUIDs but in the script itself give each feature an array of category tags? This way you're not limited to only one category. I hope that's potentially useful ;).

undergroundwires commented 4 months ago

Can you elaborate @neube3?

Do I understand correctly that you're suggesting that we keep categories but introduce one more level of taxonomy through tags? You're right that the hierarchical categories can never be perfect and one script could often be categorized inside a few groups. So we add tag support and tag scripts/categories as another level of categorization? And assign IDs to these tags too?

neube3 commented 4 months ago

> Can you elaborate @neube3?
>
> Do I understand correctly that you're suggesting that we keep categories but introduce one more level of taxonomy through tags? You're right that the hierarchical categories can never be perfect and one script could often be categorized inside a few groups. So we add tag support and tag scripts/categories as another level of categorization? And assign IDs to these tags too?

I knew I was right when I put it up for discussion - I didn’t even think about IDs for tags, but that’s exactly the type of enhancement I expected from a discussion :)! Tag IDs sound great - I’m almost always for separating the key and the label (doing multilingual work, you learn to appreciate ID-label decoupling). As a side note - I have no clue about any plans for translating the scripts, but tags could have language versions (as a property, I guess?), and since the ID would stay the same through label and language changes, the whole system should be both accessible and easily shareable.

As for the tags, we could either:

  1. Get rid of the categories altogether and use tags only (it might be slightly confusing, but hopefully only in the transitory period from the legacy thinking mode - tags are just a superior superset for hierarchy; also: see more below). Don’t really use the hierarchy for anything but display, use tags internally.
  2. Rename tags to categories and drop the hierarchical aspect (might be confusing, but makes the id-ing aspect somewhat easier since there would be one less thing to worry about)
  3. Keep categories as legacy, including/not including their hierarchy and add tags separately. Don’t really use the hierarchy for anything but display, use tags internally.

Whichever we choose, we should allow users to search by tag; there are also fun little additional things you can do with a tag structure that you cannot do in a hierarchy. The most basic is one script with multiple tags ("Is this script a security or a speedup measure?" - now you don’t have to guess!), a further one is a tag cloud (a pleasing visual representation of tags by their relative count), and finally the best thing ever: searching by tags, e.g. "All speedup scripts in one place".

Last, but not least, tags can include/reference other tags (non-hierarchically, but we could always choose to include only such subsets).
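As a rough sketch of the tag idea above: tags get stable IDs with per-language labels, scripts carry a list of tag IDs, and both search-by-tag and tag-cloud counts fall out naturally. All names, IDs, and labels below are made up for illustration and do not come from the project.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class Tag:
    id: str       # stable key that survives renames and translations
    labels: dict  # language code -> display label


@dataclass
class Script:
    id: str
    name: str
    tag_ids: list = field(default_factory=list)


# Hypothetical sample data
TAGS = {
    "a1b2c3d4": Tag("a1b2c3d4", {"en": "Security", "de": "Sicherheit"}),
    "e5f6a7b8": Tag("e5f6a7b8", {"en": "Speedup", "de": "Beschleunigung"}),
}
SCRIPTS = [
    Script("11111111", "Disable telemetry service", ["a1b2c3d4", "e5f6a7b8"]),
    Script("22222222", "Clear temporary files", ["e5f6a7b8"]),
    Script("33333333", "Harden remote access", ["a1b2c3d4"]),
]


def find_by_tag(scripts, tag_id):
    """Return every script carrying the given tag ID."""
    return [script for script in scripts if tag_id in script.tag_ids]


def tag_counts(scripts):
    """Count scripts per tag ID, e.g. as input for a tag cloud."""
    return Counter(tag_id for script in scripts for tag_id in script.tag_ids)


if __name__ == "__main__":
    for script in find_by_tag(SCRIPTS, "e5f6a7b8"):
        print(script.name)
    print(tag_counts(SCRIPTS))
```

Because scripts refer to tags only by ID, a label can be renamed or translated without touching any script.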

Kerobyte commented 2 weeks ago

human readable ids are nice ribbit https://dev.to/stripe/designing-apis-for-humans-object-ids-3o5a

undergroundwires commented 2 weeks ago

FYI, ID support is implemented. I'm merging the necessary refactorings as part of patches and will add the main code in the next release. In a previous refactoring (c138f74), I added a concept called Executable.

An Executable represents a Category or a Script that can be uniquely identified (a.k.a. id) within a collection such as windows/linux/macos (a.k.a. collection). So I was initially thinking of designing executables this way:

{
  "key": {
    "collection": "windows",
    "id": "27e7b119"
  }
}

However, the blog post you linked, @Kerobyte, inspired me to add type: "script" and type: "category" to the key field:

{
  "key": {
    "collection": "windows",
    "type": "script",
    "id": "27e7b119"
  }
}

I quote this:

Querying every single table to find one ID is extremely inefficient, so we need a better method. One way could be to require an additional “type” parameter.

With the previous design I had in mind, all executables in a collection must be iterated until the matching one is found, and only then can its type be read. With type inside the key, type can act as a secondary partition key and remove the need to iterate over executables of the other type. The only issue with this approach is that the type (category/script) must be known when querying. This is not an issue for the API (#262) or import/export (#126), and the type can be included in URLs for permalink support (#49).
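The partitioning argument above can be illustrated with a small in-memory index. This is only a sketch of the idea, assuming a dictionary keyed by (collection, type); the class names `ExecutableKey` and `ExecutableIndex` are hypothetical and not taken from the codebase.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ExecutableKey:
    collection: str  # "windows", "macos" or "linux"
    type: str        # "script" or "category"
    id: str          # e.g. "27e7b119"


class ExecutableIndex:
    """Stores executables partitioned by (collection, type)."""

    def __init__(self):
        self._partitions = {}  # (collection, type) -> {id: executable}

    def add(self, key: ExecutableKey, executable) -> None:
        partition = self._partitions.setdefault((key.collection, key.type), {})
        partition[key.id] = executable

    def get(self, key: ExecutableKey):
        # Only the matching partition is consulted; other types are never scanned.
        return self._partitions.get((key.collection, key.type), {}).get(key.id)


if __name__ == "__main__":
    index = ExecutableIndex()
    key = ExecutableKey("windows", "script", "27e7b119")
    index.add(key, {"name": "Example script"})
    print(index.get(key))
```

The trade-off is the one noted above: a caller holding only the bare ID cannot query this index without also knowing the type.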

The article argues that having an object like this "complicates the API with no additional gain". I do not agree: the gain is that it's much more explicit and human readable. So I will keep this kind of complex key/ID structure in the code.

However, we'll need to serialize this object to use in public API (#126) and to store selections (#59). And in that case, I guess we can use this format:

{collection}/{type_plural}/{id}

For example: windows/categories/27e7b119, macos/scripts/9d8b6dd9.

I guess this URI format is much clearer than what the article suggests with underscores ([prefix]_[random_string]). The URI format explicitly tells which part of the ID is more generic than the other, in a REST-like manner => collection > executable type > executable ID.
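A minimal sketch of serializing and parsing the {collection}/{type_plural}/{id} format might look like this. The function names and the explicit plural mapping are assumptions made for the example, not the project's code.

```python
# Singular type -> plural path segment (assumed mapping)
PLURALS = {"script": "scripts", "category": "categories"}
SINGULARS = {plural: singular for singular, plural in PLURALS.items()}


def serialize_key(collection: str, type_: str, id_: str) -> str:
    """Build "{collection}/{type_plural}/{id}" from the key parts."""
    return f"{collection}/{PLURALS[type_]}/{id_}"


def parse_key(text: str) -> dict:
    """Split a serialized key back into collection, type and id."""
    parts = text.split("/")
    if len(parts) != 3 or parts[1] not in SINGULARS:
        raise ValueError(f"Not a valid executable key: {text!r}")
    collection, type_plural, id_ = parts
    return {"collection": collection, "type": SINGULARS[type_plural], "id": id_}


if __name__ == "__main__":
    serialized = serialize_key("windows", "script", "27e7b119")
    print(serialized)
    print(parse_key(serialized))
```

Keeping the mapping explicit avoids naive pluralization mistakes such as "categorys".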

We're making the last decisions. I'd like to hear your feedback on this.

(@Marc05 feel free to join designing this if you're around)


TLDR

Use a composite/aggregate key for executables (category/script) composed of the following properties:

{
  "key": {
    "collection": "windows",
    "type": "script",
    "id": "27e7b119"
  }
}

Serialize it in URI format like this: windows/scripts/27e7b119.

New decision: Add type (i.e. "category" or "script") as part of the composite key.

neube3 commented 2 weeks ago

Not gonna lie - I didn't grok everything, but your solution LGTM!

Also, the article basically mentions Hungarian notation. Which is not bad, per se, but not new, either.

Semi-connected rant below, feel free to skip if you're only here for the opinion on the solution: the author thinks that having a userID as a number is bad, because if you have huge holes in your API, then some malicious actors could use their ID to guess others' IDs and then do something nefarious. Well... have they tried not having huge holes in their APIs? "Need to know" and "Chinese walls" are basic concepts of security, and if someone posits such a strawman as an intro point in favour of their theory/solution, then I automatically grow doubly sceptical. If you have such holes in the API, then numbered userIDs are the least of your problems.

As for the "oh, I'm customer 50, your operation isn't big and therefore doesn't feel that great anymore" argument - it would be trivial to just add a random(10000,100000) to the ID and check for collisions - another strawman.

Marc05 commented 2 weeks ago

Ship it! :)

It probably goes without saying to keep in mind some safety checks around categories; it sounds like there's potential for infinite loops since they can reference each other.

Good progress!
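One possible form of the safety check mentioned above is a depth-first search that reports the first reference cycle it finds. This is a generic sketch: the `references` mapping (ID to the IDs it references) is a hypothetical shape, not the project's actual data model.

```python
def find_cycle(references):
    """Return one cycle as a list of IDs, or None if there is no cycle.

    references maps each ID to an iterable of IDs it references (assumed shape).
    """
    unvisited, in_progress, done = 0, 1, 2
    state = {}
    path = []

    def visit(node):
        state[node] = in_progress
        path.append(node)
        for child in references.get(node, ()):
            child_state = state.get(child, unvisited)
            if child_state == in_progress:
                # The child is already on the current path: a cycle is closed.
                return path[path.index(child):] + [child]
            if child_state == unvisited:
                cycle = visit(child)
                if cycle:
                    return cycle
        path.pop()
        state[node] = done
        return None

    for node in references:
        if state.get(node, unvisited) == unvisited:
            cycle = visit(node)
            if cycle:
                return cycle
    return None


if __name__ == "__main__":
    print(find_cycle({"a": ["b"], "b": ["c"], "c": ["a"]}))
    print(find_cycle({"a": ["b"], "b": []}))
```

Running such a check when collection files are validated would reject circular references before anything tries to traverse them.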

undergroundwires commented 2 weeks ago

@neube3 I agree. It's security by obscurity, and a very weak one. Enumeration is mitigated by proper authentication and network controls, not by adding prefixes to IDs.

Thanks for the input everyone.


This will be the final design:

image

The support in the application will be added in upcoming patch release(s). The collection files will be modified in the feature release.

GPT 4 agrees that the executable type (whether it is a script or a category) should be part of the key/ID when my prompt asked for it to optimize for easiest future changes to maximize maintainability:

Incorporating `executableType` into the composite ID for privacy.sexy could provide benefits, especially considering the goal of maximizing maintainability and allowing for easy future updates and categorization.

Pros of Including `executableType` in ID:

- **Clear Distinction:** Differentiates between types of executables (e.g., scripts vs categories) directly in the ID. This clarity can streamline processing logic and data retrieval by quickly identifying the type without parsing additional data structures.
- **Scalability:** Facilitates adding new executable types in the future without reworking the ID structure or parsing mechanisms.
- **Simplified Data Relations:** Enhances the ability to manage relationships and dependencies between different types of data. For instance, in a scenario where scripts and categories might have different attributes or behaviors, IDs with type information simplify filtering and operations specific to types.
- **Enhanced Search and Organization:** Having type information within the ID can make indexing and searching through a large number of items more efficient.

Cons of Including `executableType` in ID:

- **Increased Complexity:** The IDs will be longer and potentially more cumbersome to handle in code, especially for manual debugging or during development.
- **Potential Redundancy:** If the type is rarely needed for practical purposes or if systems already manage type information efficiently through other means, this addition could be superfluous.
- **Modification Overhead:** Any change in the categorization or type naming would require updates to the IDs, possibly affecting historical data or requiring migration steps.

Recommendation: Given the focus on maintainability and adaptability for future changes, including `executableType` in the composite ID appears beneficial. It provides a more robust and flexible structure for managing different entities within the system and adapts more readily to future expansions or modifications in data organization. This approach is especially valuable in environments where different types of executables might have distinct handling, permissions, or processing paths.

ltguillaume commented 2 weeks ago

If I might suggest another piece of metadata for each executable: the privacy.sexy version/build the specific script got introduced in, or a "revision number".

In the future, people will want to load the script they previously created as a template in order to create an updated version of it. It would then be important to be able to quickly see all the newly added executables since the template script was created and add the desired ones from such a filtered overview.

It would also make sense to change this version metadata for executables if new breaking changes have been made or if new incompatibility information or other warnings were added, so that people can reconsider these executables upon rebuilding their script.
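To make the suggestion concrete, the sketch below filters executables by the version they were introduced in. The `introduced_in` field, the sample version numbers, and the function names are all hypothetical; it assumes plain dotted numeric versions.

```python
def parse_version(text: str) -> tuple:
    """Turn a dotted version such as "0.13.2" into a comparable tuple of ints."""
    return tuple(int(part) for part in text.split("."))


def added_since(executables, template_version: str) -> list:
    """Return executables introduced after the version the template was built with."""
    baseline = parse_version(template_version)
    return [
        executable
        for executable in executables
        if parse_version(executable["introduced_in"]) > baseline
    ]


# Hypothetical sample data; the versions are illustrative only
EXECUTABLES = [
    {"id": "27e7b119", "introduced_in": "0.12.0"},
    {"id": "9d8b6dd9", "introduced_in": "0.13.1"},
    {"id": "5c1f0a3e", "introduced_in": "0.13.10"},
]

if __name__ == "__main__":
    for executable in added_since(EXECUTABLES, "0.13.0"):
        print(executable["id"])
```

Comparing tuples of integers rather than raw strings keeps "0.13.10" ordered after "0.13.2". Bumping the same field on breaking changes would surface those executables in the same filtered view.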