hestiaAI / Argonodes

JSON and its Argonodes!
https://hestiaai.github.io/Argonodes/

Paths containing data and sensitive data #99

Open emmanuel-hestia opened 2 years ago

emmanuel-hestia commented 2 years ago

This may be related to issue https://github.com/hestiaAI/Argonodes/issues/40

In TikTok archives, I observe that some data is embedded in the path itself. Specifically, when two users, Alice and Bob, start a conversation, Alice's archive will contain a path named

$.Direct Messages.Chat History.ChatHistory.Chat History with Bob:

while Bob's archive will record the same conversation as

$.Direct Messages.Chat History.ChatHistory.Chat History with Alice:

In both cases, each message of the conversation has a From field that names the sender of that message: $.Direct Messages.Chat History.ChatHistory.Chat History with Alice:[*].From in Bob's archive, or $.Direct Messages.Chat History.ChatHistory.Chat History with Bob:[*].From in Alice's archive.

This causes several issues:

  1. The data is inconsistent between Alice's and Bob's archives, since the same conversation is recorded under two different paths.
  2. Since the path hard-codes a value, the model created from Alice's archive cannot be used to parse the archive of a third party whose information has not yet been seen (and parsing unseen archives is an important part of the whole point). E.g. if Alice has only talked with Bob, applying the model generated from Alice's archive to Charlie's data will simply fail to detect a conversation with Daniel.
  3. Since the inserted values are usernames, they constitute sensitive information that must never be published in the open.
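
To make points 2 and 3 concrete, here is a minimal Python sketch. It does not use the Argonodes API: the `generalize` helper, the `CHAT_KEY` regular expression and the `{username}` placeholder are all hypothetical names chosen for illustration, and it assumes that usernames never contain a colon.

```python
import re

# Hypothetical pattern for the "Chat History with <username>:" key quoted
# in this issue. Assumes the username itself contains no ":" character.
CHAT_KEY = re.compile(r"Chat History with (?P<user>[^:]+):")

# Hypothetical placeholder; not an existing Argonodes convention.
PLACEHOLDER = "Chat History with {username}:"


def generalize(path: str) -> str:
    """Replace the username embedded in a path with a fixed placeholder."""
    return CHAT_KEY.sub(PLACEHOLDER, path)


alice_path = "$.Direct Messages.Chat History.ChatHistory.Chat History with Bob:[*].From"
charlie_path = "$.Direct Messages.Chat History.ChatHistory.Chat History with Daniel:[*].From"

# Literal comparison fails: a model built from Alice's archive does not
# recognise Charlie's conversation with Daniel (point 2).
literal_match = alice_path == charlie_path

# After generalization both archives yield the same path, and the
# username no longer appears in it (point 3).
generalized_match = generalize(alice_path) == generalize(charlie_path)

print(literal_match, generalized_match)
```

With the literal paths the two archives do not line up, whereas the generalized paths are identical and free of usernames. Whether something like this can be expressed inside the current framework is a separate question.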

Point 1 should not be of immediate practical concern, and Point 3 can be solved by careful manual curation of the data. Point 2, on the other hand, threatens our ability to process the affected sections of the data. Intuitively, this would call for

I hope that the current framework allows for such features and that they are not excessively difficult to implement.