rudderlabs / community-user-transformations

MIT License

Topic Extraction #12

Open amiryselim opened 1 year ago

amiryselim commented 1 year ago

Contact Details

amir@tradeblock.us

Language

Python

Category

Data Processing and Enrichment

Description

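This transformation extracts the most frequent keywords from the free-text message property of an event: the top five single words and the top five two-word phrases, ignoring common English stop words. The results are added to the event under properties.keywords. For example, applying it to a real product feedback submission (the test payload below) adds this keyword breakdown to the event: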

"keywords": {
  "oneWord": {
    "folder": 12,
    "permissions": 11,
    "board": 11,
    "manager": 11,
    "level": 7
  },
  "twoWords": {
    "board manager": 11,
    "manager level": 5,
    "specific folder": 4,
    "permissions settings": 4,
    "manager permissions": 3
  }
}

Code Block

# Common English stop words, stored as a set for fast membership checks.
STOP_WORDS = {"i", "me", "my", "myself", "we", "our", "ours", "ourselves", "you", "your", "yours", "yourself", "yourselves", "he", "him", "his", "himself", "she", "her", "hers", "herself", "it", "its", "itself", "they", "them", "their", "theirs", "themselves", "what", "which", "who", "whom", "this", "that", "these", "those", "am", "is", "are", "was", "were", "be", "been", "being", "have", "has", "had", "having", "do", "does", "did", "doing", "a", "an", "the", "and", "but", "if", "or", "because", "as", "until", "while", "of", "at", "by", "for", "with", "about", "against", "between", "into", "through", "during", "before", "after", "above", "below", "to", "from", "up", "down", "in", "out", "on", "off", "over", "under", "again", "further", "then", "once", "here", "there", "when", "where", "why", "how", "all", "any", "both", "each", "few", "more", "most", "other", "some", "such", "no", "nor", "not", "only", "own", "same", "so", "than", "too", "very", "s", "t", "can", "will", "just", "don", "should", "now"}


def transformEvent(event, metadata):
    properties = event.get("properties") or {}
    message = properties.get("message")
    # Pass events without a usable text message through unchanged instead of raising a KeyError.
    if not isinstance(message, str) or not message.strip():
        return event

    # Lowercase and turn every non-alphanumeric character (quotes, parentheses, slashes, ...) into a space.
    cleaned = "".join(ch if ch.isalnum() else " " for ch in message.lower())
    words = cleaned.split()

    # Count whole tokens, not substrings, so "permission" is not also counted inside "permissions".
    unigram_counts = {}
    for word in words:
        if word not in STOP_WORDS:
            unigram_counts[word] = unigram_counts.get(word, 0) + 1

    # A bigram is two adjacent tokens, neither of which is a stop word.
    bigram_counts = {}
    for first, second in zip(words, words[1:]):
        if first not in STOP_WORDS and second not in STOP_WORDS:
            bigram = first + " " + second
            bigram_counts[bigram] = bigram_counts.get(bigram, 0) + 1

    # Keep the five most frequent of each; sorted() is stable, so ties keep first-seen order.
    top_unigrams = dict(sorted(unigram_counts.items(), key=lambda item: item[1], reverse=True)[:5])
    top_bigrams = dict(sorted(bigram_counts.items(), key=lambda item: item[1], reverse=True)[:5])

    properties["keywords"] = {"oneWord": top_unigrams, "twoWords": top_bigrams}

    return event
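Single words and two-word phrases are both n-grams (n = 1 and n = 2). If you also wanted longer phrases, say under a hypothetical "threeWords" key, one generic helper could cover every n. A minimal sketch follows; top_ngrams and its parameters are illustrative names, not part of the original transformation or of any RudderStack API.

```python
def top_ngrams(words, n, stop_words, k=5):
    """Return the k most frequent n-grams (space-joined strings) that contain
    no stop word. Ties keep the order in which the n-grams first appear."""
    counts = {}
    for i in range(len(words) - n + 1):
        gram = words[i:i + n]
        # Skip any window that contains a stop word.
        if any(word in stop_words for word in gram):
            continue
        key = " ".join(gram)
        counts[key] = counts.get(key, 0) + 1
    # sorted() is stable, so equal counts stay in first-seen order.
    ranked = sorted(counts.items(), key=lambda item: item[1], reverse=True)
    return dict(ranked[:k])


# Usage with a short, made-up message, tokenized by turning
# non-alphanumeric characters into spaces.
stop_words = {"the", "to", "a", "is", "from"}
text = "Reset the board manager permissions. The board manager is slow to load."
words = "".join(ch if ch.isalnum() else " " for ch in text.lower()).split()
print(top_ngrams(words, 1, stop_words))
print(top_ngrams(words, 2, stop_words))  # {'board manager': 2, 'manager permissions': 1}
print(top_ngrams(words, 3, stop_words))  # {'board manager permissions': 1}
```

Because the ranking uses a stable sort, n-grams with equal counts come out in the order they first appear in the message, which keeps the result deterministic from one run to the next.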

Input Payload for testing

[
  {
    "anonymousId": "8d872292709c6fbe",
    "channel": "mobile",
    "context": {
      "app": {
        "build": "1",
        "name": "AMTestProject",
        "namespace": "com.rudderstack.android.rudderstack.sampleAndroidApp",
        "version": "1.0"
      },
      "device": {
        "id": "8d872292709c6fbe",
        "manufacturer": "Google",
        "model": "AOSPonIAEmulator",
        "name": "generic_x86_arm",
        "type": "android"
      },
      "library": {
        "name": "com.rudderstack.android.sdk.core",
        "version": "1.0.2"
      },
      "locale": "en-US",
      "network": {
        "carrier": "Android",
        "bluetooth": false,
        "cellular": true,
        "wifi": true
      },
      "os": {
        "name": "Android",
        "version": "9"
      },
      "screen": {
        "density": 420,
        "height": 1794,
        "width": 1080
      },
      "timezone": "Asia/Kolkata",
      "traits": {
        "address": {
          "city": "Kolkata",
          "country": "India",
          "postalcode": "700096",
          "state": "West bengal",
          "street": "Park Street"
        },
        "age": "30",
        "anonymousId": "8d872292709c6fbe",
        "birthday": "2020-05-26",
        "createdat": "18th March 2020",
        "description": "Premium User for 3 years",
        "email": "identify@test.com",
        "firstname": "John",
        "userId": "sample_user_id",
        "lastname": "Sparrow",
        "name": "John Sparrow",
        "id": "sample_user_id",
        "phone": "9112340345",
        "username": "john_sparrow"
      },
      "userAgent": "Dalvik/2.1.0 (Linux; U; Android 9; AOSP on IA Emulator Build/PSR1.180720.117)"
    },
    "event": "Feedback Submitted",
    "integrations": {
      "All": true
    },
    "messageId": "1590431830915-73bed370-5889-436d-9a9e-0c0e0c809d06",
    "properties": {
      "message" : "Once the 'Member Permissions' and/or 'Role Permissions' from the 'Board Manager' level is overrided from a Folder level, there's no way to revert 'Board Manager' to take control over (override) the Folder level modified permissions. Now, it is extremely important to be able to override permissions from the general 'Board Manager' level based on specific Folder permissions settings, but it is also extremely important to be able to override the specific Folder permission settings again by the 'Board Manager' permissions settings. This makes maintainance more difficult since it would require to adjust permissions Folder by Folder just a for quick update on those before returning them back as it was before, instead of simply do it from the 'Board Manager' level do all the required updates in all of the Folders and revert the permissions back as it was before from one single place (Board Manager). This could maybe be a button placed somewhere in the specific Folder settings with an initial disabled state when it is not overriding the Board Manager permissions settings, and automatically enable the state of the button once this Folder overrides the Board Manager settings, so this button text could say something like 'Clear All Specific Folder Permissions', causing that the Board Manager permissions settings take control over that Folder again. Additional to that, please add an option to override this in a more silent/transparent way, just by being able to simply turn off a permission from the Folder or Board Manager level, and once the permission is turned on again, the last one (whether is from the Folder or Board Manager level) would be the one that overrides the other."
    },
    "originalTimestamp": "2020-05-25T18:37:10.917Z",
    "type": "track",
    "userId": "sample_user_id"
  }
]

License

gitcommitshow commented 1 year ago

Interesting use case. I can imagine it being useful for customer support teams as well. If performance were not a concern, this could be enhanced further by generating embeddings and using them to categorise the feedback or question, which could eventually help route it to the right team in real time.
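The routing idea can be prototyped before bringing in embeddings. A minimal sketch follows, assuming the keyword counts from this transformation are treated as sparse vectors and compared with cosine similarity against hand-written per-team keyword profiles. This is a bag-of-words stand-in, not embeddings; route_feedback, TEAM_PROFILES and the team names are hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two sparse vectors stored as {term: weight} dicts."""
    dot = sum(weight * b.get(term, 0) for term, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def route_feedback(keywords, team_profiles):
    """Return the team whose profile is most similar to the feedback's
    keyword counts, or None if no profile shares any term with it."""
    best_team, best_score = None, 0.0
    for team, profile in team_profiles.items():
        score = cosine_similarity(keywords, profile)
        if score > best_score:
            best_team, best_score = team, score
    return best_team


# Hypothetical team profiles. In an embedding-based version these would be
# vectors produced by a model rather than hand-written term weights.
TEAM_PROFILES = {
    "access-control": {"permission": 1.0, "permissions": 1.0, "folder": 0.5, "role": 0.5},
    "billing": {"invoice": 1.0, "payment": 1.0, "plan": 0.5},
}

# Example keyword counts in the shape of the transformation's "oneWord" output.
one_word = {"folder": 12, "permissions": 11, "board": 11, "manager": 11, "level": 7}
print(route_feedback(one_word, TEAM_PROFILES))  # access-control
```

Replacing the hand-written profiles with embedding vectors, and the keyword counts with an embedding of the message, would keep the same nearest-profile idea while capturing similarity between different wordings.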