signalmetrics / signal

The Laravel package for signal
MIT License
0 stars 0 forks source link

Update on creating data #7

Open iambateman opened 1 month ago

iambateman commented 1 month ago

@tnylea

Fields

// (this is the events) table

            // epochal milliseconds + random number generated on JS side. This is a unique signature on visit.
            $table->unsignedBigInteger('visit_signature')->primary();

            // e.g., "page_view", "button_click"
            $table->string('type')->index();

            // For authenticated users in the system.
            $table->unsignedBigInteger('custom_user_id')->nullable();

            // Page title from browser
            $table->string('title')->nullable();

            // User agent string
            $table->string('user_agent')->nullable();

            // IP address
            $table->string('ip')->nullable();

            // Hashed identifier for users
            $table->string('user_hash')->nullable();

            // Hashed identifier for pageviews
            $table->string('page_view_hash')->nullable();

            // Time on page (seconds)
            $table->integer('duration')->nullable();

            // Page referrer ("Google") || Note: null is considered direct
            $table->string('referrer')->nullable();

            // Host name ("example.com")
            $table->string('host_name')->nullable();

            // Path name ("/about")
            $table->string('path_name')->nullable();

            // Determine if the page view is a unique user.
            $table->boolean('is_unique')->nullable();

            // Determine if we have a crawler.
            $table->string('crawler')->nullable();

            // Size of the viewport and screen. e.g. x: 592, y: 1123
            $table->json('viewport')->nullable();
            $table->json('screen_resolution')->nullable();

            $table->string('connection_type')->nullable(); // e.g., "4G"
            $table->string('language')->nullable(); // Language
            $table->string('country_code')->nullable(); // Country code
            $table->string('city')->nullable(); // City
            $table->string('device_type')->nullable(); // Device type (nullable)
            $table->string('device_name')->nullable(); // e.g., "Macintosh"
            $table->string('platform')->nullable(); // e.g., "OS X"
            $table->string('browser')->nullable(); // e.g., "Chrome" or "Firefox"
            $table->integer('page_load_time_ms')->nullable(); // Page load time in milliseconds

            $table->string('utm_source')->nullable(); // "google" "facebook"
            $table->string('utm_medium')->nullable(); // "email" "cpc"
            $table->string('utm_campaign')->nullable(); // "black+friday"
            $table->string('utm_term')->nullable(); // often paid search term
            $table->string('utm_content')->nullable(); // differentiates two different terms
tnylea commented 1 month ago

@iambateman This all looks really good!

I did have a question about the viewport and screen_resolution:

// Size of the viewport and screen. e.g. x: 592, y: 1123
$table->json('viewport')->nullable();
$table->json('screen_resolution')->nullable();

Do we want these to be JSON data types?

Also, are we thinking that the analytics data is going to be stored in a single events table? I guess that would work. I'm sure there are advantages/disadvantages for this approach.

As an alternative. We could store them in a handful of smaller tables, like so:

  1. visits
  2. pages
  3. browsers
  4. devices
  5. platforms
  6. utm_tags
  7. locations
  8. events

Here's an example migration for each one:

visits - basic info about each session or visit to the site

Schema::create('visits', function (Blueprint $table) {
    // Unique signature for each visit (e.g., epochal milliseconds + random number)
    $table->unsignedBigInteger('visit_signature')->primary();

    // For authenticated users (can be null for guest users)
    $table->unsignedBigInteger('custom_user_id')->nullable();

    // Hashed identifier for users (for anonymous tracking)
    $table->string('user_hash')->nullable();

    // Hashed identifier for page views (specific to page visits)
    $table->string('page_view_hash')->nullable();

    // IP address of the visitor
    $table->string('ip')->nullable();

    // Duration of the visit (time spent on page, in seconds)
    $table->integer('duration')->nullable();

    // Determines if the visit is from a unique user
    $table->boolean('is_unique')->nullable();

    // Determine if the visit was from a crawler
    $table->string('crawler')->nullable();

    // Foreign key references to other entities
    $table->unsignedBigInteger('browser_id')->nullable();
    $table->unsignedBigInteger('device_id')->nullable();
    $table->unsignedBigInteger('platform_id')->nullable();
    $table->unsignedBigInteger('page_id')->nullable();
    $table->unsignedBigInteger('location_id')->nullable();

    // Timestamps
    $table->timestamps();
});

pages - info about the pages visited, including title, hostname, and path name

Schema::create('pages', function (Blueprint $table) {
    $table->id(); // Primary key for the page

    // Page title (optional, from browser)
    $table->string('title')->nullable();

    // Host name (e.g., "example.com")
    $table->string('host_name')->nullable();

    // Path name (e.g., "/about")
    $table->string('path_name')->nullable();

    // Referrer (e.g., "Google")
    $table->string('referrer')->nullable();

    // Page load time in milliseconds
    $table->integer('page_load_time_ms')->nullable();

    // Timestamps
    $table->timestamps();
});

browsers - info about the browser used during the visit

Schema::create('browsers', function (Blueprint $table) {
    $table->id(); // Primary key for the browser

    // Browser name (e.g., "Chrome", "Firefox")
    $table->string('browser')->nullable();

    // Timestamps
    $table->timestamps();
});

devices - info about the device type, name, resolution, and more

Schema::create('devices', function (Blueprint $table) {
    $table->id(); // Primary key for the device

    // Device type (e.g., "mobile", "desktop")
    $table->string('device_type')->nullable();

    // Device name (e.g., "Macintosh", "iPhone")
    $table->string('device_name')->nullable();

    // Viewport size (JSON format for width and height)
    $table->json('viewport')->nullable();

    // Screen resolution (JSON format for width and height)
    $table->json('screen_resolution')->nullable();

    // Connection type (e.g., "4G")
    $table->string('connection_type')->nullable();

    // Timestamps
    $table->timestamps();
});

platforms - info about the platform (operating system) of the visit

Schema::create('platforms', function (Blueprint $table) {
    $table->id(); // Primary key for the platform

    // Platform name (e.g., "OS X", "Windows")
    $table->string('platform')->nullable();

    // Timestamps
    $table->timestamps();
});

utm_tags - UTM campaign data related to the visit

Schema::create('utm_tags', function (Blueprint $table) {
    $table->id(); // Primary key for the UTM tag

    // UTM source (e.g., "google", "facebook")
    $table->string('utm_source')->nullable();

    // UTM medium (e.g., "email", "cpc")
    $table->string('utm_medium')->nullable();

    // UTM campaign (e.g., "black+friday")
    $table->string('utm_campaign')->nullable();

    // UTM term (often a paid search term)
    $table->string('utm_term')->nullable();

    // UTM content (differentiates between different ads or content)
    $table->string('utm_content')->nullable();

    // Foreign key to link UTM data to the visit
    $table->unsignedBigInteger('visit_signature');

    // Timestamps
    $table->timestamps();
});

locations - location info for each visit. We could easily exclude this info for a visit if the user has not connected with our GeoAPI

Schema::create('locations', function (Blueprint $table) {
    $table->id(); // Primary key for the location

    // Country code (e.g., "US")
    $table->string('country_code')->nullable();

    // City (e.g., "New York")
    $table->string('city')->nullable();

    // Language of the user
    $table->string('language')->nullable();

    // Foreign key to link the location to the visit
    $table->unsignedBigInteger('visit_signature');

    // Timestamps
    $table->timestamps();
});

events - track custom events like button clicks, form submissions, and more. Each event is associated with the visit_signature (from the visits table) and can have its own specific data

Schema::create('events', function (Blueprint $table) {
    $table->id(); // Primary key for the event

    // Foreign key to the visits table
    $table->unsignedBigInteger('visit_signature');

    // Type of event (e.g., "button_click", "form_submit")
    $table->string('type')->index();

    // Additional data about the event (e.g., button id, form name)
    $table->json('data')->nullable();

    // Timestamps for when the event was recorded
    $table->timestamps();

    // Foreign key relationship
    $table->foreign('visit_signature')->references('visit_signature')->on('visits')->onDelete('cascade');
});

I'm actually not sure which would be easier to implement though 🤣

I asked ChatGPT what would be the advantage/disadvantage of each. Here's what they gave me :)

Single Table:

Multiple Tables:

Let me know your thougts 😉

iambateman commented 1 month ago

Screen resolution and viewport

Schema

The short answer is I'm not sure. Maybe next week we plan to do a call to talk through schema?

There's a couple issues I'm trying to solve with the schema:

For the insert issue, I did a refactor that is interesting but complicated. It uses a "today" table and an "events" table which share a schema. I'll explain more when we chat.