tablelandnetwork / weeknotes

A place for weekly updates from the @tablelandnetwork team

[NOT-172] Weeknotes individual update: December 18, 2023 #147

Closed: dtbuchholz closed this 9 months ago

dtbuchholz commented 10 months ago

Studio 101 workshop—plus, using OpenAI to annotate YouTube videos

by Dan Buchholz

A few months ago, we opened up early access to the Tableland Studio. It's a web app and CLI that lets you create & manage teams, projects, and tables for your Tableland deployments. We continue to add more features and wanted to walk through an end-to-end flow for exactly how you can use the Studio: basic team/project setup in the UI, deploying tables, and uploading data/tables via the CLI.

Also, we updated the Studio CLI docs to the latest version (here), including new commands for creating projects, creating or listing deployments, and importing data from CSV files. The workshop touches on all of these.

You can check out the video below for the full workshop, where we walk through each of these steps:

https://www.youtube.com/watch?v=-MUq--Nrd0c

OpenAI Whisper & GPT-4

For YouTube videos, a nice UX feature is to have timestamps/chapters that link to specific video segments, as well as subtitles for accessibility support. So, I went down the path of creating a simple Python script that does exactly that! You can see the result in the video above.

With Whisper, the script generates a transcription of an mp4 video as a subtitles (.srt) file, which can be included during the video upload process. And with GPT-4, the script takes the outputs from Whisper and summarizes them with timestamps at certain points in the video.

You can check out the full source code here and use it with your own videos! Just note that chunking isn't implemented, so it expects the processed video-to-audio input for GPT-4 to be 25 MB or less (e.g., a 20-minute video should be within this range). Here's a simplified snippet of what it does:

import whisper
from os import getenv
from dotenv import find_dotenv, load_dotenv
from openai import OpenAI

# Load env vars for OpenAI API and organization
load_dotenv(find_dotenv())
openai_api_key = getenv("OPENAI_API_KEY")
openai_api_org = getenv("OPENAI_API_ORG")

# Load OpenAI API client
client = OpenAI(
    organization=openai_api_org,
    api_key=openai_api_key,
)

# Load whisper model
options = whisper.DecodingOptions(fp16=False, language="en")
whisper_model = whisper.load_model("base.en")

# Helper to format seconds as an SRT timestamp (HH:MM:SS,mmm)
def format_time(seconds: float) -> str:
    total_ms = round(seconds * 1000)
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, millis = divmod(rem, 1000)
    return f"{hours:02}:{minutes:02}:{secs:02},{millis:03}"

# Transcribe video
video_path = "/path/to/video.mp4"
result = whisper_model.transcribe(video_path, **options.__dict__, verbose=False)

# Take the result and format/generate .srt subtitles
segments = result["segments"]
srt_text = []
for i, segment in enumerate(segments):
    # SRT index starts from 1
    srt_index = i + 1
    srt_text.append(str(srt_index))

    # Formatting start and end times
    start_time = format_time(segment["start"])
    end_time = format_time(segment["end"])
    srt_text.append(f"{start_time} --> {end_time}")

    # Adding text
    srt_text.append(segment["text"].strip())

    # Add an empty line after each segment
    srt_text.append("")

subtitles = "\n".join(srt_text)
print(subtitles)  # Show the subtitles

# Summarize segments for YouTube timestamps
response = client.chat.completions.create(
    model="gpt-4",
    messages=[
        {
            "role": "system",
            "content": """You are a helpful assistant...""", # Note: a more prescriptive prompt should be used
        },
        {
            "role": "user",
            "content": segments,
        },
    ],
    temperature=0.7,
)
summary = response.choices[0].message.content.strip()
print(summary) # Show YouTube timestamps summary
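
For reference, each loop iteration above emits a standard SRT block: a numeric index, a "start --> end" timestamp range, the segment text, and a blank line. The joined output looks roughly like this (the transcript lines are illustrative):

1
00:00:00,000 --> 00:00:04,250
Hey everyone, welcome to the Studio 101 workshop.

2
00:00:04,250 --> 00:00:09,500
First, we'll set up a team and project in the UI.

The GPT-4 summary is what gets pasted into the YouTube description; as long as it's formatted as lines like "00:00 Intro" (with the first chapter starting at 00:00), YouTube turns them into clickable chapters.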

Using JETI for inserting/reading IPFS CIDs with the SDK

by Dan Buchholz

We recently updated the @tableland/jeti package (here) with some refactoring to conform to the latest Tableland SDK APIs. You can check out the latest usage in our docs: here.

JETI is designed to add extensibility to the @tableland/sdk and transform writes/reads when querying a Database. The package ships with a few example processors, including one for IPFS.

The IPFS processor is extremely useful since it lets Tableland work with large media that can't be stored in a table cell. To use it, you set up the processor, call it with string templating on writes to create and insert a CID, and resolve the CID from read query results. Here's a quick example of how it works:

import { pinToLocal, skip } from "@tableland/jeti";
import { Database } from "@tableland/sdk";

// Set up a `signer` and pass to the database
const db = new Database({ signer });
const tableName = "my_table_31337_2"; // Assuming the table was created beforehand

// Assuming you're running a local IPFS node, such as with the IPFS Desktop app
const localPinner = pinToLocal({
  host: "127.0.0.1",
  port: 5001,
  protocol: "http",
});

const contentToPin = "Hello world"; // A string, or a file buffer (Uint8Array)
// This will generate:
// INSERT INTO my_table_31337_2 (val) values ('bafybeiabfiu2uipule2sro2maoufk2waokktnsbqp5gvaaod3y44ouft54');
const sql = await localPinner`INSERT INTO ${skip(
  tableName
)} (val) VALUES ('${contentToPin}');`;

const { meta: insert } = await db.prepare(sql).all();
await insert.txn?.wait();

// Process content from CID in table read
const { results } = await db.prepare(`SELECT * FROM ${tableName}`).all(); // Results contain CID
const resultsWithCIDsResolved = await localPinner.resolve(results, ["val"]);
console.log(resultsWithCIDsResolved);
// [
//   {
//     id: 1,
//     val: 'Hello world'
//   }
// ]

This is useful not only for large files; recall that the Tableland SQL language has type and feature constraints where non-deterministic behavior is blocked, and the only supported types are TEXT, BLOB, and INT/INTEGER.

For example, if you want to write floating points, you could simply store the data as TEXT or BLOB. Or maybe you want to write something like a large text embedding, which is a multi-dimensional vector of floating points. With JETI, you could pin that content to IPFS, resolve the values client-side, and use them in ML/AI use cases, as sketched below.
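
Here's a minimal sketch of that embedding flow, reusing the local pinner setup from the example above (the table name, column, and vector values are hypothetical placeholders):

import { pinToLocal, skip } from "@tableland/jeti";
import { Database } from "@tableland/sdk";

// Same setup as the example above (assumes a `signer` and a local IPFS node)
const db = new Database({ signer });
const localPinner = pinToLocal({
  host: "127.0.0.1",
  port: 5001,
  protocol: "http",
});
const tableName = "embeddings_31337_3"; // Hypothetical table with an `embedding` TEXT column

// A (tiny) stand-in for a text embedding; real ones have hundreds of dimensions
const embedding = [0.0123, -0.9871, 0.4456];

// Serialize the vector to a JSON string; the pinner stores it on IPFS, and
// only the resulting CID is inserted into the table cell
const sql = await localPinner`INSERT INTO ${skip(
  tableName
)} (embedding) VALUES ('${JSON.stringify(embedding)}');`;
const { meta: insert } = await db.prepare(sql).all();
await insert.txn?.wait();

// On reads, resolve each CID back to the JSON string and parse the vector
const { results } = await db.prepare(`SELECT * FROM ${tableName}`).all();
const resolved = await localPinner.resolve(results, ["embedding"]);
const vectors = resolved.map((row) => JSON.parse(row.embedding));

Since only the CID lands in the table cell, row size stays small while the full vector remains retrievable on reads.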

From SyncLinear.com | NOT-172