mediar-ai / screenpipe

rewind.ai x cursor.com = your AI assistant that has all the context. 24/7 screen & voice recording for the age of super intelligence. get your data ready or be left behind
https://screenpi.pe
MIT License
10.15k stars · 602 forks

Add to database endpoint #593

Closed cparish312 closed 3 weeks ago

cparish312 commented 4 weeks ago

name: Add /add endpoint to database
about: Creates an endpoint to add frames, ocr_results, and transcription results to the screenpipe database from outside sources


description

Creates an endpoint to add frames, ocr_results, and transcription results to the screenpipe database from outside sources

related issue: # /claim #467


vercel[bot] commented 4 weeks ago

The latest updates on your projects:

Name: screenpipe
Status: ✅ Ready
Updated (UTC): Oct 29, 2024 11:52pm
cparish312 commented 4 weeks ago

@louis030195 Not sure how you want to add tests for the frame writing to mp4 bit.

Also, currently you can't specify the OcrEngine used to generate the ocr_results; it just records the default engine.

louis030195 commented 4 weeks ago

how can i test?

cparish312 commented 4 weeks ago

Working on testing now. Right now there is a foreign key constraint requiring the audio_transcription table to have an audio_chunk_id present in the audio_chunks table. How would you like to handle this for adding transcriptions without associated audio_chunks?

louis030195 commented 4 weeks ago

@cparish312

> Working on testing now. Right now there is a foreign key constraint requiring the audio_transcription table to have an audio_chunk_id present in the audio_chunks table. How would you like to handle this for adding transcriptions without associated audio_chunks?

hmm okay so the use case is that you don't have the .mp4 audio recording to share?

like maybe someone is syncing their iphone manual recordings and doesn't have the .mp4, or is lazy, and we want to allow them to sync just the transcription without an audio chunk

that's a bit annoying because all the code is built around this, like the search, and in the UI we display the path to the video etc

dumb workaround is to generate TTS the chunk using AI XD

what are the possible solutions?

louis030195 commented 4 weeks ago

i guess we have to allow nullable and in the UI not showing any audio chunk
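A minimal sketch of the nullable-FK approach, using a simplified, hypothetical schema (table and column names are illustrative, not necessarily screenpipe's actual schema): in SQLite, a NULL value in a foreign key column passes FK checks by design, so a transcription row can exist without a backing audio chunk.

```python
import sqlite3

# Illustrative schema: audio_chunk_id is nullable, so a transcription can
# exist without a corresponding audio_chunks row (names are hypothetical).
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.execute("CREATE TABLE audio_chunks (id INTEGER PRIMARY KEY, file_path TEXT)")
conn.execute("""
    CREATE TABLE audio_transcriptions (
        id INTEGER PRIMARY KEY,
        audio_chunk_id INTEGER REFERENCES audio_chunks(id),  -- nullable FK
        transcription TEXT NOT NULL
    )
""")

# Insert a transcription with no backing audio chunk: NULL satisfies the FK.
conn.execute(
    "INSERT INTO audio_transcriptions (audio_chunk_id, transcription) VALUES (?, ?)",
    (None, "This is an example transcription of recorded audio."),
)
conn.commit()

row = conn.execute(
    "SELECT audio_chunk_id, transcription FROM audio_transcriptions"
).fetchone()
print(row)  # (None, 'This is an example transcription of recorded audio.')
```

The UI would then treat a NULL audio_chunk_id as "no media available" rather than trying to resolve a file path.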

cparish312 commented 4 weeks ago

@louis030195 Okay cool yeah that probably makes the most sense

cparish312 commented 4 weeks ago

Oh man, didn't realize how much of a pain a nullable migration is in SQLite
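For anyone following along: SQLite's `ALTER TABLE` can't relax a NOT NULL constraint in place, so the usual workaround is the create-copy-drop-rename rebuild. A minimal sketch against a simplified, hypothetical version of the table:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical original table where audio_chunk_id was NOT NULL.
conn.execute("""
    CREATE TABLE audio_transcriptions (
        id INTEGER PRIMARY KEY,
        audio_chunk_id INTEGER NOT NULL,
        transcription TEXT NOT NULL
    )
""")
conn.execute("INSERT INTO audio_transcriptions VALUES (1, 42, 'hello')")

# SQLite can't drop NOT NULL directly, so: new table, copy, drop, rename.
conn.executescript("""
    CREATE TABLE audio_transcriptions_new (
        id INTEGER PRIMARY KEY,
        audio_chunk_id INTEGER,          -- NOT NULL removed
        transcription TEXT NOT NULL
    );
    INSERT INTO audio_transcriptions_new SELECT * FROM audio_transcriptions;
    DROP TABLE audio_transcriptions;
    ALTER TABLE audio_transcriptions_new RENAME TO audio_transcriptions;
""")

# A NULL audio_chunk_id is now accepted.
conn.execute("INSERT INTO audio_transcriptions VALUES (2, NULL, 'no chunk')")
rows = conn.execute(
    "SELECT id, audio_chunk_id FROM audio_transcriptions ORDER BY id"
).fetchall()
print(rows)  # [(1, 42), (2, None)]
```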

cparish312 commented 4 weeks ago

Yeah this is causing some issues for searching

cparish312 commented 4 weeks ago

The transcription is showing up when hitting the search endpoint, but it's not appearing in the UI. I'm assuming this is because there is no path / audio_chunk_id. How do you want to handle these results in the UI?

cparish312 commented 3 weeks ago

Updated so it just shows "No file path available for this audio." in the app search UI when there is no audio path

cparish312 commented 3 weeks ago

Testing transcription insert:

```shell
curl -X POST "http://localhost:3030/add" \
  -H "Content-Type: application/json" \
  -d '{
    "device_name": "MacBook Pro Microphone (input)",
    "content": {
      "content_type": "transcription",
      "data": {
        "transcription": "This is an example transcription of recorded audio.",
        "transcription_engine": "speech_to_text_v1"
      }
    }
  }'
```

cparish312 commented 3 weeks ago

Testing frames insert. You will need to change the file_paths to paths that exist on your computer:

```shell
curl -X POST "http://localhost:3030/add" \
  -H "Content-Type: application/json" \
  -d '{
    "device_name": "hindsight_android",
    "content": {
      "content_type": "frames",
      "data": [
        {
          "file_path": "/Users/connorparish/.hindsight_server/data/raw_screenshots/2024/06/03/com-google-android-deskclock/com-google-android-deskclock_1717433244710.jpg",
          "timestamp": "2024-06-03T16:47:24.710000038Z",
          "app_name": "Clock",
          "window_name": "Clock",
          "ocr_results": [],
          "tags": ["hindsight", "Clock"]
        },
        {
          "file_path": "/Users/connorparish/.hindsight_server/data/raw_screenshots/2024/06/03/com-google-android-deskclock/com-google-android-deskclock_1717433242624.jpg",
          "timestamp": "2024-06-03T16:47:22.624000072Z",
          "app_name": "Clock",
          "window_name": "Clock",
          "ocr_results": [],
          "tags": ["hindsight", "Clock"]
        }
      ]
    }
  }'
```

louis030195 commented 3 weeks ago

> Testing frames insert. Will need to change the file_paths to paths that exist on your computer: `curl -X POST "http://localhost:3030/add" …`

will try today

louis030195 commented 3 weeks ago

audio

```shell
curl -X POST "http://localhost:3035/add" -H "Content-Type: application/json" -d '{
  "device_name": "MacBook Pro Microphone (input)",
  "content": {
    "content_type": "transcription",
    "data": {
      "transcription": "This is an example transcription of recorded audio.",
      "transcription_engine": "speech_to_text_v1"
    }
  }
}' | jq
```

```shell
curl -X GET "http://localhost:3035/search?q=example&content_type=audio" -H "Content-Type: application/json" | jq
```

```json
{
  "data": [
    {
      "type": "Audio",
      "content": {
        "chunk_id": 8,
        "transcription": "This is an example transcription of recorded audio.",
        "timestamp": "2024-10-29T21:34:06.615182Z",
        "file_path": "",
        "offset_index": -1,
        "tags": [],
        "device_name": "MacBook Pro Microphone (input)",
        "device_type": "Input"
      }
    }
  ],
  "pagination": {
    "limit": 20,
    "offset": 0,
    "total": 1
  }
}
```

frames

```shell
curl -X POST "http://localhost:3035/add" -H "Content-Type: application/json" -d '{
  "device_name": "macbook_pro",
  "content": {
    "content_type": "frames",
    "data": [
      {
        "file_path": "'$HOME'/Library/Mobile Documents/com~apple~CloudDocs/Desktop/Screenshots/02722091-76A7-4215-9CAB-E4A4DC5A37BA.png",
        "timestamp": "2024-03-14T16:47:24.710Z",
        "app_name": "Desktop",
        "window_name": "Screenshot",
        "ocr_results": [],
        "tags": ["screenshot", "desktop"]
      },
      {
        "file_path": "'$HOME'/Library/Mobile Documents/com~apple~CloudDocs/Desktop/Screenshots/0D7F899B-DE6B-494E-B70D-1F5338A54AEE.png",
        "timestamp": "2024-03-14T16:47:22.624Z",
        "app_name": "Desktop",
        "window_name": "Screenshot",
        "ocr_results": [],
        "tags": ["screenshot", "desktop"]
      }
    ]
  }
}' | jq
```

```json
{
  "success": true,
  "message": "Frames added successfully"
}
```

```shell
curl -X GET "http://localhost:3035/search?window_name=screenshot&content_type=ocr&limit=1000" -H "Content-Type: application/json" | jq
```

```json
{
  "data": [],
  "pagination": {
    "limit": 1000,
    "offset": 0,
    "total": 0
  }
}
```

not sure if i made a mistake, but i expected to get the frame here

also not seeing the merged video

```
(env) (base) louisbeaumont@mac:~/Documents/screen-pipe$ ls /tmp/sp/data/
Display 1 (output)_2024-10-29_21-27-01.mp4              monitor_1_2024-10-29_21-31-45.mp4                       monitor_1_2024-10-29_21-38-39.mp4
MacBook Pro Microphone (input)_2024-10-29_21-27-14.mp4  monitor_1_2024-10-29_21-32-53.mp4                       monitor_1_2024-10-29_21-39-49.mp4
macbook_pro_2024-10-29_21-40-21.mp4                     monitor_1_2024-10-29_21-34-22.mp4                       monitor_1_2024-10-29_21-41-04.mp4
macbook_pro_2024-10-29_21-43-37.mp4                     monitor_1_2024-10-29_21-35-34.mp4                       monitor_1_2024-10-29_21-42-15.mp4
monitor_1_2024-10-29_21-26-44.mp4                       monitor_1_2024-10-29_21-37-28.mp4
```

also don't you have OCR?

i assume this API might be used in a very broad range of use cases, so it should be flexible

for the scope of this PR we can stick to the minimum i think, not much post-processing

cparish312 commented 3 weeks ago

Yeah I agree running OCR by default when OCR results are not provided would be ideal but sounds good to add in another PR.

Are the macbook_pro videos not the merged videos? I'm storing by "{devicename}{current_time}.mp4"

Maybe they aren't appearing in the search since there are no ocr results? Could you try putting in OCR results.


```shell
curl -X POST "http://localhost:3035/add" -H "Content-Type: application/json" -d '{
  "device_name": "macbook_pro",
  "content": {
    "content_type": "frames",
    "data": [
      {
        "file_path": "'$HOME'/Library/Mobile Documents/com~apple~CloudDocs/Desktop/Screenshots/02722091-76A7-4215-9CAB-E4A4DC5A37BA.png",
        "timestamp": "2024-03-14T16:47:24.710Z",
        "app_name": "Desktop",
        "window_name": "Screenshot",
        "ocr_results": [{
          "text": "test add frames with ocr results",
          "text_json": "{}",
          "ocr_engine": "apple_native"
        }],
        "tags": ["screenshot", "desktop"]
      },
      {
        "file_path": "'$HOME'/Library/Mobile Documents/com~apple~CloudDocs/Desktop/Screenshots/0D7F899B-DE6B-494E-B70D-1F5338A54AEE.png",
        "timestamp": "2024-03-14T16:47:22.624Z",
        "app_name": "Desktop",
        "window_name": "Screenshot",
        "ocr_results": [{
          "text": "test add frames with ocr results 2",
          "text_json": "{}",
          "ocr_engine": "apple_native"
        }],
        "tags": ["screenshot", "desktop"]
      }
    ]
  }
}' | jq
```
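The OCR-by-default idea discussed above could be sketched as a small pre-processing step on frames before insertion; `run_ocr` here is a hypothetical placeholder, not an existing screenpipe function:

```python
# Sketch of falling back to server-side OCR when a frame arrives without
# ocr_results. `run_ocr` is a hypothetical stand-in for whatever OCR engine
# screenpipe would actually invoke (e.g. Apple native or Tesseract).
def run_ocr(file_path: str) -> list[dict]:
    # A real implementation would run OCR on the image at file_path.
    return [{"text": f"ocr of {file_path}", "text_json": "{}", "ocr_engine": "default"}]

def fill_missing_ocr(frames: list[dict]) -> list[dict]:
    """Populate ocr_results for any frame that arrived without them."""
    for frame in frames:
        if not frame.get("ocr_results"):
            frame["ocr_results"] = run_ocr(frame["file_path"])
    return frames

frames = [
    {"file_path": "/tmp/a.png", "ocr_results": []},
    {"file_path": "/tmp/b.png", "ocr_results": [{"text": "already here"}]},
]
filled = fill_missing_ocr(frames)
print(filled[0]["ocr_results"][0]["text"])  # ocr of /tmp/a.png
print(filled[1]["ocr_results"][0]["text"])  # already here
```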
louis030195 commented 3 weeks ago

ref: https://github.com/BasedHardware/omi/issues/1212

louis030195 commented 3 weeks ago

works!


```json
{
  "data": [
    {
      "type": "OCR",
      "content": {
        "frame_id": 1,
        "text": "test add frames with ocr results",
        "timestamp": "2024-03-14T16:47:24.710Z",
        "file_path": "/tmp/spp/data/macbook_pro_2024-10-29_23-25-31.mp4",
        "offset_index": 0,
        "app_name": "Desktop",
        "window_name": "Screenshot",
        "tags": [
          "screenshot",
          "desktop"
        ],
        "frame": null
      }
    },
    {
      "type": "OCR",
      "content": {
        "frame_id": 2,
        "text": "test add frames with ocr results 2",
        "timestamp": "2024-03-14T16:47:22.624Z",
        "file_path": "/tmp/spp/data/macbook_pro_2024-10-29_23-25-31.mp4",
        "offset_index": 1,
        "app_name": "Desktop",
        "window_name": "Screenshot",
        "tags": [
          "screenshot",
          "desktop"
        ],
        "frame": null
      }
    }
  ],
  "pagination": {
    "limit": 1000,
    "offset": 0,
    "total": 2
  }
}
```
louis030195 commented 3 weeks ago

@cparish312 should i merge now?

cparish312 commented 3 weeks ago

@louis030195 Did some final cleanups, should be good to go!

louis030195 commented 3 weeks ago

/approve

thx!

one use case i'd want to try (would need to add an OCR option) is to create an apple shortcut to add a document into screenpipe, maybe a pdf converted to an image

algora-pbc[bot] commented 3 weeks ago

@louis030195: The claim has been successfully added to reward-all. You can visit your dashboard to complete the payment.