omnivore-app / omnivore

Omnivore is a complete, open source read-it-later solution for people who like reading.
https://omnivore.app
GNU Affero General Public License v3.0
13.53k stars, 845 forks

Improve support for users that want to deploy their own backend #25

Open jacksonh opened 2 years ago

jacksonh commented 2 years ago

Currently Omnivore relies on a few GCP services to run, but open source users will likely want to deploy the api, web, and content fetching (puppeteer-parse) service to another platform. We need to come up with a list of target platforms and supported deployment configurations that are realistic for users wanting to deploy a minimalistic configuration.

Some of the services we currently rely on:

Other services we are using:

Limezy commented 2 years ago

Hi, thanks for trying to make Omnivore self-hostable! For the CloudStorage part, could it be replaced by a MinIO instance with minor modifications on your side? For SendGrid, I guess a simple SMTP connector would be sufficient? For the auth process, if you make your app compatible with passport.js, it would be very easy for the community to then add connectors as per their needs. I would recommend OIDC as the primary choice.

trashhalo commented 2 years ago

Google built this go library that abstracts gcp services and has plugs for aws, azure, etc. https://gocloud.dev/

I wonder if a similar library exists for nodejs. 🤔

menelic commented 2 years ago

Thanks for a great tool - making this a Nextcloud app would make it easy to deploy for many non-technical users who have access to Nextcloud instances. The app fits well with the open Nextcloud ecosystem and meets a need not addressed by any Nextcloud app.

Nevarro commented 1 year ago

Could we get a progress update? Omnivore is exactly what I am looking for – only production-level self-hosting is missing.

Coo-ops commented 1 year ago

Please update to fully open source. Thank you.

jacksonh commented 1 year ago

> Please update to fully open source. Thank you.

Hi @Coo-ops, the only piece that isn't open source is a PDF viewer library that we license. In the future we will try to replace this with pdf.js.

obvionaoe commented 1 year ago

any new progress on this issue?

r0bbie commented 11 months ago

I really want to go all-in on Omnivore. Coming across from Wallabag (and having tried out a load of other options), the UX seems great, and I'm happy to see it be open source! So I've been keeping an eye on this. Right now, the lack of ability to self-host (or at least to do so easily!) paired with the lack of any data export function (locking you in) makes me really hesitant.

If data export existed I'd at least be more willing to go with your hosted version for now, knowing it'll either be easy to migrate over when self-hosting is properly available (or to migrate my data to another solution altogether in the event self-hosting failed to materialise).

Wondered if there are any updates on this at present?

stanthewizzard commented 11 months ago

Won't deployment with Docker do the trick? I want Omnivore in-house.

jerryzhang721 commented 11 months ago

Is there a detailed tutorial for docker self-hosting?

r0bbie commented 11 months ago

@jerryzhang721 All I've been able to find are the extremely basic instructions in the readme (https://github.com/omnivore-app/omnivore#how-to-setup-local-development-computer), but I was simply unable to get this working when using a custom domain rather than a local IP/port. And it's still very unclear to me whether all the external cloud dependencies have been refactored out yet, which is what proper self-hosting would need.

axelson commented 11 months ago

I'm very much hoping to self-host Omnivore as well!

I didn't see these docs posted in this issue so I'll post them here:

grapemix commented 11 months ago

For the record, @lawrencegripper did contribute on k8s setup in https://github.com/omnivore-app/omnivore/pull/2966. Unfortunately, it is in WIP and he is unavailable.

se-jaeger commented 11 months ago

> For the record, @lawrencegripper did contribute on k8s setup in #2966. Unfortunately, it is in WIP and he is unavailable.

Thanks for this pointer! I'm currently thinking about/planning to work on a Helm chart. Will probably start over the Christmas days.

Feel free to ping me or connect if you would like to support.

grapemix commented 10 months ago

> > For the record, @lawrencegripper did contribute on k8s setup in #2966. Unfortunately, it is in WIP and he is unavailable.

> Thanks for this pointer! I'm currently thinking about/planning to work on a Helm chart. Will probably start over the Christmas days.

> Feel free to ping me or connect if you would like to support.

FYI, not sure if you have heard about https://bjw-s.github.io/helm-charts/docs/app-template/. It is pretty popular and can probably save you quite a lot of time in this case. Also, lots of homelab users have already deployed cloudnative-pg or something equivalent, so a script to bootstrap PG is likely not needed for them.

se-jaeger commented 10 months ago

> > For the record, @lawrencegripper did contribute on k8s setup in #2966. Unfortunately, it is in WIP and he is unavailable.

> Thanks for this pointer! I'm currently thinking about/planning to work on a Helm chart. Will probably start over the Christmas days.

> Feel free to ping me or connect if you would like to support.

Hi, sorry for the late response; there were some hurdles I had to overcome.

As @grapemix suggested, I used the bjw-s helm chart to set up a functioning instance (Web, API, content-fetch). You can find the current (WIP) version here: https://github.com/se-jaeger/omnivore

There are some things I want to improve. However, in the meantime, I'd love to get feedback from you:

  • documentation
  • health checks
  • RSS

One more remark: I built and pushed the images to my Docker Hub account: https://hub.docker.com/u/sejaeger

Working on #3177 would definitely improve the chart.

se-jaeger commented 9 months ago

FYI: already merged #3385

Nevarro commented 9 months ago

Any news on docker self-hosting?

mbhkoay commented 9 months ago

Sharing my experience attempting to self-host this. I'm not a coder at all, so fixing some of these things is beyond my expertise. I self-host stuff on my unraid machine, so your mileage may vary.

  1. Git clone: git clone https://github.com/omnivore-app/omnivore

  2. Change directory: cd omnivore

  3. Adjust docker-compose:
     a. Added lines to create a custom docker network.
     b. Replace all required secrets and environment variables. Refer to the docker compose file below.
     c. I think I messed some things up while updating the postgres password, so I ended up not changing them.

  4. Reverse proxy for ports 3000 & 4000 (or the changed port numbers)

  5. Start docker: docker compose up --detach

  6. Outstanding items & observations:
     a. The readme says that to save pages, puppeteer-parse is required outside of docker. Not sure if this can be dockerised.
     b. By default the login is demo@omnivore.app, password: demo_password. I have not found any setting to change the password from within the app. I guess you can click forgot password and reset it via email, but I haven't tried and think it is likely that it won't work.
     c. No way to disable the signup button yet.
     d. content-fetch is not working; it throws an error about redisURL not being supplied. I attempted to throw in a redis container to see if it works. Apparently not. You can see the additional section in the docker compose file.
     e. se-jaeger's contribution using a helm chart is something that I have yet to explore. (No experience at all with helm/Kubernetes etc.) He mentions the requirement of elastic-search, as well as an alternative to handle RSS subscriptions.

version: '3'
services:
  postgres:
    image: "ankane/pgvector:v0.5.1"
    container_name: "omnivore-postgres"
    environment:
      - POSTGRES_USER=postgres #nochange?
      - POSTGRES_PASSWORD=postgres #nochange?
      - POSTGRES_DB=omnivore
      - PG_POOL_MAX=20
    healthcheck:
      test: "exit 0"
      interval: 2s
      timeout: 12s
      retries: 3
    expose:
      - 5432 #change-likely-not-required from 5432
    networks: #custom docker network
      - omnivore #custom docker network

  migrate:
    build:
      context: .
      dockerfile: ./packages/db/Dockerfile
    container_name: "omnivore-migrate"
    command: '/bin/sh ./packages/db/setup.sh' # Also create a demo user with email: demo@omnivore.app, password: demo_password
    environment:
      - PGPASSWORD=postgres #nochange?
      - POSTGRES_USER=postgres #nochange?
      - PG_HOST=postgres
      - PG_PASSWORD=app_pass #changeme-postgres-app-pass
      - PG_DB=omnivore
    depends_on:
      postgres:
        condition: service_healthy
    networks: #custom docker network
      - omnivore #custom docker network

  api:
    build:
      context: .
      dockerfile: ./packages/api/Dockerfile
    container_name: "omnivore-api"
    ports:
      - "4000:8080"
    healthcheck:
      test: ["CMD-SHELL", "nc -z 0.0.0.0 8080 || exit 1"]
      interval: 15s
      timeout: 90s
    environment:
      - API_ENV=local
      - PG_HOST=postgres
      - PG_USER=app_user #changeme-postgres-app-user
      - PG_PASSWORD=app_pass #changeme-postgres-app-pass
      - PG_DB=omnivore
      - PG_PORT=5432 #change-likely-not-required from 5432
      - PG_POOL_MAX=20
      - JAEGER_HOST=jaeger
      - IMAGE_PROXY_SECRET=aaaaaaaaaaaaaaaaaaa #changeme
      - JWT_SECRET=bbbbbbbbbbbbbbbbbbbbbbbbbbb #changemejwt
      - SSO_JWT_SECRET=ccccccccccccccccccccccc #changeme
      - CLIENT_URL=https://omnivore.some.domain #change-port-3000-if-required
      - GATEWAY_URL=https://api.omnivore.some.domain/api #not sure if need to change? originally http://localhost:8080/api
      - CONTENT_FETCH_URL=http://content-fetch:8080/?token=dddddddddddddddddddddddd #changemetoken
    depends_on:
      migrate:
        condition: service_completed_successfully
    networks: #custom docker network
      - omnivore #custom docker network

  web:
    build:
      context: .
      dockerfile: ./packages/web/Dockerfile
      args:
        - APP_ENV=prod
        - BASE_URL=https://omnivore.some.domain #changeme-domain-url e.g. https://omnivore.domain.com
        - SERVER_BASE_URL=https://api.omnivore.some.domain #changeme-api-server-domain-url e.g. https://api.omnivore.domain.com
        - HIGHLIGHTS_BASE_URL=https://omnivore.some.domain #changeme-domain-url e.g. https://omnivore.domain.com
    container_name: "omnivore-web"
    ports:
      - "3001:8080" #change-port-3000-if-required
    environment:
      - NEXT_PUBLIC_APP_ENV=prod
      - NEXT_PUBLIC_BASE_URL=https://omnivore.some.domain #changeme-domain-url e.g. https://omnivore.domain.com
      - NEXT_PUBLIC_SERVER_BASE_URL=https://api.omnivore.some.domain #changeme-api-server-domain-url e.g. https://api.omnivore.domain.com
      - NEXT_PUBLIC_HIGHLIGHTS_BASE_URL=https://omnivore.some.domain #changeme-domain-url e.g. https://omnivore.domain.com
    depends_on:
      api:
        condition: service_healthy
    networks: #custom docker network
      - omnivore #custom docker network

  content-fetch:
    build:
      context: .
      dockerfile: ./packages/content-fetch/Dockerfile
    container_name: "omnivore-content-fetch"
    ports:
      - "9090:8080"
    environment:
      - JWT_SECRET=bbbbbbbbbbbbbbbbbbbbbbbbbbb #changemejwt
      - VERIFICATION_TOKEN=dddddddddddddddddddddddd #changemetoken
      - REST_BACKEND_ENDPOINT=https://api.omnivore.some.domain/api #not sure if need to change? originally http://api:8080/api
#      - REDISURL=redis://omnivore-redis:6379 #redis
    depends_on:
      api:
        condition: service_healthy
    networks: #custom docker network
      - omnivore #custom docker network

#################   redis   ###############
#  omnivore-redis:
#    image: redis:latest
#    container_name: omnivore-redis
#    environment:
#      - TZ=Asia/Kuala_Lumpur
#    restart: always
#    networks:
#      - omnivore
#################   redis   ###############

networks: #custom docker network
  omnivore: #custom docker network

se-jaeger commented 9 months ago

Hi @mbhkoay,

thanks for this write up! Here are some pointers that may help.

> 3.c. I think I messed some things up while updating the postgres password, so I ended up not changing them.

In the https://github.com/omnivore-app/omnivore/blob/main/self-hosting/helm/values.yaml file, I added some hard-coded credentials (PG_DB, PG_USER) that are also hard-coded in the code base, which is why you can't change them easily.

> 6.b. by default the login is demo@omnivore.app, password: demo_password.

I added an environment variable that turns off the creation of this default user: NO_DEMO_USER=1. However, if you register a new user, make sure to follow the steps documented here to verify it. (Normally, you would get an email asking you to click a link.)

> 6.d. content-fetch is not working, it throws an error about redisURL not supplied. Attempted to throw in a redis container to see if it works. Apparently not. You can see the additional section in the docker compose.

I also stumbled across this. If you roll back to this commit (e44616b01), which is the latest one before redis became required for content-fetch, it should be possible to run it.

I plan to dive into these changes and propose a solution for self-hosted instances.

Hope it helps. Cheers.
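
For the redis error discussed above, here is a minimal sketch of how a redis service might be wired into the compose file. The service name `redis`, the image tag, and the variable name `REDIS_URL` are all assumptions: the thread only shows an error mentioning redisURL and an unsuccessful attempt with `REDISURL`, so check the content-fetch source for the name it actually reads. This fragment is meant to be merged by hand into the compose file posted earlier (which declares the `omnivore` network), and it is not confirmed to be sufficient on its own.

```yaml
services:
  redis:
    image: "redis:7"                       # assumed tag, not taken from the thread
    container_name: "omnivore-redis"
    healthcheck:
      test: ["CMD", "redis-cli", "ping"]   # exits 0 once the server answers
      interval: 5s
      timeout: 3s
      retries: 5
    expose:
      - 6379
    networks:
      - omnivore

  content-fetch:
    # ...existing build, ports, and other environment entries stay unchanged...
    environment:
      - REDIS_URL=redis://redis:6379       # assumed variable name, see note above
    depends_on:
      redis:
        condition: service_healthy         # start content-fetch only after redis is healthy
    networks:
      - omnivore
```

Using `condition: service_healthy` mirrors how the posted compose file already gates `migrate` on `postgres`, so content-fetch would not start before redis accepts connections.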

mariusrugan commented 8 months ago

Hi @mbhkoay and @se-jaeger, thanks both for your contributions,

It is still unclear to me if elastic is needed.
Looking at the docker-compose file in the project's root, I just see pgvector (postgres + pgvector).

thanks in advance!

jacksonh commented 8 months ago

Hey @mariusrugan, we actually just dropped the elastic requirement recently. We're also in the middle of pulling out most of the GCP requirements and getting things down to two images: backend, which will both process async jobs and run the API, and content-fetch, which is the standalone service for fetching page content.

It's in a bit of flux right now though as we wrap up this work.

jacksonh commented 8 months ago

Docker images available here: https://github.com/orgs/omnivore-app/packages?repo_name=omnivore

jacksonh commented 8 months ago

> > > For the record, @lawrencegripper did contribute on k8s setup in #2966. Unfortunately, it is in WIP and he is unavailable.

> > Thanks for this pointer! I'm currently thinking about/planning to work on a Helm chart. Will probably start over the Christmas days. Feel free to ping me or connect if you would like to support.

> Hi, sorry for the late response; there were some hurdles I had to overcome.

> As @grapemix suggested, I used the bjw-s helm chart to set up a functioning instance (Web, API, content-fetch). You can find the current (WIP) version here: https://github.com/se-jaeger/omnivore

> There are some things I want to improve. However, in the meantime, I'd love to get feedback from you:

>   • documentation
>   • health checks
>   • RSS

> One more remark: I built and pushed the images to my Docker Hub account: https://hub.docker.com/u/sejaeger Working on #3177 would definitely improve the chart.

I think a lot of this is improved with our move to bullmq jobs instead of cloud functions. The backend service has health checks for both the api server and the queue-processor server that also handle graceful shutdown via SIGTERM. We've started running both in k8s for our services as well.

stanthewizzard commented 8 months ago

When using Docker, is the Chrome extension able to connect to it? Thanks

jacksonh commented 8 months ago

You'd have to build the extension yourself. For security, the extension includes a content security policy that specifies the domains it can connect to.

stanthewizzard commented 8 months ago

would be awesome to have settings for that to bypass default :) thanks

jacksonh commented 8 months ago

> would be awesome to have settings for that to bypass default :) thanks

I think from a security perspective it's very unlikely we'd do this.

stanthewizzard commented 8 months ago

To be quite honest, it's strange. I use Wallabag and there are settings for self-hosted instances. Same for Obsidian, and on and on :)

jacksonh commented 8 months ago

OK thanks for your feedback

sibbl commented 8 months ago

> I think from a security perspective its very unlikely we'd do this.

Could you please provide some insight into why this is not an option for this project?

Even the Bitwarden extension allows configuring a custom backend endpoint, and I would expect that password managers must adhere to higher security standards than this extension here.

It would be helpful for us to understand the reasoning behind this decision. I've been following this project very closely for a while and plan to use and support it, as soon as self-hosting is an easy and well supported option.

mariusrugan commented 8 months ago

I second @sibbl. Even more: the official Bitwarden extension works with the Vaultwarden backend (open source).

Also I've loosely explored Shiori, https://github.com/go-shiori/shiori-web-ext, which also has self-hosting capabilities ootb.

EDIT linkding too :) https://github.com/sissbruecker/linkding

I'm here for the iOS app/integration, which I've researched and which looks great. I can load an unpacked & hacked extension in Chrome, but it's not for the average person.

stanthewizzard commented 8 months ago

> I second @sibbl. Even more: the official Bitwarden extension works with the Vaultwarden backend (open source).

> Also I've loosely explored Shiori, https://github.com/go-shiori/shiori-web-ext, which also has self-hosting capabilities ootb.

> EDIT linkding too :) https://github.com/sissbruecker/linkding

> I'm here for the iOS app/integration, which I've researched and which looks great. I can load an unpacked & hacked extension in Chrome, but it's not for the average person.

And I'm using it too :))

jtsang4 commented 8 months ago

This project is under AGPL-3.0 license, is it acceptable to modify source code then self-host it without publishing modified code?

yes

jacksonh commented 8 months ago

Sure, I'll create a project for this on OpenCollective. If funded, we will be happy to create a separate version of the extension without a content security policy.

jacksonh commented 8 months ago

> , but it's not for the average person.

I suspect the average person should not be self hosting.

jacksonh commented 8 months ago

OK, here you go. If we can fund this project we will create a separate version of the extension for people with self hosted backends: https://opencollective.com/omnivore/projects/extension-updates-for-self-hos

mariusrugan commented 8 months ago

thanks for availability @jacksonh

I don't understand what the problem with the CSP would be. Can't it be configurable? It's a header.

jacksonh commented 8 months ago

> I don't understand what the problem with the CSP would be. Can't it be configurable? It's a header.

The CSP is part of the extension manifest. If you look at Bitwarden's CSP, they just allow pretty much everything. I'm happy to create a separate version of the extension for self-hosters, but removing the CSP from the main extension seems like it's sacrificing security for the majority (most users do not self-host) for the convenience of the minority.

r0bbie commented 8 months ago

Just to note on the discussion regarding the browser extension, the https://github.com/herrherrmann/omnivore-list-popup extension is really excellent, in my opinion the best Omnivore extension currently available in terms of functionality and UX, and that extension does allow an Omnivore instance to be set for compatibility with self-hosted installs (https://github.com/herrherrmann/omnivore-list-popup/issues/23).

Just as another option in case helpful for anyone while discussions regarding the official extension and self-hosting compatibility are ongoing :)

mariusrugan commented 8 months ago

thanks a lot @r0bbie ! much appreciated !

solidstudio commented 8 months ago

> Docker images available here: https://github.com/orgs/omnivore-app/packages?repo_name=omnivore

Anyone got a docker-compose file setup using the new images?

grapemix commented 8 months ago

Thanks to mbhkoay's work, I gave deploying Omnivore in my k8s + fluxcd cluster a try.

Here is some feedback:

  1. web docker img is missing
  2. the API pod has to be run as root. That's a security concern.
  3. We have to append the VERIFICATION_TOKEN to CONTENT_FETCH_URL. That's neither k8s nor flux friendly: string concatenation is very hard on the k8s and flux side and much easier to do in the server itself.
  4. The liveness and readiness probes are not working. I used the common /v1/health path. I have disabled them for now.
  5. Haven't tried the SSO_JWT_SECRET since web image is missing
  6. Where do we use IMAGE_PROXY_SECRET?

I cannot fully test my deployment since the web img is missing. But I can see the api page.

Finally, thanks all for the hardwork. It is a good start.
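
On the token concatenation point above, one possible workaround on the Kubernetes side is dependent environment variables: a `$(VAR)` reference in an `env` value is expanded from variables defined earlier in the same list. The fragment below is only a sketch. The Secret name `omnivore-secrets`, its key `verification-token`, and the `content-fetch` service hostname are hypothetical, and the token is inserted as-is, so it must not contain characters that need URL encoding.

```yaml
# Fragment of the API container spec (names are illustrative, not from the thread).
env:
  # Load the token once from a Secret so it has a single source of truth.
  - name: VERIFICATION_TOKEN
    valueFrom:
      secretKeyRef:
        name: omnivore-secrets      # hypothetical Secret name
        key: verification-token     # hypothetical key
  # $(VERIFICATION_TOKEN) is expanded by Kubernetes because the variable
  # is defined earlier in this list; ordering matters.
  - name: CONTENT_FETCH_URL
    value: "http://content-fetch:8080/?token=$(VERIFICATION_TOKEN)"
```

This keeps the URL itself out of the Secret, at the cost of relying on the ordering of the `env` entries.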

jacksonh commented 8 months ago

Thanks, this is what i use for our probes:

        livenessProbe:
          failureThreshold: 3
          httpGet:
            path: /_ah/health
            port: 8080
            scheme: HTTP

Regarding the web image: currently you need to build it yourself. Because of the way Next.js works, you have to "bake" environment variables such as the API endpoint into the JavaScript bundle. There are some workarounds that replace them in the Docker image, which I'd like to try but haven't had time for yet.
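
For the probe quoted above, a sketch of how the same endpoint might be used for both liveness and readiness in a container spec. Reusing `/_ah/health` for readiness and all timing values are assumptions, not settings confirmed in this thread; port 8080 matches the container port used in the compose file posted earlier.

```yaml
# Fragment of a container spec; timing values are illustrative assumptions.
livenessProbe:
  httpGet:
    path: /_ah/health
    port: 8080
    scheme: HTTP
  periodSeconds: 15
  failureThreshold: 3        # restart after three consecutive failures
readinessProbe:
  httpGet:
    path: /_ah/health        # assumed to be suitable for readiness as well
    port: 8080
    scheme: HTTP
  initialDelaySeconds: 5     # give the server a moment to start listening
  periodSeconds: 10
```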

jacksonh commented 8 months ago

@grapemix can i ask how you are managing your secrets? Are you able to just store the entire CONTENT_FETCH_URL as a secret? Also i am wondering if we can remove that token in self hosted scenarios anyways.

grapemix commented 8 months ago

  1. @jacksonh, I am using external secrets. I have to store the entire CONTENT_FETCH_URL as a secret entry, but this solution creates two sources of truth for the VERIFICATION_TOKEN, which increases the likelihood of operational misconfiguration.
  2. I've tested the livenessProbe URL in the api and content-fetch images. Both work, thanks.
  3. Thanks for explaining the web img's status. Looking forward to the new version. For now, I've built a web img. It looks like it works, but I found new problems.
  4. It turns out we really need to run the DB migration img, or there will be no tables. The DB migration img is missing, so I built my own and tried to run it.
  5. But the DB migration img needs additional permissions, which is less k8s friendly and increases security risk. Lots of k8s clusters use cloudnative-pg, Crunchy Data Postgres Operator, Zalando Postgres Operator, or a PG variant from the cloud provider. Those DB providers already have easy ways to set up a dedicated DB account, which for security reasons won't have create-DB or grant permissions. Anyway, I spent some time granting the DB account the additional create-DB and grant-role permissions, because I saw the code has "app_user" hardcoded for some unknown reason.
  6. Finally, the DB migration img seems to have enough permissions, but the log shows something went wrong:
ERROR:  database "omnivore" already exists
create omnivore database
CREATE ROLE
created app_user
yarn workspace v1.22.19
yarn run v1.22.19
$ ts-node ./migrate.ts
> Starting migration manager
> Migrating to latest.
> Postgres migration failed: extension "uuid-ossp" already exists
> No Postgres migrations applied.
error Command failed with exit code 1.
info Visit https://yarnpkg.com/en/docs/cli/run for documentation about this command.
error Command failed.
Exit code: 1
Command: /usr/local/bin/node
Arguments: /opt/yarn-v1.22.19/lib/cli.js migrate
Directory: /app/packages/db
Output:
info Visit https://yarnpkg.com/en/docs/cli/workspace for documentation about this command.
ERROR:  role "omnivore_user" does not exist
granted omnivore_user to app_user
ERROR:  relation "omnivore.user" does not exist
LINE 1: INSERT INTO omnivore.user (id, source, email, source_user_id...
                    ^
created demo user with email: demo@omnivore.app, password: demo_password
stream closed

philipp-koch commented 6 months ago

I think it would be fantastic for users to be able to run their own Omnivore instance on their own hardware, thus controlling their own data! I run a small number of services in docker containers on my Synology NAS, and I'm sure that this is quite a common way to self-host stuff for people like me who don't know enough to run (or simply don't want to run) a fully fledged web server.

If there were a way (and how-to steps for getting it set up) to run Omnivore (front and backend) in docker containers, would there be a way to export/migrate my existing data (which right now is on https://Omnivore.app) to the new instance?

If I could run the backend myself, I'd gladly support the creation of the "self-hosting-capable extension variant" financially.

domonnss commented 6 months ago

I have tried self-hosting. But, to be honest, it's not easy to deploy on my VM. I'll wait and try the updated deployment method. 😥😥😥

Mikilio commented 5 months ago

Regarding Cloud Storage: What is the currently envisioned solution? Has WebDAV been considered?

maa-x commented 4 months ago

@Mikilio I've been working on making GCS optional, you can track progress here.

So far, I've got puppeteer (content-fetch), YouTube AI transcripts and PDF working (which covers basically all my own use cases).

If anyone is keen to help please let me know. This is rather outside my skillset so if you see anything looking a bit off do tell me.

Oh and design wise, I was thinking of switching my custom storage abstraction for this one: https://flystorage.dev/ which would allow using a variety of storage backends.