saltyorg / docs

Documentation Request: Paperless NGX with document store on Google #116

Closed cstrand89 closed 1 year ago

cstrand89 commented 2 years ago

A community-supported supercharged version of paperless: scan, index and archive all your physical documents.

Example docker compose: https://github.com/paperless-ngx/paperless-ngx/blob/main/docker/compose/docker-compose.postgres-tika.yml

the-jchusid commented 2 years ago

Here are the compose file and env file I have been using for the last week while testing.

Compose File - Hastebin
ENV File - Hastebin

cstrand89 commented 2 years ago

Where do I place these files, @jchus-id? Do you have a quick guide?

JigSawFr commented 2 years ago

Working on it. :) Almost ready!

maximuskowalski commented 2 years ago

Merged in saltyorg/Sandbox#117

URLs for upstream and Docker information should be included in the documentation for each role.

kungfoome commented 2 years ago

I've been playing with this to see what the best setup would be. I have some suggestions.

  1. Remove config. It's not used by the Docker image; settings need to be passed as envs: https://paperless-ngx.readthedocs.io/en/latest/configuration.html
  2. Maybe make PAPERLESS_MEDIA_ROOT a setting. I personally don't want this in the opt folder, and I think other people would probably feel the same. Instead, it would probably live somewhere under the /mnt directory.
  3. Same for PAPERLESS_CONSUMPTION_DIR. This could probably be anywhere, but I am experimenting with having a consume directory on my Google Drive. Maybe this can just be left as-is, though.

What it comes down to: keep the database in opt, but point the media root at the folder of your choice.

cstrand89 commented 1 year ago

Has this been implemented? I am hesitant to utilise it until the option to store files in /mnt works.

JigSawFr commented 1 year ago

Will take a look ASAP.

kungfoome commented 1 year ago

You can do this today with the way Saltbox is configured; it just isn't done by default for this role. I can follow up with my findings in a bit and show my config.

JigSawFr commented 1 year ago

Of course, you can still override the envs to specify a custom path; it works out of the box with Saltbox ;)

kungfoome commented 1 year ago

This is my current config:

/srv/git/saltbox/inventories/host_vars/localhost.yml

paperless_ngx_docker_envs_custom:
  PAPERLESS_CONSUMPTION_DIR: /mnt/unionfs/Documents/consume
  PAPERLESS_MEDIA_ROOT: /mnt/protected/unionfs/Documents/paperless
  PAPERLESS_CONSUMER_POLLING: "5"
  PAPERLESS_TASK_WORKERS: "4"
  PAPERLESS_THREADS_PER_WORKER: "4"
  PAPERLESS_FILENAME_FORMAT: "{created_year}/{correspondent}/{created_year}-{created_month}-{created_day}_{title} ({document_type}) [{tag_list}]"
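
To get a feel for the folder layout that the PAPERLESS_FILENAME_FORMAT value above would produce, here is a rough preview using plain Python string formatting. This is only an illustration: Paperless-ngx does its own rendering and sanitising of these placeholders, the `preview_path` helper is hypothetical, and every sample value below is made up.

```python
# Illustration only: expand the template with Python's str.format.
# Paperless-ngx renders and sanitises these placeholders itself.
TEMPLATE = (
    "{created_year}/{correspondent}/"
    "{created_year}-{created_month}-{created_day}_{title} "
    "({document_type}) [{tag_list}]"
)


def preview_path(template: str, **fields: str) -> str:
    """Fill the placeholders in `template` with the given field values."""
    return template.format(**fields)


# Made-up document metadata, purely for the preview.
sample = preview_path(
    TEMPLATE,
    created_year="2023",
    created_month="04",
    created_day="17",
    correspondent="Acme Power",
    title="April invoice",
    document_type="Invoice",
    tag_list="utilities,paid",
)
print(sample)
# 2023/Acme Power/2023-04-17_April invoice (Invoice) [utilities,paid]
```

So each correspondent gets a folder under the year, which keeps the media root browsable from the mount.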

A couple of things to note here. I wanted an encrypted folder using rclone, so my mount point is under /mnt/protected. You could just as well put it under /mnt/unionfs/Media/Documents, for example, and not have to do much. The other thing I wanted to do is sync with smaller files, so I added another folder in cloudplow.

/opt/cloudplow/config.json

Under remotes, I added:

"remotes": {
  "protected_documents": {
    "hidden_remote": "protected:",
    "rclone_command": "move",
    "rclone_excludes": [
      "**partial~",
      "**_HIDDEN~",
      "*.db",
      "media.lock"
    ],
    "rclone_extras": {
      "--checkers": 16,
      "--drive-chunk-size": "1M",
      "--drive-stop-on-upload-limit": null,
      "--low-level-retries": 2,
      "--retries": 1,
      "--skip-links": null,
      "--stats": "60s",
      "--transfers": 8,
      "--user-agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/74.0.3729.131 Safari/537.36",
      "--verbose": 1
    },
    "rclone_sleeps": {
      " 0/s,": {
        "count": 16,
        "sleep": 25,
        "timeout": 62
      },
      "Failed to copy: googleapi: Error 403: User rate limit exceeded": {
        "count": 10,
        "sleep": 25,
        "timeout": 7200
      }
    },
    "remove_empty_dir_depth": 2,
    "sync_remote": "protected:/Documents",
    "upload_folder": "/mnt/protected/local/Documents",
    "upload_remote": "protected:/Documents"
  }
}

Under the uploader section, I added:

"uploader": {
  "protected_documents": {
    "check_interval": 1,
    "exclude_open_files": false,
    "max_size_gb": 0,
    "opened_excludes": [],
    "service_account_path": "",
    "size_excludes": []
  }
}
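
Both snippets use the same name, protected_documents, under remotes and under uploader. Assuming cloudplow pairs an uploader with a remote by that shared name (as the config above suggests), a quick sanity check for typos could look like the sketch below. The `unmatched_uploaders` helper is hypothetical and not part of cloudplow, and the JSON is a trimmed-down stand-in for /opt/cloudplow/config.json.

```python
import json


def unmatched_uploaders(config: dict) -> list[str]:
    """Return uploader names that have no entry of the same name under remotes."""
    remotes = config.get("remotes", {})
    uploaders = config.get("uploader", {})
    return sorted(name for name in uploaders if name not in remotes)


# Trimmed-down stand-in for the real config, keeping only a few keys.
example = json.loads("""
{
  "remotes": {
    "protected_documents": {
      "upload_folder": "/mnt/protected/local/Documents",
      "upload_remote": "protected:/Documents"
    }
  },
  "uploader": {
    "protected_documents": {"check_interval": 1, "max_size_gb": 0}
  }
}
""")

missing = unmatched_uploaders(example)
print(missing or "every uploader has a matching remote")
```

To run it against the real file, load it with `json.load(open("/opt/cloudplow/config.json"))` in place of the inline example.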

Again, if you put it under Media, you can use google:/Media instead of protected:/Documents. So far it works pretty well. I can add files to the consume folder, which can be local on the box or, if you use Google Drive, a folder in Google Drive. Just drop a file in there and Paperless consumes it. I've noticed it sometimes gets stuck, so I'll restart the service and then it works OK afterwards. Thumbnails are a bit slow to load when opening documents, but not too bad. Tagging is also a bit laggy, but that's not a big deal for me.

The one thing that doesn't seem to work is filesystem stats for consuming, i.e. Paperless noticing on its own that new files have landed in the consume folder. I just added PAPERLESS_CONSUMER_POLLING and it works fine.
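
For anyone wondering what polling changes: instead of waiting for the filesystem to report a change (which a remote-backed mount may never do), the consumer re-lists the folder every few seconds and looks for new names. The sketch below illustrates that idea only. It is not Paperless-ngx's implementation, `poll_once` is a hypothetical helper, and a temporary directory stands in for the consume folder.

```python
import os
import tempfile


def poll_once(directory: str, seen: set[str]) -> list[str]:
    """Return names that appeared in `directory` since the last call, and remember them."""
    current = set(os.listdir(directory))
    new_files = sorted(current - seen)
    seen |= current  # update the caller's set in place
    return new_files


with tempfile.TemporaryDirectory() as consume_dir:
    seen: set[str] = set()
    first = poll_once(consume_dir, seen)    # folder is still empty
    open(os.path.join(consume_dir, "scan-001.pdf"), "w").close()
    second = poll_once(consume_dir, seen)   # the new file shows up
    third = poll_once(consume_dir, seen)    # already seen, so not reported again
    print(first, second, third)
    # [] ['scan-001.pdf'] []
```

A real poller would sleep between calls; with PAPERLESS_CONSUMER_POLLING set to "5" as in the config above, that interval would presumably be five seconds.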

If you want to know how I have the encrypted drive set up, I can go through that as well. The short version is that I set it up manually with rclone: I copied the rclone systemd file to mount it and copied the mergerfs systemd file to create the unionfs directory.

RaneyDazed commented 1 year ago

@maximuskowalski Would you like the Google bits included in the docs?

RaneyDazed commented 1 year ago

Pull #117

maximuskowalski commented 1 year ago

In general, I am aiming to add the minimum amount of information needed to get the role installed and to supply links to the official documentation if it exists. If there is other information or a short story that I want to include, I might add a section after the more or less standard template (I think you picked up on one of my find-and-replace APPNAME mistakes :) ). Extra info and tips are great but not important. Usually when I am doing docs I have a bunch to do, so I'm just trying to get something in place.