the-paperless-project / paperless

Scan, index, and archive all of your paper documents
GNU General Public License v3.0

I implemented a document uploader... and... #196

Open philippeowagner opened 7 years ago

philippeowagner commented 7 years ago

I implemented a document uploader in my branch of paperless using dropzone.js. The documents are uploaded directly to the inbox and the consumer will process them... It works great ;-)

Maybe we should define a way to add plugins and extensions to paperless. Adding everything to the "core" by sending pull requests is not the best way to go from my point of view, even if it's tempting... What do you think, @danielquinn?

I'd like to invest in the following areas during the next months:

danielquinn commented 7 years ago

I love the idea of a pluggable architecture, though I think it may be rather difficult to write a generic plugin system that works for a custom uploader as well as a UI and document consumer. I'll have to give it a think, but I like where you're going with this.

At least at the consumer level, it should be easy enough to have some code that looks like this:

# Hand the document to every registered consumer that can process it
for consumer in CONSUMERS:
    if consumer.can_handle(document):
        consumer.consume(document)

CONSUMERS could be compiled at server start time from a list of Django-apps listening for document.signals.i_am_a_consumer or something like that, so all you'd need to do would be something like:

pip install paperless-markdown-consumer

...and you'd be ready to go. Oh, and we'd have to work in some way to allow the user to add the app to INSTALLED_APPS but that can be done with an environment variable.
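To make the consumer idea above a little more concrete, here is a minimal, framework-free sketch of that dispatch loop. The class names, the file-extension check, and the `dispatch` helper are all assumptions made for illustration; only `CONSUMERS`, `can_handle` and `consume` come from the pseudocode in this comment, and the list is hard-coded here rather than compiled from installed Django apps.

```python
from pathlib import Path
from typing import List


class MarkdownConsumer:
    """Hypothetical plugin that only claims Markdown files."""

    extensions = {".md", ".markdown"}

    def can_handle(self, document: Path) -> bool:
        return document.suffix.lower() in self.extensions

    def consume(self, document: Path) -> str:
        # A real consumer would parse and store the document here.
        return f"markdown:{document.name}"


class ImageConsumer:
    """Hypothetical plugin that only claims rasterised images."""

    extensions = {".png", ".jpg", ".jpeg", ".tiff"}

    def can_handle(self, document: Path) -> bool:
        return document.suffix.lower() in self.extensions

    def consume(self, document: Path) -> str:
        # A real consumer would run OCR here.
        return f"image:{document.name}"


# In the proposal this list would be built once at server start;
# it is hard-coded here to keep the sketch self-contained.
CONSUMERS = [MarkdownConsumer(), ImageConsumer()]


def dispatch(document: Path) -> List[str]:
    """Run every consumer that claims the document and collect the results."""
    return [c.consume(document) for c in CONSUMERS if c.can_handle(document)]
```

With this shape, installing a new consumer package would only need to append one more object to `CONSUMERS`; a file that nobody claims simply yields an empty result.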

For the UI, I don't think much is required since all you'd have to do is write a Django app and then inject that app into INSTALLED_APPS ahead of the documents app. I should point out that I too am no fan of using the Django admin as the primary interface (after all, it was never intended to be user-facing) but frankly, I lack the skills to do a UI properly.
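As a rough sketch of the environment-variable approach to `INSTALLED_APPS` mentioned above: the helper below splices extra app names in ahead of the documents app, which matters because Django's app-directories template lookup follows `INSTALLED_APPS` order. The variable name `PAPERLESS_EXTRA_APPS`, the helper `build_installed_apps`, and the contents of `BASE_APPS` are assumptions for illustration, not existing Paperless settings.

```python
import os

# Stand-in for the project's real app list; "documents" is the app
# that a UI plugin would need to be placed ahead of.
BASE_APPS = ["django.contrib.admin", "django.contrib.auth", "documents"]


def build_installed_apps(base, env_value, before="documents"):
    """Insert comma-separated app names from env_value ahead of `before`."""
    extra = [name.strip() for name in env_value.split(",") if name.strip()]
    # Skip anything that is already part of the base list.
    extra = [name for name in extra if name not in base]
    index = base.index(before) if before in base else len(base)
    return base[:index] + extra + base[index:]


# PAPERLESS_EXTRA_APPS is a hypothetical variable name.
INSTALLED_APPS = build_installed_apps(
    BASE_APPS, os.environ.get("PAPERLESS_EXTRA_APPS", "")
)
```

For example, setting `PAPERLESS_EXTRA_APPS=my_ui_app` would place `my_ui_app` directly before `documents`, while leaving the variable unset keeps the base list unchanged.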

Theoretically, the existing DRF-based API should be all one needs to get a proper UI going, so at that point you'd only need to steal this line and this class from my now-abandoned UI branch to get everything wired up.

Regarding the uploader, I'm not sure why you didn't just use the existing upload form we have. Here's the view and the form. The existing system isn't perfect, but it does account for correspondent, tags, and has some security involved to prevent people from just posting stuff without authorisation. I don't much like the shared-secret trick I'm using there, but at the very least I'd like that to be behind a LoginRequiredMixin.

In terms of the UI itself though, well I'm not making use of this upload endpoint anywhere as I wrote that for the people asking for an API. If you've written a user-friendly interface that can sit on top of it, all the better. It could easily be included in any UI app as I mentioned above.

Any of this sounding manageable? If you like this, I can probably write the beginnings of the above relatively simply.

danielquinn commented 7 years ago

Update on this: I've just issued a PR (#197) with a substantial change that refactors the consumer into a pluggable architecture. As a result, there's a new app in there called paperless_tesseract that announces its ability to handle rasterised image files and thereby supports two public functions: .get_thumbnail() and .get_text().
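For readers who have not opened the PR, the two-method contract might look roughly like the sketch below. Only the method names `get_thumbnail` and `get_text` come from this comment; the base class `DocumentParser`, its constructor, and the toy `PlainTextParser` are assumptions for illustration and are not taken from the PR itself.

```python
from abc import ABC, abstractmethod


class DocumentParser(ABC):
    """Hypothetical base class; only the two method names come from the thread."""

    def __init__(self, path: str):
        self.path = path

    @abstractmethod
    def get_thumbnail(self) -> str:
        """Return the path of a thumbnail image for the document."""

    @abstractmethod
    def get_text(self) -> str:
        """Return the text extracted from the document."""


class PlainTextParser(DocumentParser):
    """Toy parser for plain-text files, standing in for a real OCR parser."""

    def get_thumbnail(self) -> str:
        # Placeholder path only; no image is actually rendered here.
        return self.path + ".thumb.png"

    def get_text(self) -> str:
        with open(self.path, encoding="utf-8") as handle:
            return handle.read()
```

Declaring both methods abstract means a parser that forgets one of them fails at instantiation time rather than halfway through consuming a document.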

For the moment, there's basically no documentation because I wanted to get some community feedback before I merge it into master and document it as "the way we do things", but barring any substantial changes required, I'll likely merge this in about a week.

Have a look at the pull request yourself and let me know what you think. I'm happy to amend things if you have some ideas.

nebulade commented 6 years ago

Having a basic uploader view or even a drop target zone to just drop documents into the view would be a great thing to have. Unfortunately, I have never written any Django code, so I can't add this myself, and for some reason I have a hard time finding the relevant HTML portions to add a button that shows the form mentioned in https://paperless.readthedocs.io/en/latest/consumption.html#http-post

danielquinn commented 6 years ago

The documentation references a hypothetical form one might use to send an HTTP POST. Currently, the only methods Paperless uses to import documents are the API (used most prominently by paperless-desktop) and the consumption directory.

As the basic Paperless UI is dependent on the Django admin, writing a drag-n-drop form for document creation would involve creating an admin panel via a child of Django's ModelAdmin class. It's not too difficult, but it does require a modicum of understanding of how Django works.

Given the popularity of paperless-desktop, this isn't something I'm likely to prioritise, but should you, or anyone else with a need for this want to code a page for this, I'd merge it.

philippeowagner commented 6 years ago

❤️ paperless-desktop

nebulade commented 6 years ago

I gave paperless-desktop a quick try and it is indeed very nice, however it seems to be quite a burden for non-Mac users to get installed, especially for less technical people. Given that it is simply an Electron app, maybe it is possible to somehow put paperless-desktop as a webapp alongside paperless. That would make it much more comprehensive and a lot more approachable in my opinion.

danielquinn commented 6 years ago

That's a neat idea @nebulade, but I have no idea how to do that. If you do though, I'd be happy to accept a PR. I figure this could be done with a single setup script or something.

nebulade commented 6 years ago

I don't have all too much knowledge about electron, nor react, however I gave it a try and some basic things are working. Login/correspondents/tags/document listing, there is still some bug with the document details view though.

In any case, this does require changes in paperless-desktop to work around the Electron dependencies. I have pushed a branch there which contains an injected Electron shim. Besides native window actions, the app itself mostly just uses a simple store that saves to the filesystem instead of, say, the browser's local storage.

The branch is at https://github.com/nebulade/paperless-desktop/tree/no_electron. Also, to test this you probably need CORS support in paperless, which is simple to add with https://github.com/ottoyiu/django-cors-headers
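For anyone wanting to try the CORS part, a settings fragment along these lines is roughly what django-cors-headers asks for. This is a sketch, not the actual Paperless configuration: the app and middleware lists are shortened stand-ins, and the origin URL is a placeholder. Releases of django-cors-headers before 3.5 name the last setting `CORS_ORIGIN_WHITELIST`, and older Django versions use `MIDDLEWARE_CLASSES` instead of `MIDDLEWARE`.

```python
# settings.py fragment (sketch); assumes `pip install django-cors-headers`.
INSTALLED_APPS = [
    "django.contrib.admin",
    "django.contrib.auth",
    "corsheaders",  # registers the CORS app
    "documents",  # stand-in for the existing Paperless apps
]

MIDDLEWARE = [
    # CorsMiddleware should sit as high as possible, before CommonMiddleware.
    "corsheaders.middleware.CorsMiddleware",
    "django.middleware.common.CommonMiddleware",
]

# Origins allowed to call the API from a browser; the URL is a placeholder
# for wherever the web build of the client is served from.
CORS_ALLOWED_ORIGINS = ["http://localhost:3000"]
```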

Not sure though how to proceed, I guess it would be useful to get @thomasbrueggemann involved in any case, to also check if it even makes any sense to further spend time on it. I don't know what his plans are.

I do think though this would enhance paperless very much to get such a sophisticated UI built-in.

thomasbrueggemann commented 6 years ago

Thanks for giving paperless-desktop a try, @nebulade . I currently provide macOS-only binaries via electron, that is true. Mainly, because I only use macOS. We have an open issue to bring the app to Windows (and possibly Linux) https://github.com/thomasbrueggemann/paperless-desktop/issues/22 but nobody has stepped forward to work on it yet ;-)

It should be fairly easy to create Windows binaries or even an MSI installer, to lower the burden for Windows users. The only thing that's stopping me (or us) from doing so is the fact that I used a macOS stylesheet to lay out the app and have not yet found a reasonably good-looking Windows stylesheet to make the app look Windows-native as well. Perhaps the better approach would be to style the app in a more platform-agnostic way, like e.g. Spotify or Atom.io are doing. But I kind of wanted the macOS look for me. It was always only meant to be a macOS app from my point of view.

Back to the topic of providing paperless-desktop as a web-app alongside the paperless server: I deliberately programmed an electron app because I wanted the look and feel of a native app and potentially use native features, like more seamless file-uploads (still WIP). Therefore I would personally not engage in driving the development of a webapp-port forward. And I would urge you to not "fill" or "shim" the electron functions for the web, but rather strip them out completely or rewrite them for the web. But then again, I would not want a website that looks like a macOS app. That is somewhat weird.

If you'd ask me how I perceive the "paperless-ecosystem", I would see this paperless-repository here as the REST API server and paperless-desktop as one client idea. And you might have a paperless-webui project as another client orbiting around "paperless". If that makes sense ;-)

nebulade commented 6 years ago

Thanks for sharing your view. I thought this might be the case, given how the UI looks. The initial idea for me was more along the lines of seeing if UI code duplication could be avoided, if a user-targeted web UI is even wanted. Currently I totally agree with your view of paperless being a great API server, which happens to come with an admin interface.

I do not have the time to write a web UI targeted towards users, like paperless-desktop, from scratch. I investigated this because I am packaging up paperless for Cloudron and was trying to find a good solution for the missing UI components.

Either way, I did learn a few things about Electron while looking at how to make it somewhat work as a prototype, so we could discuss further whether this is even possible.