Mirodin closed this issue 3 years ago.
To reproduce:
Hello!
That's a feature that has been in paperless for a long time, and I have not touched it at all (neither the code nor the documentation).
I've checked the logic of the code, and the documentation does in fact not describe the actual behavior. The consumer first checks for the format `created - title - tags`, which matched the second filename. However, this rule does not accept tags with spaces. If that rule does not match, it parses the filename as `created - correspondent - title`. This is what happens for the first file.
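A minimal sketch of the two-step matching described above, in Python. The regular expressions and the `guess` helper are hypothetical approximations, not the actual patterns in paperless-ng; in particular, the date format (`YYYYMMDD`, optional `HHMMSS`, trailing `Z`) and the tag character class are assumptions.

```python
import re
from typing import Dict, Optional, Tuple

# Assumption: created stamp is YYYYMMDD, optionally followed by HHMMSS, then a literal "Z".
DATE = r"(?P<created>\d{8}(?:\d{6})?Z)"

# Hypothetical rules, tried in order; the first one that matches wins.
PATTERNS = [
    # 1) created - title - tags: tags are comma-separated and may NOT contain spaces
    ("created-title-tags", re.compile(
        DATE + r" - (?P<title>.*) - (?P<tags>[a-z0-9\-,]*)$", re.IGNORECASE)),
    # 2) fallback: created - correspondent - title
    ("created-correspondent-title", re.compile(
        DATE + r" - (?P<correspondent>.*) - (?P<title>.*)$", re.IGNORECASE)),
]


def guess(stem: str) -> Optional[Tuple[str, Dict[str, str]]]:
    """Return (rule name, captured fields) for the first rule matching the filename stem."""
    for name, pattern in PATTERNS:
        match = pattern.match(stem)
        if match:
            return name, match.groupdict()
    return None  # no rule matched: nothing is guessed
```

With these patterns, `20201201Z - Invoice - tax,bills` takes the first rule (title `Invoice`, tags `tax,bills`), while `20201201Z - Invoice - tax bills` fails it because of the space and falls through to the second rule, so `Invoice` ends up as the correspondent.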
I guess it's useful for initially importing lots of documents, but apart from that, I don't think many people use this feature. I'm considering removing most of the logic; see https://github.com/jonaswinkler/paperless-ng/discussions/83 as well. This is particularly annoying when someone puts ` - ` (with spaces around it) in a filename: paperless will then split up the title and use part of it for the correspondent. This has happened to me a couple of times.
I'm not entirely sure what to do with this feature.
Having read #83, I would second getting rid of correspondent/tag guessing and keeping only date parsing. However, putting a "Z" at the end is kind of weird (even though gscan2pdf does that for me). So maybe this could support some templating? My files are usually named like `YYYY-MM-DD title.pdf`.
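A minimal sketch of the date-only variant suggested here, assuming the `YYYY-MM-DD title.pdf` naming scheme; `parse_date_title` is a hypothetical helper, not an existing paperless function. Everything after the date is kept verbatim as the title, so nothing is ever guessed as a correspondent or tag.

```python
import re
from datetime import date
from pathlib import Path
from typing import Optional, Tuple

# Hypothetical rule: a leading YYYY-MM-DD, whitespace, then the title.
DATE_PREFIX = re.compile(r"^(?P<y>\d{4})-(?P<m>\d{2})-(?P<d>\d{2})\s+(?P<title>.+)$")


def parse_date_title(filename: str) -> Tuple[Optional[date], str]:
    """Split 'YYYY-MM-DD title.pdf' into (created date, title); no other guessing."""
    stem = Path(filename).stem  # drop the extension
    match = DATE_PREFIX.match(stem)
    if not match:
        return None, stem  # no date prefix: the whole stem becomes the title
    try:
        created = date(int(match["y"]), int(match["m"]), int(match["d"]))
    except ValueError:
        return None, stem  # e.g. 2021-02-30 is not a real date
    return created, match["title"]  # " - " inside the title is left untouched
```

For example, `2021-03-01 Minutes - Board Meeting.pdf` yields the date 2021-03-01 and the title `Minutes - Board Meeting`, with the dash kept as part of the title.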
Alright, I'll put that on the agenda.
Edit: The Z usually denotes Zulu time (UTC); however, I'm not entirely sure why it's required here when paperless only needs to parse dates, not times.
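For reference, a minimal sketch of reading such a stamp with Python's standard library, assuming the `YYYYMMDDZ` or `YYYYMMDDHHMMSSZ` forms; `parse_created` is a hypothetical helper. The `Z` is matched as a literal character and only serves to attach the UTC time zone to the result.

```python
from datetime import datetime, timezone


def parse_created(value: str) -> datetime:
    """Parse '20201201Z' or '20201201134500Z' as a UTC datetime (hypothetical helper)."""
    # 9 characters = date only plus "Z"; otherwise expect date, time and "Z".
    fmt = "%Y%m%dZ" if len(value) == 9 else "%Y%m%d%H%M%SZ"
    # strptime raises ValueError if the value does not fit the chosen format.
    return datetime.strptime(value, fmt).replace(tzinfo=timezone.utc)
```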
If you've got something to add, please do so in the related issue.
Following the recommendations from the documentation (https://paperless-ng.readthedocs.io/en/latest/advanced_usage.html#guesswork), I get mixed results with my documents.
I stumbled upon this while migrating my ~1800 docs; it affects approx. 5 to 10% of my files. This does not look like consistent behaviour to me. Maybe I am missing something, but that is how I understand the documentation.
I am running the latest Docker Compose file pulled from GitHub.