Open z5tron opened 6 years ago
I'm not sure what is meant by different zotero_storage (different profiles? different libraries? different folders?), but the logic right now is that a file is counted as a duplicate if there are two or more attachments of the same type (doc, docx, pdf, whatnot) under the same reference item.
I meant the physical folder named "zotero/storage/". But that answers my question. There is still a problem: I have a book item with a "Google Books Link" (URL link), an epub, and a mobi, three attachments under this book in total, each with a different file type. It is still marked as "#duplicate_attachments".
I'd have to look at a copy of your database to tell why that happens; I don't have an immediate explanation.
So, if one item has 2 or more attachments with the same file type, they will be treated as duplicates?
Yes.
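For readers following along, the rule confirmed here can be sketched roughly as below. This is only an illustration, not the plugin's actual code: the function name `has_same_type_attachments` is hypothetical, and comparing file extensions is an assumption (the real check may look at content type or something else). Attachments without an extension, such as a URL link, are ignored in this sketch.

```python
from collections import Counter
from pathlib import PurePath


def has_same_type_attachments(filenames):
    """Return True if two or more attachments share a file extension.

    Hypothetical helper: the extension stands in for the "type" of an
    attachment, which is an assumption made for illustration.
    """
    # Normalise case so "a.PDF" and "b.pdf" count as the same type.
    extensions = [PurePath(name).suffix.lower() for name in filenames]
    # Skip entries with no extension (for example, a plain URL link).
    counts = Counter(ext for ext in extensions if ext)
    return any(n >= 2 for n in counts.values())
```

Under this sketch, an item with one epub, one mobi, and a URL link would not be flagged, which matches the expectation described earlier in the thread.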
Just a comment to think about: When a reference has supplementary material, I often end up with multiple PDF attachments for one reference ... would it be possible to handle this case with file size rather than file type? (This is not frequent enough to be a big deal, for me at least ... but I'm just throwing it out there in case it matters for others).
That wouldn't really help for the cases I made this for. I often had merged duplicates where I acquired substantially similar, but not bit-for-bit equal, versions of the same article.
To me, "#duplicate_attachments" suggests that the flagged items would contain the same attachment multiple times (in particular, my expectation given this wording was that the files would be identical, or at least have identical hashes under something like md5 or stronger). Would it be feasible to rename this tag to something more explanatory / less prone to misunderstanding, like "#multiple_attachments_of_same_type"?
Yes, this tag says "duplicate_attachments", but this is false: they are just attachments of the same type. "duplicate_attachments" would mean they are byte-for-byte identical (which many are, from merging items).
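To make the stricter, byte-for-byte meaning concrete, here is a minimal sketch of hash-based detection. It is not something the plugin is known to do; the names `file_digest` and `identical_groups` are hypothetical, and SHA-256 is chosen here simply as one hash "stronger than md5".

```python
import hashlib
from collections import defaultdict


def file_digest(path, chunk_size=65536):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        # Read fixed-size chunks so large PDFs are not loaded whole.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def identical_groups(paths):
    """Group file paths whose contents hash to the same digest.

    Only groups with two or more files are returned.
    """
    groups = defaultdict(list)
    for p in paths:
        groups[file_digest(p)].append(p)
    return [g for g in groups.values() if len(g) > 1]
```

A check like this would catch exact copies left over from merging items, but not the "substantially similar" versions of the same article mentioned above.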
Feel free to submit a PR. Personally I'd consider it a duplicate if the article text is substantially the same.
I have lots of items that are labelled "#duplicate_attachments", but they are under different zotero_storage folders and have different sizes, different names (the title in the middle panel), different physical file names, and different modification times.
Is this a feature or a bug?
Thanks.