rapila / cms-base

The rapila cms’ internals. Please file all issues here if they’re not directly related to a plugin or the sample site.
http://www.rapi.la
3 stars 1 forks source link

Document deduplication #180

Closed sabberworm closed 10 years ago

sabberworm commented 10 years ago

I think we can merge this. Uploading, downloading and deletion work as they should.

The document export script seems to have problems with duplicate documents and I’m not sure if we need to take a closer look but I don’t think there’s anything fundamentally wrong with the implementation.

sabberworm commented 10 years ago

@juergmessmer Can you try this on some of your databases and tell me what you think? We did test it with Altenburg and it did bring the number of BLOBs down from 2873 to 1712.

juergmessmer commented 10 years ago

I can't migrate the model. A exception is thrown. (I did checkout the branch, update all base recursively, ran propel generator and cleared cache)

[phingcall] Unable to find adapter for datasource [rapila]. Execution of target "migration" failed for the following reason: /Users/jmg/Sites/cms_mosaics/base/lib/vendor/propel/generator/build-propel.xml:424:26: Execution of the target buildfile failed. Aborting. [phing] /Users/jmg/Sites/cms_mosaics/base/lib/vendor/propel/generator/build-propel.xml:424:26: Execution of the target buildfile failed. Aborting.

sabberworm commented 10 years ago

That’s a bug in propel 1.7.1. You need to manually update Propel.php as follows:

https://github.com/jh-ism-online-webmaster/Propel/commit/846640a8d00a01c462c3dc78969afa2931e75b54

Also, don’t forget to generate the model before doing the migration!

On 24.05.2014, at 18:25, Jürg Messmer notifications@github.com wrote:

I can't migrate the model. A exception is thrown. (I did checkout the branch, update all base recursively, ran propel generator and cleared cache)

[phingcall] Unable to find adapter for datasource [rapila]. Execution of target "migration" failed for the following reason: /Users/jmg/Sites/cms_mosaics/base/lib/vendor/propel/generator/build-propel.xml:424:26: Execution of the target buildfile failed. Aborting. [phing] /Users/jmg/Sites/cms_mosaics/base/lib/vendor/propel/generator/build-propel.xml:424:26: Execution of the target buildfile failed. Aborting.

— Reply to this email directly or view it on GitHub.

juergmessmer commented 10 years ago

Thanks, that worked.

juergmessmer commented 10 years ago

How about these use cases:

sabberworm commented 10 years ago

In both cases, the answer is yes, there will be a new document_data object as the primary key of a document_data object is its hash (which is unique to the blob being stored). The new model does not in any way change the way these cases are handled. If your question is whether they should, the answer is: we can offer it as an option and display a dialog offering to update all other documents with the same hash to the new version.

sabberworm commented 10 years ago

P.S.: I think I’ve just found a bug in the implementation. If I’m correct, updating a document with new data will not delete the old document_data object even if it isn’t referenced anywhere else.

juergmessmer commented 10 years ago

Yes, it would be good to offer this option to update a document. At least for an administrator.

juergmessmer commented 10 years ago

more use cases that point to problematic issues: • if I replace the file (different) in the document_detail then the document hash is updated and a new document_data created, the old document_data continues to exist, even if there is no other document related to it. It's a loose end and needs to be cleaned up. • upload document with same content and same name from admin or ckeditor creates always a new identical document (except user and date) with the same name and the same hash. The document_data is handled correctly. Question: is it recommendable to store twice exactly the same file with identical name in the same category? Should these be merged? It probably only happens when users upload a file from CKeditor not knowing that it already exists. In this case merging it and returning existing id should be discussed. • I guess other tests are recommendable, mainly in environments with externally managed documents

sabberworm commented 10 years ago

On 27.05.2014, at 14:07, Jürg Messmer notifications@github.com wrote:

more use cases that point to problematic issues: • if I replace the file (different) in the document_detail then the document hash is updated and a new document_data created, the old document_data continues to exist, even if there is no other document related to it. It's a loose end and needs to be cleaned up.

This is exactly what I meant with my P.S. post. I’m on this already.

• upload document with same content and same name from admin or ckeditor creates always a new identical document (except user and date) with the same name and the same hash. The document_data is handled correctly.

We do not change existing behaviour with this patch, this patch is only about storing. Sure, the new functionality offers new possibilities but exploring these should be a new feature we can discuss at some time. If you think we should implement something new, please create a new issue.

Question: is it recommendable to store twice exactly the same file with identical name in the same category? Should these be merged? It probably only happens when users upload a file from CKeditor not knowing that it already exists. In this case merging it and returning existing id should be discussed.

I don’t understand the question. I think not. I think here, too, you’re exploring new possibilities. That should not be part of this patch, see above.

• I guess other tests are recommendable, mainly in environments with externally managed documents