kitodo / kitodo-production

Kitodo.Production is a workflow management tool for mass digitization and is part of the Kitodo Digital Library Suite.
http://www.kitodo.org/software/kitodoproduction/
GNU General Public License v3.0

Hierarchy migration: creation of image folders for toplevel-processes #4269

Open andre-hohmann opened 3 years ago

andre-hohmann commented 3 years ago

Problem

After the migration of serial publications (hierarchy migration), image folders are created for the toplevel-processes, for example periodical, multivolume, ...

It seems that the folders are also created for toplevel-processes that are created directly in Kitodo.Production.

Question

The toplevel-processes should not contain images. Is it necessary to create the image folders? Could it cause any problems?

See also: #4268, #4267, #4270, #4271

matthias-ronge commented 3 years ago

I don't think the empty folders can cause a problem.

henning-gerhardt commented 3 years ago

I don't think the empty folders can cause a problem.

It is not a problem, but it is a resource issue. A few empty directories per process may not add up to much, but they consume resources in the underlying file system (e.g. ext3/4 inode entries) and, if you have a backup system, entries there as well, since they must be monitored and included in the backup and restore process. If you have thousands or tens of thousands of hierarchy processes, each with several empty directories, you have to reckon with a considerable amount of "wasted" resources in many places.

The other issue is that these empty directories can cause confusion during disaster recovery or when investigating other error scenarios. You will check again and again whether your disaster recovery went wrong or whether these directories were already empty before the incident.
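To put a number on the directories in question, a scan like the following could be run against a copy of the process data. This is only a minimal sketch: the function name `find_empty_directories` and the idea of passing a root directory are assumptions for illustration and are not part of Kitodo.Production.

```python
import os


def find_empty_directories(root):
    """Return the sorted paths of all directories below root with no entries.

    Hypothetical helper for illustration. The root itself is included
    if it contains neither files nor subdirectories.
    """
    empty = []
    for dirpath, dirnames, filenames in os.walk(root):
        # A directory counts as empty when it has no files and no subdirectories.
        if not dirnames and not filenames:
            empty.append(dirpath)
    return sorted(empty)
```

Calling `len(find_empty_directories(path))` on the directory that holds the process folders gives the number of empty directories, and therefore the number of inodes they occupy.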

matthias-ronge commented 3 years ago

Perhaps you can modify your backup so that only existing files go into it to take up space, and empty folders are not backed up. I don't know anything about ext3/4 inodes in particular. The computation of wasted resources should however be possible, if it comes to that.

Nor do I want the empty directories to cause confusion during disaster recovery or when investigating other error scenarios. Yes, I can see that one would repeatedly check whether the disaster recovery went wrong or whether these directories were already empty beforehand. Maybe @Kathrin-Huber should decide that.

henning-gerhardt commented 3 years ago

Perhaps you can modify your backup so that only existing files go into it to take up space, and empty folders are not backed up.

No, that is not possible as the consequences of this change are dramatic.

I don't know anything about ext3/4 inodes in particular. The computation of wasted resources should however be possible, if it comes to that.

The amount of wasted resources is hard to determine before a migration, as I don't know how many new processes will be created during migration, nor how many processes will be created in the future. I only get the information that the file system is running out of inodes and that I must raise the limit. But by then it could be too late to react. So I try not to waste any resources.

It should not be so hard to decide whether creating these directories is really necessary when you create or migrate a hierarchy process.
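Inode consumption can be watched before it becomes critical. The following is a minimal sketch, assuming a Unix-like system where Python's `os.statvfs` is available; the function name `inode_usage` is an assumption for illustration, and some file systems do not report an inode total at all.

```python
import os


def inode_usage(path):
    """Return (total, free, used_fraction) of inodes for the file system holding path.

    used_fraction is None when the file system reports no inode total.
    """
    stats = os.statvfs(path)
    total = stats.f_files   # total number of inodes on the file system
    free = stats.f_ffree    # number of free inodes
    if total == 0:
        return total, free, None
    return total, free, (total - free) / total
```

Run periodically against the mount point that holds the process directories, this shows how quickly the free inode count shrinks during a migration.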

matthias-ronge commented 3 years ago

I can't say that either. But I would estimate an empty directory to be roughly 16 bytes. If you have eight empty directories per parent process, for 200,000 processes, that would be roughly 25.6 MB (about 24.4 MiB). Yes, it's unnecessary, but it's not the biggest problem.
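The estimate can be worked through explicitly. The sketch below uses the figures from the comment above (16 bytes per directory, eight directories per process, 200,000 processes). The second calculation rests on a different, clearly hypothetical assumption: that each empty directory occupies one 4096-byte block, as is common on ext4 without inline data. Neither per-directory figure was measured on a real Kitodo.Production installation.

```python
def empty_directory_overhead(processes, dirs_per_process, bytes_per_dir):
    """Total bytes used by empty directories under the given assumptions."""
    return processes * dirs_per_process * bytes_per_dir


# Figures taken from the comment: 16 bytes per empty directory.
estimate = empty_directory_overhead(200_000, 8, 16)
print(estimate)              # 25600000 bytes
print(estimate / 1_000_000)  # 25.6 (MB)

# Assumption for comparison only: one 4096-byte block per empty directory.
block_estimate = empty_directory_overhead(200_000, 8, 4096)
print(block_estimate)        # 6553600000 bytes

# Each directory also occupies one inode, whatever its size on disk.
print(200_000 * 8)           # 1600000 inodes
```

The result scales linearly with the per-directory cost, so the outcome depends almost entirely on which figure applies to the file system in use.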

henning-gerhardt commented 3 years ago

I can't say that either. But I would estimate an empty directory to be roughly 16 bytes. If you have eight empty directories per parent process, for 200,000 processes, that would be roughly 25.6 MB (about 24.4 MiB). Yes, it's unnecessary, but it's not the biggest problem.

As already mentioned in the other issue: the size and the way information about directories and files is stored depend strongly on the file system used. Maybe 16 bytes is correct for your file system, but it may be totally wrong for others.

For me, empty and unused directories are unnecessary and a waste of resources in different places, and I will stop discussing this with you. You do not agree with my opinion; that is fine, as I do not agree with yours either.

andre-hohmann commented 3 years ago

It seems that a consensus is not possible. From the user's perspective it works; therefore I will change the label from "question" to "documentation".

matthias-ronge commented 3 years ago

I think it is both. To answer the question: the folders are not technically causing any problems at the moment. Nevertheless, it would be an improvement if no folders were created for processes without a workflow, because they are not used.