uga-libraries / hub-monitoring

Scripts for summarizing and validating content on the Digital Production Hub, the UGA Libraries' centralized storage for digital objects that are not suitable for our digital preservation system.
Creative Commons Attribution Share Alike 4.0 International

Accuracy of Size_GB #68

Closed amhanson9 closed 4 months ago

amhanson9 commented 4 months ago

The bigger the accession, the bigger the difference between the size in GB in the report and the size shown in the folder's properties. I looked at the 7 biggest Hargrett accessions, all over 0.5 GB.

Accession size is calculated in get_size() using os.stat().st_size. This is in bytes and is converted to GB by dividing by 1 billion, then rounded to 2 decimal places, or to as many more places as are needed to avoid a result of 0.
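For reference, a minimal sketch of the calculation described above. This is not the repository's actual code: `folder_size_gb()`, `round_non_zero()`, and the `BYTES_PER_GB` constant are hypothetical names, and the use of a decimal GB (10^9 bytes) is an assumption.

```python
import os

# Assumption: the report uses decimal gigabytes (10**9 bytes per GB).
BYTES_PER_GB = 1_000_000_000


def round_non_zero(number, start_places=2, max_places=15):
    """Round to at least start_places decimals, adding places until the result is not 0."""
    if number == 0:
        return 0.0
    places = start_places
    rounded = round(number, places)
    while rounded == 0 and places < max_places:
        places += 1
        rounded = round(number, places)
    return rounded


def folder_size_gb(folder):
    """Sum os.stat().st_size for every file under folder and convert the total to GB."""
    total_bytes = 0
    for root, _dirs, files in os.walk(folder):
        for name in files:
            total_bytes += os.stat(os.path.join(root, name)).st_size
    return round_non_zero(total_bytes / BYTES_PER_GB)
```

With this approach, anything of 0.01 GB or more is simply rounded to 2 decimal places, and only very small folders get extra places.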

Collection size is calculated in combine_collection_data(). It adds the accession sizes and rounds again using the same function as with accessions.
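To see how much the second round of rounding could matter, here is a rough sketch comparing the two ways of getting a collection total. The function names are hypothetical, and it assumes 2-decimal rounding and a decimal GB, so it may not match combine_collection_data() exactly.

```python
def collection_gb_from_rounded(accession_bytes):
    """Round each accession to 2 decimals first, then sum and round again."""
    rounded_sizes = [round(size / 1_000_000_000, 2) for size in accession_bytes]
    return round(sum(rounded_sizes), 2)


def collection_gb_from_bytes(accession_bytes):
    """Sum the raw byte counts first, then convert and round once."""
    return round(sum(accession_bytes) / 1_000_000_000, 2)


# Three accessions that each round up slightly.
sizes = [1_005_000_001, 1_005_000_001, 1_005_000_001]
print(collection_gb_from_rounded(sizes))  # 3.03
print(collection_gb_from_bytes(sizes))    # 3.02
```

Each accession can be off by at most 0.005 GB from rounding, so this effect grows with the number of accessions in a collection, not with the size of any one accession.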

This could be coming from the byte-to-GB conversion, inflation from rounding, and/or os.stat() returning a different result than Windows properties.
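One way to check the conversion explanation is to compare a decimal GB (10^9 bytes) with the binary GB (1024^3 bytes) that Windows uses when it displays sizes. The sketch below assumes the report uses the decimal unit; the function names are hypothetical.

```python
def to_decimal_gb(size_bytes):
    """Convert bytes to GB using 10**9 bytes per GB."""
    return size_bytes / 1000**3


def to_binary_gb(size_bytes):
    """Convert bytes to GB using 1024**3 bytes per GB, as Windows displays it."""
    return size_bytes / 1024**3


# A folder that Windows would show as exactly 5 GB.
size_bytes = 5 * 1024**3
print(round(to_decimal_gb(size_bytes), 2))  # 5.37
print(round(to_binary_gb(size_bytes), 2))   # 5.0
```

The decimal figure is always about 7.4% higher than the binary one (1024^3 / 1000^3 = 1.073741824), so the absolute gap grows with accession size, which would fit the pattern described above if the report does use the decimal unit.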

amhanson9 commented 4 months ago

@emkaser Is the size close enough to be of use as is or should we try to get it closer to what is in properties?

emkaser commented 4 months ago

> @emkaser Is the size close enough to be of use as is or should we try to get it closer to what is in properties?

Right now, I don't foresee any issues with there being a minor size discrepancy in the report - I will keep an eye on it to see if it becomes a problem down the road but I think it should be ok as-is.