viash-io / viash

script + metadata = standalone component
https://viash.io
GNU General Public License v3.0
39 stars 2 forks source link

Revisit memory units. #555

Closed DriesSchaumont closed 5 months ago

DriesSchaumont commented 1 year ago

Origionally posted at: https://github.com/openpipelines-bio/openpipeline/issues/556#issue-1902292280

https://github.com/openpipelines-bio/openpipeline/blob/c4b39e32f9da10efb54ac3c85985b47f83f873f8/target/nextflow/annotate/popv/main.nf#L2597-L2601

Currently the memory requirements are multiplied by 1024 as in byte times, this has however changed. Requesting 500Gb of memory will now result in e.g. 520Gb being reserved. Based on the new standards the multiplication should be done by thousands.

Currently the standard is 1 Gigabyte (GB) = 1000 Megabytes (MB). B
ut it wasn't always like that. For a long time, 1 Kilobyte=1024 bytes, 1 Megabyte = 1024 kilobytes, 1 Gigabyte = 1024 megabytes, and so on. 
The reason being the fact that it easier to do binary math when working with powers of two.

As a reference, from the units policy from Ubuntu: https://wiki.ubuntu.com/UnitsPolicy

If you divide by 1000, you probably use the [SI prefix names](http://en.wikipedia.org/wiki/SI_prefixes#List_of_SI_prefixes), if you divide by 1024, you probably use the [IEC prefix names](http://en.wikipedia.org/wiki/Binary_prefix#Prefixes).

However as , ISO 80000-1:2022 notes:

SI prefixes refer strictly to powers of 10, and should not be used for powers of 2. For example, 1 kbit should
not be used to represent 1024 bits (2 10 bits), which is a kibibit (1 Kibit).
Screenshot 2023-10-07 at 09 14 17
rcannood commented 1 year ago

From wikipedia (source):

Screenshot from 2023-10-09 11-53-10

So we should support both TB (1000^4) and TiB (1024^4)?

DriesSchaumont commented 1 year ago

From wikipedia (source):

Screenshot from 2023-10-09 11-53-10

So we should support both TB (1000^4) and TiB (1024^4)?

That would be the most technically correct while also being practical? Only ignoring the fact that SI units are applied to base2 scale, but I think this is overlooked in most applications...

ddemaeyer commented 9 months ago

The problem is quite pressing in regards to scheduling. At the moment we use K8s to schedule where we are requesting 128Gb of memory on the instances. If then the component is scheduled he will request more than this amount meaning that the job is not scheduled since it requests more memory than is provided by the instance definition.

Grifs commented 5 months ago

As a reference for myself and others in the future, Nextflow uses 1024-base memory units: https://www.nextflow.io/docs/latest/script.html#memoryunit

After some searching, I found what docker does behind the screens too (it's not well documented): https://github.com/docker/go-units/blob/master/size.go#L86 tl;dr: also 1024-base units

Kubernetes supports both: https://kubernetes.io/docs/concepts/configuration/manage-resources-containers/#meaning-of-memory

Grifs commented 5 months ago

Behaviour was changed in Viash 0.9. "MB" will now refer to SI units (1000-base) whereas "MiB" will refer to IEC units (1024-base). The default Nextflow labels are changed to reflect the 1000-baseness of the labels and 1024-baseness labels are added.