feat: expose interface for getting info at building

cutecutecat commented 1 year ago

Description

We introduce a global module env with a getter function os() in starlark API

This can exposed some variables for subsequent building procedure, especially for some library function developing, they can automatically detect os version or language, and be compatible with them without user input these information again. There could be more getters implementation in this module in the future.

Examples

def build():
    base(os="ubuntu18.04", language="python3.8")
    print(env.os())
    # output "ubuntu18.04"

def build():
    base(os="ubuntu20.04", language="python3.8")
    print(env.os())
    # output "ubuntu20.04"

PS. It would be better to be a lazy-evaluate variable env.os for design, but starlark doesn't support lazy-evaluate!

ref: https://github.com/tensorchord/envdlib/issues/14

Message from the maintainers:

Love this enhancement proposal? Give it a 👍. We prioritise the proposals with the most 👍.

cutecutecat commented 1 year ago

/assign

VoVAllen commented 1 year ago

We had some discussions here https://github.com/tensorchord/envd/issues/1132#issuecomment-1308433521 about the interface. I think it's better to make it as a key-value interface instead of methods.

kemingy commented 1 year ago

We had some discussions here #1132 (comment) about the interface. I think it's better to make it as a key-value interface instead of methods.

What's the advantage of key-value interface? I think this info is not static, thus a method would be suitable.

VoVAllen commented 1 year ago

@kemingy So that every function can provide such information. For example envdlib can provide torch as function. Contributor can also provide torch version in the function. Other packages relying on torch can get such information.

kemingy commented 1 year ago

@kemingy So that every function can provide such information. For example envdlib can provide torch as function. Contributor can also provide torch version in the function. Other packages relying on torch can get such information.

Do you mean that it's also mutable and can be modified with assignment statements?

VoVAllen commented 1 year ago

Yes. Something like env.get("os/version", "20.04"), env.get("os/name", "ubuntu") or env["package/torch"]="1.12.0"

env.get(key, default_value). The same syntax as python dict

cutecutecat commented 1 year ago

I like this way, it could act as a hack of lazy-evaluate, if the os is a value, it would be more pythonic.

def get_os():
    info_dict = get_build_info()
    return info_dict["os"]

As it actually convey a copy/value of graph, not reference of graph, people may mistakenly think they can write the graph.

Apart from that, we couldn't provide the exact graph, but only parts of it and flatten one, as starlark don't support class/struct.

Rather than named get_build_graph, get_build_status/get_build_data/get_build_info might be better.

As it is a dict of fixed keys, We can use TypedDict for Python type hint:

from typing import TypedDict

class Movie(TypedDict):
    name: str
    year: int

movie: Movie = {'name': 'Blade Runner',
                'year': 1982}

ref: https://peps.python.org/pep-0589/

VoVAllen commented 1 year ago

@cutecutecat Actually I think it's fine for the users to write into the key-value graph. For example, some library will depends on tensorrt version, and tensorrt is only provided in envdlib. User may want to check the tensorrt version when installing other libraries

cutecutecat commented 1 year ago

@cutecutecat Actually I think it's fine for the users to write into the key-value graph. For example, some library will depends on tensorrt version, and tensorrt is only provided in envdlib. User may want to check the tensorrt version when installing other libraries

@VoVAllen @kemingy

If we support write to build graph and let it build, than user or envdlib developer are able to overwrite the key set in build procedure, which destroys envd encapsulation, like:

def build():
    base(os="ubuntu18.04", language="python3.8")
    info_dict = get_build_graph()
    info_dict["os"] = "centos6.5"
    info_dict["language"] = "R"

If these setters occurs in different extensions, it is difficult to debug where the change happens.

I think we should only support read to build graph, as it's primary element, could be revised by envd api only.

Obviously, for envdlib developer to set some persistent variables, we could introduce another API, like environment variable, it supports both read and write, and may work like this

env.set('tensorchord.envdlib.tensorrt.version', "1.0")
version = env.get('tensorchord.envdlib.tensorrt.version')

and the build graph API likes this:

os = env.graph.os()

cutecutecat commented 1 year ago

There might need more discussion before proceeding this.

/unassign

VoVAllen commented 1 year ago

Some thoughts:

Override can be a feature instead of a bug. Just like dynamic language as Python, you can monkey patch any methods on your own. For now, I don't know whether we need to let users do this.
To solve the overwrite problem, we can simply raise errors when the program writes to an existing key. Thus env can be a special dictionary, with some internal constraint. We can also add protection by prefixes such as envd/os/version, key starts with envd/ is protected. The only positive point I can see about env.os() over env['os'] is that method seems guaranteed to be accessible to the users, but the key might be missing in the dictionary. However, I still think the flexibility provided key-value like behavior is needed in this proposal.

tensorchord / envd

feat: expose interface for getting info at building #1267

Description

Examples