tensorchord / envd

🏕️ Reproducible development environment
https://envd.tensorchord.ai/
Apache License 2.0

feat: apt_source interface refactor to support adding source signature and priorities #1298

Open oubotong opened 1 year ago

oubotong commented 1 year ago

The current config.apt_source interface lets users add third-party package sources. However, the implementation takes a single string as input and overwrites the original /etc/apt/sources.list file.

apt_source(source='''
    deb https://mirror.sjtu.edu.cn/ubuntu focal main restricted
    deb https://mirror.sjtu.edu.cn/ubuntu focal-updates main restricted
    deb https://mirror.sjtu.edu.cn/ubuntu focal universe
    deb https://mirror.sjtu.edu.cn/ubuntu focal-updates universe
    deb https://mirror.sjtu.edu.cn/ubuntu focal multiverse
    deb https://mirror.sjtu.edu.cn/ubuntu focal-updates multiverse
    deb https://mirror.sjtu.edu.cn/ubuntu focal-backports main restricted universe multiverse
    deb http://archive.canonical.com/ubuntu focal partner
    deb https://mirror.sjtu.edu.cn/ubuntu focal-security main restricted universe multiverse
''')

This implementation has several problems:

  1. It's not recommended to overwrite the sources.list file every time. Additional package sources should be placed in a new file under the /etc/apt/sources.list.d/ directory (e.g. package.list).
  2. Sometimes multiple sources contain the same package. If one source has a higher version of the package, apt automatically downloads that one. However, if two sources have the same package version, there should be an interface for the user to specify the priority of those sources. This can be achieved by putting a configuration file under /etc/apt/preferences.d/. The higher the Pin-Priority number, the higher the priority; -1 means the source is ignored:
    Package: *
    Pin: origin ftp.ch.debian.org
    Pin-Priority: 700
  3. Sometimes the signatures of package repositories have expired. To avoid potential security risks, it is recommended to add the signature to the third-party source. For example, in the file /etc/apt/sources.list.d/example.list add:
    deb [signed-by=signature.asc] https://cloud.r-project.org/bin/linux/ubuntu focal-cran40/

    If the user wants to use the source without a signature, this can be achieved by adding:

    deb [trusted=yes] https://cloud.r-project.org/bin/linux/ubuntu focal-cran40/

    Whichever option users choose, the interface should support it.
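To make the priority point concrete, here is a minimal sketch of how a pin record like the one above could be generated. The helper name render_pin_stanza is hypothetical and not part of envd; the output would be written to a file under /etc/apt/preferences.d/.

```python
def render_pin_stanza(origin: str, priority: int, package: str = "*") -> str:
    """Return one apt preferences record pinning `package` from `origin`.

    Hypothetical helper for illustration only; envd does not provide it.
    """
    return (
        f"Package: {package}\n"
        f"Pin: origin {origin}\n"
        f"Pin-Priority: {priority}\n"
    )


# Reproduces the record shown in the issue text.
print(render_pin_stanza("ftp.ch.debian.org", 700))
```

Keeping the explanation out of the generated record avoids relying on how apt parses trailing comments in preferences files.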

Why is this important?

Suppose you want to use the latest version of the R environment. The official website, CRAN-R, recommends adding their official repository to the apt sources. However, their original signature has expired, so you have to download the latest public key from their website and add it to the apt source. Our current interface doesn't support this, so we need to refactor it.

It also gives users more room and flexibility to specify custom repositories.

Proposed Initial Design


Here is an initial proposal for how the config.apt_source interface should look:

  1. To make it consistent with other interfaces such as install.apt_packages (def apt_packages(name: List[str])), the interface should take a list of sources instead of one string holding every source.
  2. A signature option should be provided so the user can specify the signature of each repository. For example:
    config.apt_source([signature=url1, source=src1], [signature=url2, source=src2]...)

    The signature here is the location from which the GPG key should be downloaded. When no signature is provided, the source is marked trusted=yes by default.

  3. There should also be a field for the user to specify the repository priority. For example:
    config.apt_source([signature=url1, source=src1, priority=700], [signature=url2, source=src2, priority=200]...)

    Again, this is just an initial proposed design for this interface. If you have any concerns or suggestions, feel free to comment in the discussion.
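The proposal above could be sketched in Python roughly as follows. This is only an illustration under assumptions: AptSource, render_entry, and the key path /etc/apt/keyrings/example.asc are hypothetical names, not envd's API, and downloading the key is left out.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class AptSource:
    """Hypothetical container for one proposed config.apt_source item."""
    source: str                      # one entry, e.g. "deb <uri> <suite> ..."
    signature: Optional[str] = None  # URL the GPG key would be downloaded from
    priority: Optional[int] = None   # pin priority; carried along, not rendered here


def render_entry(src: AptSource, key_path: str) -> str:
    """Insert the trust option right after the entry type (deb / deb-src).

    `key_path` is where the downloaded key is assumed to be stored.
    """
    entry_type, rest = src.source.strip().split(maxsplit=1)
    option = f"signed-by={key_path}" if src.signature else "trusted=yes"
    return f"{entry_type} [{option}] {rest}"


r_repo = "deb https://cloud.r-project.org/bin/linux/ubuntu focal-cran40/"
# No signature given, so the entry falls back to trusted=yes.
print(render_entry(AptSource(source=r_repo), "/etc/apt/keyrings/example.asc"))
```

Falling back to trusted=yes mirrors the default described in the proposal; a stricter design could instead require an explicit opt-in for unsigned sources.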


Message from the maintainers:

Love this enhancement proposal? Give it a 👍. We prioritise the proposals with the most 👍.

kemingy commented 1 year ago

Refer to man 5 sources.list:

BTW, I think the SJTU example can be put in one line since the format is:

deb [ option1=value1 option2=value2 ] uri suite [component1] [component2] [...]
deb-src [ option1=value1 option2=value2 ] uri suite [component1] [component2] [...]

They are just different components. Refer to Debian example.

In this way, the interface can be:

config.apt_source(uri, suite, components, signature)
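As a rough illustration of the one-line format, the sketch below merges entries that share the same type, URI, and suite into a single line listing all components. The function merge_components is hypothetical, and it assumes entries carry no bracketed options.

```python
from typing import Dict, List, Tuple


def merge_components(lines: List[str]) -> List[str]:
    """Merge entries that share type, URI and suite into a single line.

    Assumes entries have no "[ option=value ]" block.
    """
    merged: Dict[Tuple[str, str, str], List[str]] = {}
    for line in lines:
        parts = line.split()
        if len(parts) < 3:
            continue  # skip blank or malformed lines
        entry_type, uri, suite, *components = parts
        seen = merged.setdefault((entry_type, uri, suite), [])
        for component in components:
            if component not in seen:
                seen.append(component)
    # dict preserves insertion order, so entries keep their first-seen order
    return [
        " ".join([entry_type, uri, suite, *components])
        for (entry_type, uri, suite), components in merged.items()
    ]


sjtu = [
    "deb https://mirror.sjtu.edu.cn/ubuntu focal main restricted",
    "deb https://mirror.sjtu.edu.cn/ubuntu focal universe",
    "deb https://mirror.sjtu.edu.cn/ubuntu focal multiverse",
]
for entry in merge_components(sjtu):
    print(entry)
```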

I'm not familiar with the mirror priority part.

gaocegege commented 1 year ago

The user interface looks complex, could we keep the older interface and see if we can have an advanced function or setting for it?

BTW, I think this issue is in low priority.

kemingy commented 1 year ago

BTW, I think this issue is in low priority.

It's required because v1 doesn't support R or Julia. R is not well maintained for containerization. Thus it requires some extra steps to install. Haven't checked Julia. But I think it's a general requirement.

oubotong commented 1 year ago

The user interface looks complex, could we keep the older interface and see if we can have an advanced function or setting for it?

Right, the interface is definitely complex since it gives extra options. To install the R environment with the current interface, the user needs to add the source manually like this:

apt_source(source='''
    deb [trusted=yes] https://cloud.r-project.org/bin/linux/ubuntu focal-cran40/
''')

This would temporarily allow users to install the latest R environment. However, this is not guaranteed to install the latest version of R from the official repo.

gaocegege commented 1 year ago

/cc @VoVAllen

VoVAllen commented 1 year ago

Does changing the config.apt_source behavior to appending to sources.list work?

Seems this issue includes several proposals:

It's not recommended to override the sources.list file all the time. Additional package sources should be placed under the directory /etc/apt/sources.list.d/ in a new file (E.g. package.list)

So we should put sources in different files right?

Sometimes, the signatures of some package repositories are expired, to avoid potential security risks. It is recommended to add the signature to the third-party source. For example, in the file /etc/apt/sources.list.d/example.list add:

This is needed for the R environment I think. So what's the minimal change we need to support R?

oubotong commented 1 year ago

Does changing the config.apt_source behavior to appending to sources.list work?

Directly appending to sources.list works only when the package exists only in the newly added repo. If multiple sources contain exactly the same version of a package, apt downloads it from the first source that appears in sources.list. The search happens in lexicographic order. In our case, the default version of R-base provided by the Ubuntu repo is 3.6.3, which is too old. The official repo provides version 4.2.2, so apt will automatically choose the official repo once we add it.


So we should put sources in different files right?

It is recommended to add third-party repos in a new .list file under the /etc/apt/sources.list.d directory for maintainability. The central sources.list usually contains the basic free distribution repos. However, there is no restriction on when to use sources.list.d. You can always put all the repos in the same file.


This is needed for the R environment I think. So what's the minimal change we need to support R?

Yes, this is needed for the R environment. To support R, we need to download the R-base package from the official repo, which sets up the environment. If we do not want to include the latest signature, we can just add the repo like this:

deb [trusted=yes] https://cloud.r-project.org/bin/linux/ubuntu focal-cran40/

kemingy commented 1 year ago