instructlab / sdg

Python library for Synthetic Data Generation
Apache License 2.0
6 stars 15 forks source link

Reserve package name on PyPI #2

Closed russellb closed 2 weeks ago

russellb commented 1 month ago

See title. Register the library on PyPI. We have to publish a package, even if it's empty at first.

Another option from @tiran

here is another option. We could turn instructlab into a namespace package. That would allow us to distribute instructlab.sdg subpackage as a separate Python package that can be installed with pip install instructlab.sdg.

https://packaging.python.org/en/latest/guides/packaging-namespace-packages/

russellb commented 1 month ago

@tiran I'm interested in your feedback on the package name. There's more context here: https://github.com/instructlab/dev-docs/blob/main/docs/sdg-repo.md

tiran commented 1 month ago

@russellb There is another option. We could turn instructlab into a namespace package. That would allow us to distribute instructlab.sdg subpackage as a separate Python package that can be installed with pip install instructlab.sdg.

https://packaging.python.org/en/latest/guides/packaging-namespace-packages/

russellb commented 1 month ago

@russellb There is another option. We could turn instructlab into a namespace package. That would allow us to distribute instructlab.sdg subpackage as a separate Python package that can be installed with pip install instructlab.sdg.

https://packaging.python.org/en/latest/guides/packaging-namespace-packages/

Thanks! I added that to the list of options at the top.

I don't really have a strong opinion. I just want to do something following good Python practices.

dhellmann commented 1 month ago

I like the namespace package idea, if the things are coupled in a way that you'd only use them together. If you would use instructlab.sdg without instructlab then I'm not sure I'd go that route.

tiran commented 1 month ago

Either option is fine.

Namespace packages require a tiny boiler plate and some coordination between packages. They are a great fit for an organization that controls a top level namespace. The concept is similar to a top level domain in DNS. The concept has been around for 15, 20 years and is used by e.g. Zope and Plone. IMHO it would be a better option for us.

russellb commented 1 month ago

I didn’t mean to close this.

I think the namespace package idea sounds good. I think we need to change the current instructlab package first?

dhellmann commented 1 month ago

I didn’t mean to close this.

I think the namespace package idea sounds good. I think we need to change the current instructlab package first?

Yes, you'll need to ensure that everything is delivered in a subdirectory of the instructlab directory.

I just remembered that we ran into some technical issues with namespace packages in OpenStack. We wrote up some of the details here and you may want to review that to see whether it's likely to be an issue for InstructLab. I think you'll be able to work around the "installing editable" issue, if it hasn't been fixed. I don't think the shared site-packages issue is as likely to come up, that was a bad practice we couldn't eliminate for OpenStack.

tiran commented 1 month ago

Namespace packages in Python 2 required some nasty hacks and workarounds. Python 3 has native support for namespace packages.

__path__ = __import__("pkgutil").extend_path(__path__, __name__) in src/instructlab/__init__.py is sufficient. Eventually we want to move all Python files in src/instructlab to a subpackage, too. Then we can drop the __init__,py completely and use native PEP 420 namespace packages.

nathan-weinberg commented 4 weeks ago

I like the idea of a namespace package - would we still keep this repo separate if we go that route?

russellb commented 4 weeks ago

I like the idea of a namespace package - would we still keep this repo separate if we go that route?

I think reasons for wanting it separate mostly still apply (wanting a separate team that can evolve SDG beyond the immediate needs of the current ilab workflow).

russellb commented 4 weeks ago

Next steps if i understand correctly:

Is that right?

dhellmann commented 4 weeks ago

Next steps if i understand correctly:

Is that right?

We don't have to support python2, so I think just moving the code in both repos into submodules (subdirectories of instructlab) and removing the instructlab/__init__.py is what you want. https://peps.python.org/pep-0420/#specification explains how the import machinery handles that case.

Then, yes, renaming the dist in this repo to instructlab.sdg would make sense.

tiran commented 4 weeks ago

IIRC pylint does not handle PEP 420 namespace packages well. That's the reason I recommended to keep the __init__.py with the pkgutil line.

russellb commented 2 weeks ago

this is done! we are publishing to instructlab_sdg