opendatahub-io / notebooks

Notebook images for ODH
Apache License 2.0

Missing libraries or OS packages to consider for next release #72

Closed guimou closed 10 months ago

guimou commented 1 year ago

This issue is meant to catch the different libraries or OS packages that people think are missing and should be added to the standard images. Please add your item(s) directly in comments, with a small justification/advocacy of why it should be added.

guimou commented 1 year ago

libodbc

When using pyodbc to connect to a database, libodbc.so.2 is not found on the system. Because pyodbc is a widely used library, it should work without requiring a custom image that includes libodbc.
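
A quick way to confirm whether an image already ships the shared library is to ask the dynamic loader from Python. This is only a minimal sketch: the helper name `shared_library_available` is hypothetical, and `ctypes.util.find_library` depends on tools such as `ldconfig` being present, so a negative result in a very minimal container is not conclusive.

```python
import ctypes
import ctypes.util


def shared_library_available(name: str) -> bool:
    """Return True if the dynamic loader can locate and load the library.

    `name` is given without the "lib" prefix or file suffix, e.g. "odbc"
    for libodbc.so.2. (Hypothetical helper, not part of the images.)
    """
    path = ctypes.util.find_library(name)
    if path is None:
        # The loader could not find any matching library.
        return False
    try:
        ctypes.CDLL(path)
    except OSError:
        # Found on disk, but it could not be loaded.
        return False
    return True


if __name__ == "__main__":
    print("libodbc available:", shared_library_available("odbc"))
```

If this prints `False`, importing pyodbc in the same image is likely to fail with the missing `libodbc.so.2` error described above.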

guimou commented 1 year ago

libsnd

This library is needed to read and write sound files. Because data analysis and model training can also be done on such sources (using torchaudio, for example), it should be included in the base image.

guimou commented 1 year ago

CUDA Toolkit

As it stands, the CUDA images include only the base CUDA, not the toolkit. Many libraries, as well as user code, require the toolkit to compile specific runtimes. It should be included by default in all CUDA images.

guimou commented 1 year ago

git-lfs

With the advent of Hugging Face and other model repos, some files are too big to be stored in a standard Git repo. Git LFS provides a mechanism to store references in Git while keeping the big files themselves somewhere else. It is currently missing from the base images, which forces people either to download files elsewhere and then upload them to the notebook environment, or to do a PATH trick with the git-lfs binary. It should be included in all base images.
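
To see whether a given image already provides the binary, a check along these lines can be run from a notebook cell. It is a minimal sketch, and the function name `missing_tools` is hypothetical; it only tests whether an executable is on PATH, not whether Git LFS is configured for a repository.

```python
import shutil


def missing_tools(tools):
    """Return the tool names that cannot be found on PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


if __name__ == "__main__":
    # "git-lfs" is the executable behind the `git lfs` subcommand.
    print("Missing:", missing_tools(["git", "git-lfs"]))
```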

guimou commented 1 year ago

Drivers for current databases

Examples: MongoDB, PostgreSQL (13, 14, 15), MS SQL Server (2019, 2022)
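
One way to audit an image for Python-level drivers is to test whether the corresponding modules are importable. This is a sketch under assumptions: the comment above does not name specific packages, so `pymongo`, `psycopg2` and `pyodbc` are chosen here only as common examples, and the helper `driver_status` is hypothetical. It does not check OS-level client libraries.

```python
import importlib.util


def driver_status(modules):
    """Map each top-level module name to whether it can be imported."""
    return {name: importlib.util.find_spec(name) is not None for name in modules}


if __name__ == "__main__":
    # Example module names only (assumption, not taken from the issue).
    for name, present in driver_status(["pymongo", "psycopg2", "pyodbc"]).items():
        print(f"{name}: {'installed' if present else 'missing'}")
```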

atheo89 commented 1 year ago

The OOTB Notebook Image Releases n&n-1 spreadsheet has recently received an update. You can find the most current Python versions as well as the new Python package inclusions for each notebook under the "Future Release (N+1) v2023b" column. As we begin the process of updating the notebooks, please make sure to reference and utilize these updated versions.

dimakis commented 1 year ago

The CodeFlare SDK

This is used to interact with the Distributed Workloads stack.

guimou commented 1 year ago

@dimakis Would the CLI be useful too, or just wasted space?

harshad16 commented 11 months ago

Completed with #205. Thanks for the work.

cuda-toolkit can't be included in the UBI/RHEL images. Wait for this action.

harshad16 commented 10 months ago

Closing this as complete. For any items that are still missing, a new issue can be opened. Thanks for working on this.