red-data-tools / red-datasets

A RubyGem that provides common datasets
MIT License

Postponing to load individual datasets #158

Closed mrkn closed 1 year ago

mrkn commented 2 years ago

I'm concerned that many datasets are loaded up front regardless of whether they are needed. https://github.com/red-data-tools/red-datasets/blob/master/lib/datasets.rb#L3-L34

What do you think about postponing the loading of individual datasets?

kou commented 2 years ago

Are you thinking about start-up time?

I agree that less start-up time is better. But I don't want to force users to require each dataset explicitly, such as require "datasets/iris". We could use autoload for this case, but I don't like autoload much... For example, autoload doesn't work in Ractor.

Do you have any ideas on how to implement this proposal?
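
For readers unfamiliar with the autoload option mentioned above, here is a minimal sketch of how Module#autoload defers loading a file until its constant is first referenced. The AutoloadDemo module, the Iris class body, and the temporary file are hypothetical and exist only for illustration; this is not how red-datasets is organized.

```ruby
require "tmpdir"
require "fileutils"

# Hypothetical namespace standing in for a dataset library.
module AutoloadDemo
end

# Write a throwaway file that defines AutoloadDemo::Iris.
dir = Dir.mktmpdir
path = File.join(dir, "iris.rb")
File.write(path, <<~RUBY)
  module AutoloadDemo
    class Iris
      def name
        "iris"
      end
    end
  end
RUBY

# Register the constant; the file is NOT loaded at this point.
AutoloadDemo.autoload(:Iris, path)

# autoload? returns the registered path while the constant is still pending.
pending_path = AutoloadDemo.autoload?(:Iris)

# The first reference to the constant triggers loading of the file.
dataset = AutoloadDemo::Iris.new

# Once the file has been loaded, autoload? returns nil.
still_pending = AutoloadDemo.autoload?(:Iris)

puts "pending before first use: #{!pending_path.nil?}"
puts "pending after first use:  #{!still_pending.nil?}"
puts "dataset name: #{dataset.name}"

FileUtils.remove_entry(dir)
```

The trade-off raised in this thread is that the deferred load happens wherever the constant is first touched, which is where the Ractor concern comes from.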

mrkn commented 2 years ago

I don't know how to overcome the problem with non-main Ractors. I guess we need a mechanism by which non-main Ractors ask the main Ractor to load libraries.

kou commented 1 year ago

I've implemented this.

We can postpone loading by requiring datasets/lazy:

require "datasets/lazy"
# Datasets::Iris isn't loaded yet
Datasets::Iris # Datasets::Iris is loaded now
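
One way to get this kind of on-demand behavior without autoload is to hook const_missing on the namespace module. The sketch below illustrates that general technique only; it is not claimed to be the actual datasets/lazy implementation. LazyDemo, register, LOADERS, and LOADED are hypothetical names, and the registered block stands in for what a real library would do with require.

```ruby
# Hypothetical stand-in for a dataset namespace; not red-datasets code.
module LazyDemo
  LOADERS = {} # constant name => block that defines the constant
  LOADED = []  # names that have been loaded so far, kept for inspection

  # Remember how to define a constant without defining it yet.
  def self.register(name, &definer)
    LOADERS[name] = definer
  end

  # Ruby calls this hook when LazyDemo::Something is not defined.
  def self.const_missing(name)
    definer = LOADERS[name]
    return super unless definer # unknown names still raise NameError

    definer.call # a real library would `require` the dataset file here
    LOADED << name
    const_get(name)
  end
end

LazyDemo.register(:Iris) do
  iris_class = Class.new do
    def name
      "iris"
    end
  end
  LazyDemo.const_set(:Iris, iris_class)
end

# const_defined? does not trigger const_missing, so this is a safe check.
defined_before = LazyDemo.const_defined?(:Iris, false)

# The first reference goes through const_missing and defines the class.
dataset_name = LazyDemo::Iris.new.name

defined_after = LazyDemo.const_defined?(:Iris, false)

puts "defined before first use: #{defined_before}"
puts "defined after first use:  #{defined_after}"
puts "dataset name: #{dataset_name}"
```

After the first access the constant is defined normally, so later references no longer pass through const_missing.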