peng-lab / BaSiCPy

MIT License
63 stars 21 forks source link

Remove import of datasets module from top level #153

Closed jmuhlich closed 2 months ago

jmuhlich commented 3 months ago

Placing an import for the datasets module in the top level __init__.py file causes the sample datasets to be downloaded for every user of the code. Most users don't need the sample datasets, and in fact it can cause lots of extra overhead and other errors in containers (as the cached downloads are discarded after every run and the calling user may not even be able to write to the default cache directory). Removing this import line eliminates the problem and has no other effects on the functioning of the code. The test cases still correctly trigger the downloads when they import basicpy.datasets. This change also cleans up __all__. 'datasets' no longer belongs there, and there was another symbol 'metrics' which is not even present.

yfukai commented 3 months ago

Hi thanks @jmuhlich for this suggestion. If I understand correctly, the datasets are downloaded only when the fetch function is called, but not at the time of the module import. In my environment,

%time import basicpy

and

%time from basicpy import BaSiC

displayed almost the same Wall time. Please let me know if I'm misunderstanding something! Cc: @Nicholas-Schaub

jmuhlich commented 2 months ago

My apologies! I now realize the problem actually lies with scikit-image 0.20.0. In that specific version, skimage downloads all of its test data to a cache directory upon import of the main package, rather than upon import of the data submodule. This has been fixed in 0.21.0 and newer. I guess I didn't read my stack trace carefully enough to notice that the direct caller of pooch in the problematic code path was skimage, not basicpy. Importing datasets in basicpy/__init__.py is indeed harmless, but I don't think it's actually achieving anything either.

yfukai commented 2 months ago

Thanks for noticing this problem. Let me just keep this top level import for the backward compatibility!