jgieseler / solo-epd-loader

Data loader (and downloader) for Solar Orbiter/EPD energetic charged particle sensors EPT, HET, and STEP. Supports level 2 and low latency data provided by ESA's Solar Orbiter Archive.
BSD 3-Clause "New" or "Revised" License
15 stars 2 forks source link

Environment variable for path #15

Closed tlml closed 2 years ago

tlml commented 2 years ago

Would it be possible to use (optionally) an environment variable for the path (preferably the same for all loaders)? That would make it much easier for multi-user environments to have data in one location only. Granted, it would possibly also need some file permission changing as well...

jgieseler commented 2 years ago

For most of the loaders (that is, besides solo-epd-loader), sunpy fido is used. I think it more or less already supports something similar. From the definition of get_and_create_download_dir(), I would say it looks for the env variable 'SUNPY_DOWNLOADDIR', and only if that is not defined, it uses some user folder like this:

>>> import sunpy 
>>> sunpy.config.get('downloads', 'download_dir')
'/home/gieseler/sunpy/data'

Thus, I guess you could set the 'SUNPY_DOWNLOADDIR' variable. Then, if you don't provide a path, that common location would be used. Because for those instruments that are not supported by CDAWeb (I think STEREO/SEPT and SOHO/EPHIN electrons) the sunpy downloads folder variable is obtained and used as default setting, it might work for them as well. (Should be tested!)

The only thing missing would then be solo-epd-loader. But as indicated by https://github.com/jgieseler/solo-epd-loader/issues/12, I anyhow planned to migrate it to sunpy-soar. If the env variable worked as described, that would be another argument for switching.

tlml commented 2 years ago

That sounds good!

I just tested the STEREO SEPT, and it goes to cwd+'data', but LET goes to the sunpy folder. From the source, it looks that the sunpy.config.get('downloads', 'download_dir') is done only for other data types, at row 395, but the None goes to soho_ephin_loader (row 368) in which the getcwd is used. Ditto for SOHO/EPHIN.

The SUNPY_DOWNLOADDIR needs the get_and_create_download_dir() it seems... Curiously, looking at its source, it looks like if the environment variable is used, the dir is not created, nor exception raised! A bug in sunpy?

jgieseler commented 2 years ago

Yes, you're right about STEREO/SEPT and SOHO/EPHIN. But where I actually did it was for Wind/3DP, cf. https://github.com/jgieseler/wind-3dp-loader/commit/ca4d2e7e9cf43a20802f6f1f4d6031bbd0f148f4. For some odd reason the fido downloads for Wind/3DP from CDAWeb were failing quite often, so I went for manually downloading the CDF files directly from https://sprg.ssl.berkeley.edu/wind3dp/data/wi/3dp. So there I use the SunPy download_dir if path is None. I guess I should adjust STEREO/SEPT and SOHO/EPHIN (electrons) accordingly.

And by the way, the usage of sunpy.config.get('downloads', 'download_dir') in line 395 you mention is also a slightly "bad" functionality I wrote. 🙈 The code now gets from fido a list of files that are online available, then checks if they can be found locally at path, and only if this is not the case, it initiates a fido.fetch. I tried to circumvent some problems with it while downloading lots of data files...

jgieseler commented 2 years ago

Yes, you're right about STEREO/SEPT and SOHO/EPHIN. But where I actually did it was for Wind/3DP, cf. jgieseler/wind-3dp-loader@ca4d2e7. For some odd reason the fido downloads for Wind/3DP from CDAWeb were failing quite often, so I went for manually downloading the CDF files directly from https://sprg.ssl.berkeley.edu/wind3dp/data/wi/3dp. So there I use the SunPy download_dir if path is None. I guess I should adjust STEREO/SEPT and SOHO/EPHIN (electrons) accordingly.

Ok, I just made that changes (https://github.com/jgieseler/stereo-loader/commit/73184d953b8beb903d3cf083201148fc85bdc325 & https://github.com/jgieseler/soho-loader/commit/31c0a86f9cf87613c4c765393dc5228c8a46b6f0)

tlml commented 2 years ago

Works now, thanks :-)

jgieseler commented 2 years ago

Yes, but not with 'SUNPY_DOWNLOADDIR' 😬

I think all occurrences of sunpy.config.get('downloads', 'download_dir') need to be replaced with sunpy.util.config.get_and_create_download_dir(). Otherwise it will always use the user directory; only get_and_create_download_dir() is looking for 'SUNPY_DOWNLOADDIR'. Or am I mistaken here? 🤔 So for your multi-user environments this still needs to be changed.

tlml commented 2 years ago

Ah, sorry, I thought it worked (I did try with the SUNPY_DOWNLOADDIR but didn't realise that it actually didn't work: I was a bit confused about why it would have worked...).

Yep: as far as I could see, it should work with the sunpy.util.config.get_and_create_download_dir(), it does return the content of SUNPY_DOWNLOADDIR if it is set, otherwise the "download_dir".

jgieseler commented 2 years ago

Ok, I think I now got it. Luckily, 'SUNPY_DOWNLOADDIR' is not needed (or used any more?). Following https://docs.sunpy.org/en/stable/guide/customization.html#dynamic-settings, all you have to do is run the following:

import sunpy
sunpy.config.set('downloads', 'download_dir', '/home/some_path/Downloads')

where the last argument is the new absolute path for the download directory. After running this, all loaders using Fido (and in addition STEREO/SEPT and SOHO/EPHIN electrons - that is, everything except solo-epd-loader at the moment) should use the new path in this session. To change it permanently, one has to change the download_dir entry in the sunpyrc file as described on the same page. Make sure to follow those instructions because otherwise with the next sunpy update your changes will be overwritten.

I shall add a link to https://docs.sunpy.org/en/stable/guide/customization.html to the README files so that users find this info more easily.

tlml commented 2 years ago

Hmm, having the SUNPY_DOWNLOADDATADIR would be handy though, particularly for new users, it could be set in the system-wide initialisation scripts... Would it work to have

filelist = [sunpy.util.config.get_and_create_download_dir() + os.sep + file for file in filelist]

instead of

filelist = [sunpy.config.get('downloads', 'download_dir') + os.sep + file for file in filelist]

The sunpy.util.config.get_and_create_download_dir() calls the sunpy.config.get('downloads', 'download_dir') if SUNPY_DOWNLOADDATADIR isn't set...

jgieseler commented 2 years ago

I would guess that should work. The thing is that the approach using sunpy.config.set('downloads', 'download_dir', '/home/some_path/Downloads') is described in the current docs, while SUNPY_DOWNLOADDATADIR is mentioned nowhere, and I just found it in the source code. So I don't know what the story behind it is (intended for some future use? only included for some backward compatibility?). Thus, before I suggest using it, one would need to check with the sunpy team what the correct way is.

tlml commented 2 years ago

Good instinct on the SUNPY_DOWNLOADDATADIR :-) . It seems I volunteered myself for some sunpy coding... It looks like this is closed? Thanks for the fix!

jgieseler commented 2 years ago

I saw that. And that you kept my name out of it. 😅

Closing this issue.