aburrell opened 5 years ago
See the jroisr branch for work in this area.
Almost everything needed to close this issue is supported except for one thing: filenames. We need an automated way to get the filename format string. Perhaps we could just require it from the user?
MadrigalWeb has a routine that will list out the file names for a specific time period and experiment, so I don't think we should require that of the user.
I tried one of those routines (perhaps the wrong one) and it seemed to include a number of filename options not actually present in the filenames I downloaded. The issue is teaching pysat how to parse the filenames.
Current support keys in on the time only, which is the most relevant value pysat needs. It isn't very specific, though. There is a potential collision if the general support is used for more than one instrument: currently, all sets of files will be picked up.
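To make the collision concrete, here is a minimal sketch using Python's standard `fnmatch` module. The filenames are hypothetical stand-ins for two instruments' files sharing one directory, not actual Madrigal names.

```python
import fnmatch

# Hypothetical local listing: files from two different instruments
# downloaded into the same directory (names are illustrative only).
files = [
    "dms_20150101_16s1.001.hdf5",
    "dms_20150102_16s1.001.hdf5",
    "jro_20150101_isr.001.hdf5",
]


def match_files(names, pattern):
    """Return the sorted names that match a shell-style glob pattern."""
    return sorted(fnmatch.filter(names, pattern))


# Keying on the date alone picks up files from both instruments.
date_only = match_files(files, "*20150101*")

# An instrument-specific prefix removes the collision.
with_prefix = match_files(files, "dms_20150101*")

print(date_only)    # ['dms_20150101_16s1.001.hdf5', 'jro_20150101_isr.001.hdf5']
print(with_prefix)  # ['dms_20150101_16s1.001.hdf5']
```

A user-supplied format string plays the role of the prefix here: it adds the instrument-specific parts of the name so only one instrument's files match.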
Hmmm, maybe we can ask the user to do this, but also provide a guide on how to use MadrigalWeb to get this information.
I think a guide is a good way to go. We can download the files without knowing the filename specifics, and a user can easily translate a given filename to a format template string. They could pass it in at Instrument instantiation: inst = pysat.Instrument('madrigal', 'pandas', madrigal_inst_code=8100, madrigal_tag=10241, file_format='dmspivm{year:02d}_{month:02d}.....')
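As a rough illustration of that translation step, the sketch below swaps the date in one example filename for format fields, assuming the date appears exactly once as YYYYMMDD. The filename and the helper `filename_to_template` are hypothetical; the field names only mirror the year/month style used in the instantiation example.

```python
import datetime as dt


def filename_to_template(filename, date):
    """Turn one example filename into a format template (hypothetical helper).

    Assumes the date appears exactly once, written as YYYYMMDD.
    """
    stamp = date.strftime("%Y%m%d")
    if filename.count(stamp) != 1:
        raise ValueError("expected exactly one {:s} in {:s}".format(stamp, filename))
    return filename.replace(stamp, "{year:04d}{month:02d}{day:02d}")


example = "dmspivm20150103.hdf5"  # hypothetical downloaded file name
template = filename_to_template(example, dt.date(2015, 1, 3))
print(template)  # dmspivm{year:04d}{month:02d}{day:02d}.hdf5

# The template is an ordinary Python format string, so filling it in
# with the same date must reproduce the original name.
assert template.format(year=2015, month=1, day=3) == example
```

The round-trip check at the end is a cheap way for a user to confirm a hand-written template before passing it in.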
I'll need to tweak the general Madrigal code to fully support this (oops). Instantiation may be a little verbose, but with the file_format string in there it will be as robust as any other pysat Instrument.
Code was already in good shape. I added to the docstring to clarify support.
Note from pysat/pysat#175: mad_methods.download is not implemented via functools.partial. It could be updated for consistency as part of the generalized Madrigal instrument.
@rstoneback I have a branch going that is tackling this problem. I ran into the issue that I need general wildcards of unknown length for the files. Did we ever get a routine to handle that?
Checking out main, pysat.Files.from_os does OK with leading variations in a filename, i.e., anything before the parsed keywords. There you can put in a *-{year:04d}... etc. The wildcard after the keywords is less reliable. There is an option for a delimiter, but I think the delimiter has to be exclusive to the keywords. Can you tell me more about the potential filenames?
Not really? It's whatever they are on Madrigal. Why is the trailing wildcard not reliable? I'm sure there's a reason, but I can't recall what it is...
It wasn't intentional; it's just not as general as it could or should be. I believe that code originated way back with C/NOFS, whose filenames have no parameters after the date, so it's one of those things that slipped through the early gaps. "Gap" is too narrow, but that's the idea.
To get going again on pysat and penumbra, I was going to start on DMSP examples and continue improving pysat's 'new' files function. If it would help, I can do a pass on the files parsing first to support whatever it is Madrigal does.
I remember trying to expand this in the past and not being able to. I was hoping you recalled why it didn't work :'( If you want to take another hack at getting it to work, that would be an excellent use of time. If you don't have as much brain space, then go ahead and work on DMSP instead. Both are equally important; I don't need this problem solved for my current research.
I've been thinking about it since the message. I think I parsed the filename strings backward, starting at the end. That could be one of the reasons why leading wildcards work but others don't.
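For comparison, a forward, regex-based match does not need to know how long a wildcard is, wherever it sits in the name. The sketch below uses hypothetical helpers, not pysat's parser: each `*` becomes a non-greedy `.*?`, each fixed-width `{name:Nd}` field becomes a digit group, and the regex engine backtracks as needed. The filenames are illustrative only.

```python
import re

# Matches fixed-width integer fields such as '{year:04d}'.
FIELD = re.compile(r"\{(\w+):(\d+)d\}")


def _literal(text):
    """Escape a literal chunk, letting each '*' match any run of characters."""
    return ".*?".join(re.escape(chunk) for chunk in text.split("*"))


def wildcard_template_to_regex(template):
    """Compile a template whose '*' wildcards may sit before, between,
    or after the '{name:Nd}' fields (hypothetical helper)."""
    parts = []
    pos = 0
    for field in FIELD.finditer(template):
        parts.append(_literal(template[pos:field.start()]))
        name, width = field.group(1), int(field.group(2))
        parts.append(r"(?P<%s>\d{%d})" % (name, width))
        pos = field.end()
    parts.append(_literal(template[pos:]))
    return re.compile("".join(parts) + r"\Z")


def parse_filenames(template, filenames):
    """Map each matching filename to its integer fields; skip the rest."""
    regex = wildcard_template_to_regex(template)
    parsed = {}
    for name in filenames:
        match = regex.match(name)
        if match is not None:
            parsed[name] = {k: int(v) for k, v in match.groupdict().items()}
    return parsed


names = ["dms_20150103_16s1.001.hdf5", "dms_20150104_f18.hdf5", "readme.txt"]
result = parse_filenames("dms_{year:04d}{month:02d}{day:02d}_*.hdf5", names)
for name, fields in result.items():
    print(name, fields)
```

The trailing `_*` absorbs suffixes of different lengths (`16s1.001` and `f18`), which a parse anchored at fixed offsets from the end of the name cannot do. A template with a leading wildcard, such as `*_{year:04d}{month:02d}{day:02d}_*`, is handled the same way.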
There is a pysat pull request with improved support for the '*' wildcard when using the delimited parser. The delimited parser also received some other general improvements: https://github.com/pysat/pysat/pull/982
Now it needs a general xarray instrument. I am not sure this will be possible, so I recommend bumping this issue to a higher milestone once #67 is merged.
Sounds good to me.
The new general methods for Madrigal data that use MadrigalWeb make it possible to create a general instrument object for any data stored there. Users could supply the instrument and data codes at instantiation instead of a name, and the only things lacking compared to the specific instruments would be the targeted acknowledgements and the clean routine. Using the Madrigal instrument and experiment codes could be made easier by having the general Madrigal init routine grab these keyword arguments and use them, along with functools.partial, to set the load and download routines as needed.
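A minimal sketch of that binding pattern, assuming hypothetical routine and parameter names (`general_download`, `general_load`, `GeneralMadrigal`, `inst_code`, `data_code`); pysat's actual instrument module layout differs, and only `functools.partial` itself is a real API here. The code values 8100 and 10241 are reused from the instantiation example in this thread.

```python
import functools


def general_download(date_array, data_path, inst_code=None, data_code=None):
    """Hypothetical stand-in for a general Madrigal download routine.

    A real routine would request files from Madrigal; this one only
    records what it would fetch, so the example runs offline.
    """
    return [(inst_code, data_code, date, data_path) for date in date_array]


def general_load(fnames, inst_code=None, data_code=None):
    """Hypothetical stand-in for a general Madrigal load routine."""
    return {"inst_code": inst_code, "data_code": data_code,
            "files": list(fnames)}


class GeneralMadrigal:
    """Hypothetical general Madrigal instrument.

    The Madrigal codes are given once at instantiation and bound to
    the load and download routines with functools.partial.
    """

    def __init__(self, madrigal_inst_code, madrigal_tag):
        codes = {"inst_code": madrigal_inst_code, "data_code": madrigal_tag}
        self.download = functools.partial(general_download, **codes)
        self.load = functools.partial(general_load, **codes)


inst = GeneralMadrigal(madrigal_inst_code=8100, madrigal_tag=10241)
print(inst.download(["2015-01-01"], "/tmp/madrigal"))
# [(8100, 10241, '2015-01-01', '/tmp/madrigal')]
print(inst.load(["file1.hdf5"]))
```

Because the codes are bound once, later calls look the same as for any named instrument, which is the consistency point raised for mad_methods.download.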