pysat / pysatMadrigal

Madrigal instrument support for the pysat ecosystem
BSD 3-Clause "New" or "Revised" License

Generalised Madrigal Instrument #1

Open aburrell opened 5 years ago

aburrell commented 5 years ago

The new general methods for Madrigal data that use MadrigalWeb make it possible to create a general instrument object for any data stored there. Users could supply the instrument and data codes at instantiation instead of a name; the only things missing relative to the specific instruments would be targeted acknowledgements and the clean routine. Using the Madrigal instrument and experiment codes could be made easier by having the general madrigal init routine grab these keyword arguments and use them, along with functools.partial, to set the load and download routines as needed.
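A minimal sketch of the functools.partial idea described above. The functions general_load, general_download, and bind_routines are hypothetical stand-ins written for illustration and are not the actual pysatMadrigal routines; the codes 8100 and 10241 are the ones quoted later in this thread.

```python
import functools


def general_load(fnames, inst_code=None, kindat=None):
    """Stand-in for a shared load routine (hypothetical, for illustration)."""
    return {"files": list(fnames), "inst_code": inst_code, "kindat": kindat}


def general_download(date_array, inst_code=None, kindat=None):
    """Stand-in for a shared download routine (hypothetical)."""
    return [(date, inst_code, kindat) for date in date_array]


def bind_routines(inst_code, kindat):
    """Return load/download callables with the Madrigal codes pre-filled."""
    load = functools.partial(general_load, inst_code=inst_code, kindat=kindat)
    download = functools.partial(general_download, inst_code=inst_code,
                                 kindat=kindat)
    return load, download


# The codes are supplied once, as they would be at instantiation.
load, download = bind_routines(inst_code=8100, kindat=10241)
loaded = load(["example_file.hdf5"])
```

After binding, the rest of the code can call load and download without knowing which Madrigal codes were supplied.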

rstoneback commented 5 years ago

See jroisr branch for work in this area.

rstoneback commented 5 years ago

Almost everything needed to close this issue is supported except for one thing, filenames. We need an automated way to get the filename format string. Perhaps we could just require that from the user?

aburrell commented 5 years ago

MadrigalWeb has a routine that will list out the file names for a specific time period and experiment, so I don't think we should require that of the user.

rstoneback commented 5 years ago

I tried one of those routines (perhaps the wrong one) and it seemed to include a number of filename options not actually present in the filenames I downloaded. The issue is teaching pysat how to parse the filenames.

Current support keys in on the time only, which is the most relevant value pysat needs. It isn't very specific though. There is a potential collision if the general support is used for more than one instrument: all sets of files in the directory will currently be picked up.
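To illustrate the collision, here is a small sketch using only the standard library. The filenames and both patterns are made up for the example; they are not real Madrigal filenames or the patterns pysat builds internally.

```python
import re

# Hypothetical filenames from two different instruments in one directory.
files = ["instA_20200101.hdf5", "instB_20200101.hdf5", "instA_20200102.hdf5"]

# Time-only pattern: any prefix followed by an 8-digit date.
time_only = re.compile(r".*(\d{4})(\d{2})(\d{2})\.hdf5$")

# Instrument-specific pattern: the prefix is part of the match.
specific = re.compile(r"instA_(\d{4})(\d{2})(\d{2})\.hdf5$")

time_only_matches = [f for f in files if time_only.match(f)]
specific_matches = [f for f in files if specific.match(f)]
```

The time-only pattern accepts all three files, while the specific pattern keeps only the instA files.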

aburrell commented 5 years ago

Hmmm, maybe we can ask the user to do this, but also provide a guide on how to use MadrigalWeb to get this information.

rstoneback commented 5 years ago

I think a guide is a good way to go. We can download the files without knowing the filename specifics and a user can easily translate a given filename to a format template string. They could pass it in at Instrument instantiation: inst = pysat.Instrument('madrigal', 'pandas', madrigal_inst_code=8100, madrigal_tag=10241, file_format='dmspivm{year:02d}_{month:02d}.....', )
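As a rough sketch of what translating a filename to a format template involves, the snippet below formats a template into a filename and then parses the filename back using a regex derived from the same template. The template, the filename, and the helper template_to_regex are all hypothetical and only handle zero-padded integer fields; this is not how pysat parses files internally.

```python
import re
import string


def template_to_regex(template):
    """Convert a template with '{name:0Nd}' fields into a regex string."""
    pattern = ""
    for literal, name, spec, _ in string.Formatter().parse(template):
        pattern += re.escape(literal)
        if name is not None:
            # '04d' -> '4'; an empty spec falls back to one or more digits.
            digits = (spec or "").rstrip("d").lstrip("0")
            count = "{" + digits + "}" if digits else "+"
            pattern += "(?P<" + name + ">\\d" + count + ")"
    return pattern


# Hypothetical template; real Madrigal filenames will differ.
template = "example_{year:04d}{month:02d}{day:02d}.hdf5"

# Template -> filename.
fname = template.format(year=2020, month=1, day=5)

# Filename -> parsed values, using the regex built from the template.
match = re.fullmatch(template_to_regex(template), fname)
parsed = {key: int(val) for key, val in match.groupdict().items()}
```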

I'll need to tweak the general madrigal code to fully support this (oops). Instantiation may be a little verbose but with the file_format string in there it will be as robust as any other pysat Instrument.

rstoneback commented 5 years ago

Code was already in good shape. I added to the docstring to clarify support.

jklenzing commented 5 years ago

Note from pysat/pysat#175: mad_methods.download not implemented via functools.partial. Could be updated for consistency as part of generalized madrigal instrument.

aburrell commented 2 years ago

@rstoneback I have a branch going that is tackling this problem. I ran into the issue that I need general wildcards of unknown length for the files. Did we ever get a routine to handle that?

rstoneback commented 2 years ago

Checking out main, pysat.Files.from_os does OK with leading variations in a filename, i.e. stuff before any of the parsed keywords. There you can put in a *-{year:04d}... etc. The wildcard after is less reliable. There is the option for the delimiter, but the delimiter has to be exclusive to keywords, I think. Can you tell me more about the potential filenames?

aburrell commented 2 years ago

Not really? It's whatever they are on madrigal. Why is the wildcard after not reliable? I'm sure there's a reason but I can't recall what it is...

rstoneback commented 2 years ago

It wasn't intentional. Just not as general as it could/should be. I believe that code originated way back with C/NOFS and there aren't any parameters after the date so one of those things that slipped through the early gaps. Gap is too narrow but that's the idea.

rstoneback commented 2 years ago

To get back going on pysat and penumbra I was going to start upon DMSP examples as well as continue on improving pysat's 'new' files function. If it would help I can do a pass on the files parsing first to support whatever it is Madrigal does.

aburrell commented 2 years ago

I remember trying to expand this in the past and not being able to. I was hoping you recalled why it didn't work :'( If you want to take another hack at getting it to work, that would be an excellent use of time. If you don't have as much brain space, then go ahead and work on DMSP instead. Both are equally important, I don't need this problem solved for my current research.

rstoneback commented 2 years ago

I've been thinking about it since the message. I think I parsed the filename strings backward; I start at the end. That could be one of the reasons why leading wildcards work but others don't.
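A toy illustration of why parsing from the end of the name tolerates variation at the front but not at the back. This is not the pysat implementation; parse_date_from_end and the filenames are invented for the example, and it assumes a fixed-width YYYYMMDD date sitting directly before a known suffix.

```python
def parse_date_from_end(fname, suffix_len):
    """Read YYYYMMDD from fixed offsets counted from the end of the name."""
    stop = len(fname) - suffix_len
    digits = fname[stop - 8:stop]
    if not digits.isdigit():
        # The characters at the expected offsets are not a date.
        return None
    return int(digits[:4]), int(digits[4:6]), int(digits[6:8])


suffix_len = len(".hdf5")

# Variable-length text before the date does not move the offsets.
lead_short = parse_date_from_end("a_20200105.hdf5", suffix_len)
lead_long = parse_date_from_end("much_longer_prefix_20200105.hdf5", suffix_len)

# Variable-length text after the date shifts the offsets and breaks parsing.
trailing = parse_date_from_end("a_20200105_v01.hdf5", suffix_len)
```

Both leading cases recover the date, while the trailing case returns None because the extra characters push the date away from the expected position.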

rstoneback commented 2 years ago

There is a pysat pull request with improved support for the '*' wildcard when using the delimited parser. The delimited parser also received some other general improvements. https://github.com/pysat/pysat/pull/982

aburrell commented 2 years ago

Now it needs a general xarray instrument. I am not sure this will be possible, so recommend bumping this issue to a higher milestone once #67 is merged.

rstoneback commented 2 years ago

Sounds good to me.