PHEP 9999: PyHC standardization for Python time objects

heliophysicsPy / standards


PHEP 9999: PyHC standardization for Python time objects #32

Closed nabobalis closed 2 days ago

nabobalis commented 3 months ago

This PR proposes a new process PHEP to PyHC.

PHEP 9999 aims to build consensus within PyHC on standardizing time objects around astropy.time.Time. It covers why we should do this and any potential roadblocks.

Note that there are no implementation details about how to program the transition; my goal with this PHEP is to build community support for the idea and what it entails.

It is still pretty rough in places, but I hope it's in a decent enough place for reviews and comments.

jklenzing commented 3 months ago

Thanks for putting this together. I have a few thoughts / questions, which may be largely related to my use cases not currently using custom time objects.

What is the rationale for moving to astropy over datetime for objects? This is not particularly clear to me in the writeup (which focuses on this being preferable to writing custom time objects), and it would potentially add a new dependency across the ecosystem. What about a "pythonic" rewording to use astropy instead of building custom time objects?
How does this affect the usage of DatetimeIndex? Currently, most of the pysat ecosystem loads data files as pandas or xarray data objects linked to a DatetimeIndex, usually at 1Hz though sometimes higher.

aburrell commented 3 months ago

Another issue is overhead on operational systems. Datetime requires no additional dependencies and this is a concern for Python projects that are used in an operational environment.

aburrell commented 3 months ago

Another thing that may be useful to do is create a list of all the PyHC packages (not just the core) and see how many of them would need to change. Then talk to those developers (on either side) and their user base.

nabobalis commented 3 months ago

What is the rationale for moving to astropy over datetime for objects? This is not particularly clear to me in the writeup (which focuses on this being preferable to writing custom time objects), and it would potentially add a new dependency across the ecosystem.

I added a section about astropy time. See if you are ok with it.

I see no problem with adding astropy as a new dependency across the ecosystem. It is easy to install, not very large, and is used on government machines in operational cases.

What about a "pythonic" rewording to use astropy instead of building custom time objects?

This sounds good, I have added this as well.

How does this affect the usage of DatetimeIndex? Currently, most of the pysat ecosystem loads data files as pandas or xarray data objects linked to a DatetimeIndex, usually at 1Hz though sometimes higher.

That is an open question. Ideally if this PHEP is accepted, I would want to fund a pandas developer to add astropy time support to pandas via a funding proposal call.

nabobalis commented 3 months ago

Another issue is overhead on operational systems. Datetime requires no additional dependencies and this is a concern for Python projects that are used in an operational environment.

My understanding is that astropy already sees use in several operational environments. But I will make a note of this.

I do not know any of the requirements or rules around operational environments, so I will need to learn what they are so I can make a better decision on this topic.

nabobalis commented 3 months ago

Another thing that may be useful to do is create a list of all the PyHC packages (not just the core) and see how many of them would need to change. Then talk to those developers (on either side) and their user base.

I will have a look, I suspect the answer would be everything but a few sunpy released libraries.

sapols commented 3 months ago

Just wanted to chime in with my thanks, @nabobalis! The idea is presented well here, so I’ll save proofreading remarks for later. Overall I think it’d be a huge win for PyHC if we adopted this. It seems the first obvious step will be asking the package maintainers whether they’re willing to do this or not. If they don’t all comment here I’ll be sure to bring it up at the next telecon (and/or PyHC Core tag-up).

jklenzing commented 3 months ago

Thanks @nabobalis, that is more clear. I am still hesitant to enforce dropping datetime support if the pandas solution is not yet implemented, but I would be more comfortable with this as a "recommendation" rather than a "prescription". This would give packages the space to evaluate use cases and decide if they need something more complex. I may try to run some tests to see how much adding astropy affects things.

eelcodoornbos commented 3 months ago

Astropy.time is pretty powerful and comprehensive, but therefore also difficult to understand and learn, especially for those without any prior knowledge on astronomical time scales.

I wonder how many PyHC projects would actually need the unique astropy.time functionality (time scale conversion, 2x64-bit precision) and whether or not pandas.Timestamp would be a better default for most projects. I do think that the Python built-in datetime module is a pretty poor choice for scientific computing.

I've been happily using pandas.Timestamp objects set to UTC in my projects (e.g. https://gitlab.com/KNMI-OSS/spaceweather/swxtools), and only convert back and forth to astropy.time.Time objects when absolutely needed, for example to convert data that is provided with a different time scale than UTC, such as GPS time for some satellite data.

In my experience, most date/time manipulations are in the form of applying time deltas and conversions to/from string representations, which is where pandas.Timestamp is very easy to use. I think their use leads to easy to comprehend (and therefore easy to maintain) code.

Counterarguments and alternative views are welcome of course!

nabobalis commented 3 months ago

Thanks @nabobalis, that is more clear. I am still hesitant to enforce dropping datetime support if the pandas solution is not yet implemented, but I would be more comfortable with this as a "recommendation" rather than a "prescription". This would give packages the space to evaluate use cases and decide if they need something more complex. I may try to run some tests to see how much adding astropy affects things.

Sorry for the late reply. This PHEP (and I need to rewrite this since it isn't very clear) is about working out if there is community consensus for the idea and, if so, we would start by adding support to pandas for astropy.time (via a ROSES call). We won't be able to enforce this until the foundational blocks are in place; doing so earlier would be unfair.

nabobalis commented 3 months ago

Astropy.time is pretty powerful and comprehensive, but therefore also difficult to understand and learn, especially for those without any prior knowledge on astronomical time scales.

I have to say I fundamentally disagree, for most users it has a very similar API as datetime, there is no jump in complexity if you just need UTC. It has been deemed simple enough to teach at two PyHC summer schools now.

I wonder how many PyHC projects would actually need the unique astropy.time functionality (time scale conversion, 2x64-bit precision) and whether or not pandas.Timestamp would be a better default for most projects. I do think that the Python built-in datetime module is a pretty poor choice for scientific computing.

The goal of this PHEP is to standardize what time objects are used to ensure that users have to deal with one type of datetime object and for developers of packages to be aware what they should be using.

I've been happily using pandas.Timestamp objects set to UTC in my projects (e.g. gitlab.com/KNMI-OSS/spaceweather/swxtools), and only convert back and forth to astropy.time.Time objects when absolutely needed, for example to convert data that is provided with a different time scale than UTC, such as GPS time for some satellite data.

But in this case, you already have to use astropy to convert formats like this, so using it from the very start reduces the need to convert between pandas and astropy time objects. This PHEP would mean you no longer have to do that in future.

Hopefully if we can agree to do this, we will submit a proposal to fund a pandas developer to add astropy time support into pandas so we can have the best of both worlds.

eelcodoornbos commented 3 months ago

I have to say I fundamentally disagree, for most users it has a very similar API as datetime, there is no jump in complexity if you just need UTC. It has been deemed simple enough to teach at two PyHC summer schools now.

If you omit the complexity, astropy.time is of course easy to teach and learn. But this is even more true for pandas.Timestamp, which has an even more flexible and intuitive interface, in my opinion, and which Python developers from different backgrounds will already know and love.

The astropy.time docs start out by listing time scales that only experienced users will be familiar with. There are also some opinionated choices in the implementation, for example, on the definitions and distinctions between time formats and time scales, which users taking advantage of the advanced features will have to learn, but which are not always straightforward. For instance, I'm myself still puzzled by GPS time being defined as a format, not as a time scale within astropy.time. This makes conversions between GPS and UTC timestamps with astropy.time ugly, even though it looks at first glance that this would be easy.

Hopefully if we can agree to do this, we will submit a proposal to fund a pandas developer to add astropy time support into pandas so we can have the best of both worlds.

That would be nice, but I wonder how this would look. I think it would be good for the discussion if the outcome of that work can be further specified.

If it means that pandas gets some configuration option so that astropy.time objects can then be used in pandas "behind-the-scenes" when manipulating Timestamps, DatetimeIndex, etc, while all the methods for manipulation are the same as for the current pandas objects (but with the addition of time scale conversion methods and inherent higher precision), that would be great.

Or would the proposal be for a future version of pandas (or some sort of pandas plug-in module) to adopt the astropy date/time manipulation methods? That would be more tricky to implement and more confusing to users, I think.

Even then, I think complexity/performance vs added value trade-offs need to be investigated as well. Astropy.time uses double the memory (double 64-bit timestamps), to accommodate the higher precision over long time spans. What are the implications for performance and hardware requirements of software that, for example, just processes 1-sec cadence satellite data, for which the single 64-bit timestamps are usually more than enough?

I personally don't think I would prefer to use an astropy.time option once implemented in pandas, except in some rare cases.

rstoneback commented 2 months ago

While AstroPy and SpacePy may have time support already, if it is going to be a standard the time functionality should be in its own independent package.
There was no community-wide selection of Astropy over SpacePy or over potentially creating a new time package. For example, AstroPy uses two 64-bit numbers to support precise times over long timescales. Not everyone needs this. The most common case could simply be support for leap seconds.
As already noted, pysat uses pandas and xarray. The only viable mechanism for pysat to use an updated time is to have it integrated into pandas' DatetimeIndex.
Operational and other systems can have long timescales, much longer than NEP or other PyHC support timelines. Thus, a standard time package would need to support older Python and associated packages longer than 'normal'. This is easiest to do if the time support is an independent package.
We'd need funding at the start, not just for a potential future integration into Pandas. NASA's funding for open source science is an ongoing concern. Not only do programs like B.20 lack sufficient funding for community efforts but I don't think NASA has ever presented evidence demonstrating it has an appropriate selection process.
To get Pandas to agree to incorporate science time we'd need not only funding but a good argument as to why anyone outside of astronomy should care about leap seconds etc. Trying to bring Pandas in after a time package is developed is not likely to go well unless the package does everything Pandas wants. Do we know yet what it would take to get Pandas, or datetime, or anyone at a non-science institution to say yes to science time?

dstansby commented 2 months ago

if it is going to be a standard the time functionality should be in its own independent package.

Can you elaborate on this a bit more? I would have thought the advantages of depending on a third party library that has a wide existing maintenance team (e.g., astropy, pandas) that still accept contributions and suggestions would be much more efficient in time and money than developing new time package n+1.

a standard time package would need to support older Python and associated packages longer than 'normal'.

Currently https://github.com/heliophysicsPy/standards/pull/29 adopts the same recommendations as SPEC 0, so as long as whatever package is standardised around follows SPEC 0 this shouldn't be an issue? Either way, it seems like as long as a package is compatible with PHEP 3 if/when it's merged it should be fine because it will essentially have the same support policy as PyHC recommendations.

Cadair commented 2 months ago

I haven't caught up on the whole thread, but I want to say that the limitation of not being able to use astropy's Time in pandas indexes (and therefore xarray) is a technical limitation that can be overcome. Obviously for the whole ecosystem to adopt Time everywhere this would have to be done, but if that's the direction the community wants to head in then I think it would be easy enough to use some funding to pay the right people to make that happen. (One option would be companies like Quansight, who I believe have done similar work in pandas before on research grants).

rstoneback commented 2 months ago

if it is going to be a standard the time functionality should be in its own independent package.

Can you elaborate on this a bit more? I would have thought the advantages of depending on a third party library that has a wide existing maintenance team (e.g., astropy, pandas) that still accept contributions and suggestions would be much more efficient in time and money than developing new time package n+1.

Sure! I'm not saying that we have to develop a new time package, but that if code from astropy or spacepy is going to be labeled a PyHC standard and spread to the wider Python community then that time code should be spun off into its own package.

Developer focus. The focus would be on time and time alone, not split between all of these other functions in the overall package.
Suppose there is a bug in the time code. It is easier for an independent package to release a new version than if it is contained within a larger more complicated package. Most likely, time bugfixes would have to wait on the release schedule of the whole package.
Developing a standard is different than a higher level package. There is a saying I've heard for Python, "Python core is where packages go to die." Standards need to change more slowly since every API change impacts all of the software above. Dealing with standards changes takes away from developer time for higher level packages that could go into features.
Standards are meant to make it easier on higher level packages. This includes having longer support cycles. It is easier to support 5-10 years of Python packages in something focused, like a time package, than across a whole series of functions like the rest of astropy or spacepy or whatever.
Eating 'own dogfood'. Easier as a developer to know if a package works well for integration by others if that developer has to integrate it themselves. The pysat ecosystem was split into a bunch of packages because we hope that people will choose to build upon pysat. We know it works well for that purpose because we do it ourselves.

I will note that I am generally opposed to the whole notion there should only be one software package for a given feature. If that was really an effective way to go then we'd see that throughout the free market. Mostly though there are always multiple software packages for a given problem. How a problem is solved is as important as solving the problem.

a standard time package would need to support older Python and associated packages longer than 'normal'.

Currently #29 adopts the same recommendations as SPEC 0, so as long as whatever package is standardised around follows SPEC 0 this shouldn't be an issue? Either way, it seems like as long as a package is compatible with PHEP 3 if/when it's merged it should be fine because it will essentially have the same support policy as PyHC recommendations.

pysat is trying to support Python as far back as 3.6 for operational users. It isn't easy. Satellite missions last 5-10 years or more and generally there isn't enough funding or available developer time within the mission to upgrade things part way through. Standards need to go out of their way to make things easier for users.

rstoneback commented 2 months ago

I haven't caught up on the whole thread, but I want to say that the limitation of not being able to use astropy's Time in pandas indexes (and therefore xarray) is a technical limitation that can be overcome.

I'd say the primary issue is a community one. The last official stance I heard was that datetime would never accept science time as leap seconds for the future aren't already known. Like, nobody can say yet how many leap seconds will be needed in 2030. We heard on the last call that may be changing. Nevertheless, even with a technical solution if packages like pandas/datetime/whomever aren't interested in the feature the technicals don't matter.

Obviously for the whole ecosystem to adopt Time everywhere this would have to be done, but if that's the direction the community wants to head in then I think it would be easy enough to use some funding to pay the right people to make that happen. (One option would be companies like Quansight, who I believe have done similar work in pandas before on research grants).

If pandas etc. is willing to accept the feature then sure, the integration is a technical one. Pandas has features like moving ahead x business days etc. This is harder to do when the number of seconds per day isn't fixed. The scope of the integration can't really be set until we have a better understanding of details.

My preference would be for PyHC to be the primary on any possible integration rather than turn it over.

I've been in space science going on 20 years now. I've been on more proposals than I can count as well as review committees for science, software, instrumentation, satellite missions, etc. for both NSF and NASA. That experience has shown me to never count on anything related to government funding.

nabobalis commented 2 months ago

If you omit the complexity, astropy.time is of course easy to teach and learn. But this is even more true for pandas.Timestamp, which has an even more flexible and intuitive interface, in my opinion, and which Python developers from different backgrounds will already know and love.

That is true and by adding support for astropy.time as a pandas Index, we would allow the best of both worlds here.

The astropy.time docs start out by listing time scales that only experienced users will be familiar with. There are also some opinionated choices in the implementation, for example, on the definitions and distinctions between time formats and time scales, which users taking advantage of the advanced features will have to learn, but which are not always straightforward.

If the main problem we currently have is that the documentation of astropy.time could do with a tidy-up and improvements, I would say we are in a good place. We can contribute upstream to fix these problems.

For instance, I'm myself still puzzled by GPS time being defined as a format, not as a time scale within astropy.time. This makes conversions between GPS and UTC timestamps with astropy.time ugly, even though it looks at first glance that this would be easy.

This is important feedback which we can use to open issues and improve the user experience with astropy.time. We are working on open-source software, so we have to be willing to work with upstream and improve libraries that would benefit a wider community. Otherwise, why do we bother with any of this?

That would be nice, but I wonder how this would look. I think it would be good for the discussion if the outcome of that work can be further specified.

Agreed.

If it means that pandas gets some configuration option so that astropy.time objects can then be used in pandas "behind-the-scenes" when manipulating Timestamps, DatetimeIndex, etc, while all the methods for manipulation are the same as for the current pandas objects (but with the addition of time scale conversion methods and inherent higher precision), that would be great.

This would be the main goal.

Or would the proposal be for a future version of pandas (or some sort of pandas plug-in module) to adopt the astropy date/time manipulation methods? That would be more tricky to implement and more confusing to users, I think.

This can be worked on after we have the first step down.

Even then, I think complexity/performance vs added value trade-offs need to be investigated as well. Astropy.time uses double the memory (double 64-bit timestamps), to accommodate the higher precision over long time spans. What are the implications for performance and hardware requirements of software that, for example, just processes 1-sec cadence satellite data, for which the single 64-bit timestamps are usually more than enough?

I would state (without evidence) that computers are fast enough that this shouldn't matter. If the worst comes to it, we can always improve the speed of astropy.time underneath.

nabobalis commented 2 months ago

While AstroPy and SpacePy may have time support already, if it is going to be a standard the time functionality should be in its own independent package.

The answer for me here is to depend on astropy. There is no need for another package. astropy provides wheels for almost every platform and is a very small package.

There was no community-wide selection of Astropy over SpacePy or over potentially creating a new time package. For example, AstroPy uses two 64-bit numbers to support precise times over long timescales. Not everyone needs this. The most common case could simply be support for leap seconds.

The goal isn't just about the features of astropy.time, it's about centralizing around one library that handles the time handling for us.

As already noted, pysat uses pandas and xarray. The only viable mechanism for pysat to use an updated time is to have it integrated into pandas' DatetimeIndex.

Yes and I want to add astropy.time support into pandas.

Operational and other systems can have long timescales, much longer than NEP or other PyHC support timelines. Thus, a standard time package would need to support older Python and associated packages longer than 'normal'. This is easiest to do if the time support is an independent package.

That is a fair point but operational libraries maybe don't follow these standards and within a world where we have levels of PyHC "status", those are left outside of that grading. The same goes for the PHEP 3 about Python versions.

We'd need funding at the start, not just for a potential future integration into Pandas. NASA's funding for open source science is an ongoing concern. Not only do programs like B.20 lack sufficient funding for community efforts but I don't think NASA has ever presented evidence demonstrating it has an appropriate selection process.

We would not need to fund a community with B.20, we just need to fund a qualified developer.

To get Pandas to agree to incorporate science time we'd need not only funding but a good argument as to why anyone outside of astronomy should care about leap seconds etc. Trying to bring Pandas in after a time package is developed is not likely to go well unless the package does everything Pandas wants. Do we know yet what it would take to get Pandas, or datetime, or anyone at a non-science institution to say yes to science time?

It isn't about getting people who do not need astropy.time to use it within pandas. It would be working with pandas to add opt-in support for astropy.time. It is possible that they won't accept it into pandas directly but only as a plugin, in the same way that "CFTimeIndex" works in xarray. That would mean creating a small plugin library that depends on pandas and astropy.

nabobalis commented 2 months ago

Sure! I'm not saying that we have to develop a new time package, but that if code from astropy or spacepy is going to be labeled a PyHC standard and spread to the wider Python community then that time code should be spun off into its own package.

By wider community do you mean the PyHC community or Scientific Python?

The goal of this PHEP is to reduce the different ways that this community handles time. The future goal is to work this into units and coordinates. There is no reason to spread this work outside of PyHC.

Developer focus. The focus would be on time and time alone, not split between all of these other functions in the overall package.

This would be important if you have like 1 developer on a package, astropy does not. They have dedicated people to maintain each subsection of the library.

Suppose there is a bug in the time code. It is easier for an independent package to release a new version than if it is contained within a larger more complicated package. Most likely, time bugfixes would have to wait on the release schedule of the whole package.

This is true for any package; I don't see how this is a problem. If we discover that numpy does not handle multi-threaded Windows tasks, we have to wait for numpy to patch that.

Dealing with standards changes takes away from developer time for higher level packages that could go into features.

Standards enable developers to spend less time on busy work and enable them to work on features.

Standards are meant to make it easier on higher level packages. This includes having longer support cycles. It is easier to support 5-10 years of Python packages in something focused, like a time package, than across a whole series of functions like the rest of astropy or spacepy or whatever.

The Python ecosystem does not support that kind of timescale. SPEC 0 reduces this down and numpy and all major packages follow that schedule.

I will note that I am generally opposed to the whole notion there should only be one software package for a given feature. If that was really an effective way to go then we'd see that throughout the free market. Mostly though there are always multiple software packages for a given problem. How a problem is solved is as important as solving the problem.

The free market does not have one way to do the same thing because they are competing to make more money. The reason that Microsoft runs a search engine isn't because they think Google does a bad job of it; they do it to make more money for themselves.

We are part of a large open-source community; we are not doing this for the free market. We want to enable other developers within our community to not have to re-code the same software again and again. We see centralization within the wider open-source community because it's pretty much understood that a group focusing on one larger task is easier than splitting everything up into similar but separate packages. There might be a million Linux distros but they all use the same Linux kernel, and they almost all use the same init system. The wider community there has decided that it isn't worth it to try and duplicate these systems.

I want to see if there is a desire for the same within this community and if not, then PyHC should consider having no standards.

pysat is trying to support Python as far back as 3.6 for operational users. It isn't easy. Satellite missions last 5-10 years or more and generally there isn't enough funding or available developer time within the mission to upgrade things part way through. Standards need to go out of their way to make things easier for users.

And pysat can, if there is enough dev effort to support older versions. However, we are trying to bring the PyHC community up to standards within the Python ecosystem such that packages can work together in order to reduce duplicated effort.

Packages that can't (or do not want to) meet standards, as defined in PHEP 4, are still listed and advertised, but they might not get the higher levels of support from PyHC or get the badges next to their name. I think that is more than fair.

jameswilburlewis commented 2 months ago

From the PySPEDAS perspective, we already have a dependency on astropy (for units), so using astropy for time seems reasonable. We use xarray DataArray structures internally, and rely on their methods for time indexing and interpolation (which I believe is Pandas under the hood).

For efficiency's sake, we would want to get buy-in from the NetCDF4 and cdflib maintainers, so we don't have to perform multiple time conversions when reading from data files. As things stand now, cdflib returns numpy datetime64 values with nanosecond precision, which pyspedas has adopted as our internal time representation. It looks like NetCDF returns Python datetime objects (in UTC with no time zone offset), which pyspedas converts to floating point Unix times, and eventually to np.datetime64. (So there is likely some room for improvement on our end.) If NetCDF4 and cdflib were to offer options to return timestamps as astropy times, that would be the best case scenario for us.

The other potential issue for us would be passing astropy times directly to matplotlib, which I think I've heard is achievable via matplotlib's configuration options.

PySPEDAS load routines, and many analysis tools, take time ranges expressed as 2-element lists [start_time, end_time]. We already support strings, Python datetimes, and floating point Unix times, so adding support for astropy times in these interfaces would be fairly straightforward.

The sticking point would be xarray/Pandas support -- without that, we'd have to perform time conversions for nearly every operation that touches the loaded data, or abandon xarray entirely and rework a lot of our internal tooling.

rweigel commented 2 months ago

In general, I support a way to simplify the exchange of time data between libraries. However, I have some questions about the benefit vs. the cost of rewriting existing PyHC core packages and replacing any use of datetime or custom time objects with astropy.time.Time.

Could you give some use cases of interfacing different PyHC packages that demonstrate the simplification that will result?
What is the size of astropy + all of its dependencies?
If a package only needs datetime, why would a package maintainer want to require astropy given that an astropy.time.Time can be created from datetimes?
What is the performance difference between datetime or datetime64 operations and astropy.time.Time operations? We often deal with arrays of length ~86400*365. Consider the following. See also https://github.com/MAVENSDC/cdflib/issues/65#issuecomment-510258945 for additional tests.

import datetime
import time

import numpy as np
import astropy.units as u
from astropy.time import Time

n = 86400*365  # one-second cadence for 365 days

# 1. NumPy datetime64 array
start = time.time()
res = np.arange(np.datetime64("2024-01-01"), np.datetime64("2024-12-31"), np.timedelta64(1, "s"))
print(len(res))
end = time.time()
print(end - start) # 0.057021141052246094

# 2. astropy.time.Time array
start = time.time()
timeo = Time("2024-01-01T00:00:00Z", scale='utc', format='isot', precision=9)
times = timeo + np.arange(n) * u.second
# times.to_datetime()
print(len(times))
end = time.time()
print(end - start) # 18.311403036117554

# 3. List of standard library datetime objects
start = time.time()
test_date = datetime.datetime(2024, 1, 1)
res = [test_date + datetime.timedelta(seconds=idx) for idx in range(n)]
print(len(res))
end = time.time()
print(end - start) # 8.857582092285156

What is the difference in memory required to store an array of times? For example, an array of datetime64s vs. the equivalent in astropy.time.Time?
How will package maintainers know they must test their package with a new version of astropy.time.Time? What happens if the astropy package version changes but there are no changes in astropy.time.Time (for the case when the package maintainer does not assume that their package will be installed as part of a PyHC bundle, in which case AstroPy will be available)?
The issue of roundoff error ("Since internally Time uses floating point numbers, round-off errors can cause two times to be not strictly equal even if mathematically they should be.") should be studied. I generally avoid representing time as floats if possible.
Mandating that PyHC core packages be rewritten by replacing any use of datetime or custom time objects with astropy.time.Time seems unnecessary. Most standards I am familiar with do not mandate implementation; they mandate representations at the interface level. What is the benefit of mandating how code works internally?
What fraction of the PyHC packages duplicate functionality in astropy.time.Time? In the PEP, it is noted, "One of the main focuses within PyHC should be around the goal of reducing the amount of code that each package has to either create and maintain that implements the same functionality." How will a rewrite to use astropy.time.Time instead of datetime and/or datetime64/pandas.Timestamp lead to less code and maintenance? If a package only needs datetime and is rewritten to use astropy.time.Time, it seems there is more maintenance because of the new dependency.
Could you add examples to the PEP that justify, "This meant that all PyHC packages had to create their own implementations of time." Do all PyHC packages have their own implementation of time, and what is meant by "implementation of time"?
It is suggested that package authors can write proposals to comply with the PEP. How many proposals would need to be written, and what is an estimate for the time required for proposal writing and implementation?

Based on my experience using ~6 PyHC packages, there would be a benefit to mandating that functions that deal with time, and don't need the additional functionality in astropy.time.Time, take datetime (or datetime64 given most packages depend on NumPy) as an input and allow it as an output option or method (an alternative would be that functions take an input of a restricted subset of ISO 8601 and allow it as an output option or method). If, and only if, the functions are implementing functionality that exists in astropy.time.Time and not datetime or datetime64, they should use astropy.time.Time.
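The "restricted subset of ISO 8601" alternative could look something like the sketch below. The profile chosen here (UTC only, a mandatory trailing "Z", at most six fractional digits) is purely an assumption for illustration, and the function names are hypothetical; nothing in this thread fixes those details.

```python
import re
from datetime import datetime, timedelta, timezone

# Hypothetical restricted profile: YYYY-MM-DDTHH:MM:SS[.f{1,6}]Z, UTC only.
_ISO_SUBSET = re.compile(r"(\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2})(?:\.(\d{1,6}))?Z")


def parse_restricted_iso(text):
    """Parse a string in the restricted profile into an aware UTC datetime."""
    match = _ISO_SUBSET.fullmatch(text)
    if match is None:
        raise ValueError(f"not in the restricted ISO 8601 subset: {text!r}")
    base, fraction = match.groups()
    parsed = datetime.strptime(base, "%Y-%m-%dT%H:%M:%S").replace(tzinfo=timezone.utc)
    if fraction:
        # Pad the fraction to microseconds, e.g. "5" -> 500000.
        parsed += timedelta(microseconds=int(fraction.ljust(6, "0")))
    return parsed


def format_restricted_iso(value):
    """Format an aware datetime in the same profile (always six fractional digits)."""
    return value.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%S.%fZ")


print(format_restricted_iso(parse_restricted_iso("2024-01-01T00:00:00.5Z")))
```

A string profile like this adds no dependency at all, at the cost of parsing on every call.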

I agree with @eelcodoornbos's comment about learning astropy.time.Time, which I have done recently, and his comment about datetime being a poor choice for scientific computing (but I would argue that it is an unfortunate choice that we have to live with and it is "good enough" for most applications).

jibarnum commented 2 months ago

Others have already added their thoughts above with good pros and areas of consideration for the PHEP as it stands now, so I won't belabor those.

Based on today's core PyHC tag-up, we'll see a rewrite of this PHEP to instead heavily encourage the use of astropy as able, but also working with datetime if needed.

I do still suggest strongly-worded recommendations for astropy where it makes sense to do so (e.g., projects that can more easily make the switch or new projects coming into PyHC). I think this will help support our reasoning for going out to other communities and asking for either 1) better astropy integration (e.g., with pandas) or 2) updates within astropy itself to allow an entire Python developer community to eventually adopt astropy or make it our "gold standard" (example update: astropy could allow a toggle option for 64-bit floats or 32-bit floats for time representation to allow precision where needed vs reducing memory usage). Further, showing a strong recommendation gives better arguments for NASA and other entities to fund proposals to align with this PHEP.

Thanks again @nabobalis for spearheading the writing of this PHEP. I think we're on the right track with bringing this to the forefront.

jtniehof commented 2 months ago

Apologies for not jumping in sooner.

I can 100% get behind making astropy.Time first-class objects (i.e. wherever a package supports time input, it must transparently accept astropy.Time, and make it trivial to retrieve astropy.Time). And two further strong recommendations: use astropy.Time preferentially, and do not create more new time objects.

I'm really averse to requiring all packages to reimplement everything to use astropy.Time exclusively. I don't think this is addressed by "we'll do pull requests to astropy to handle everybody's concerns"--there's no guarantee that all concerns will actually be addressed. It's tough to sign on to "we will absolutely rip out code and hopefully resolve the funding and incompatibility issues later". The PHEP provides a credible process, but not a guaranteed one.

jtniehof commented 2 months ago

I had a thought while making breakfast (almost as good as the shower).

I might be missing the main purpose here. You say right up front this is a "new process PHEP". I had been thinking in terms of standards. If the goal is to say "Let's try to figure out a path forward and here are some concrete steps" rather than "all PyHC packages must do X and this is how we're going to get there", then I've been thinking all wrong (probably would still have suggested changes, but they'd be different).

The header does indicate standards track, so maybe that should be process (or maaaaybe informational, I could see either way).

rstoneback commented 2 months ago

I do still suggest strongly-worded recommendations for astropy where it makes sense to do so (e.g., projects that can more easily make the switch or new projects coming into PyHC).

I disagree. Recommendations quickly become requirements. This is not only in the gold/silver etc. standards but NASA can decide, via funding requirements, to require 'recommended' stances with no notice. NASA applying PyHC standards also creates hidden requirements that packages have to invest labor or be subjected to rules that may be inappropriate.

I'd like to state again I believe this PHEP is starting well ahead of things.

1) First, we haven't even established there is a problem. SpacePy is the only PyHC package to create time support. Other packages use datetime, which is actually fine. In practice, there are only a few leap seconds per year. So not supporting leap seconds affects about 6E-8 of the total samples in a year.

While AstroPy also has time support, it is not in PyHC.

2) We haven't established requirements. We need requirements to sort out what we need, what success looks like, what resources are required, and how to approach a proposal.

3) I disagree with cost arguments. NASA paid both Boeing and SpaceX to develop spacecraft for transport to the ISS. How many additional billions were spent for two kinds of spacecraft? How many Python packages would those billions fund? Having more than one software package makes no substantive difference to NASA spending at all.

4) Ignoring lessons from the history of software development is ill advised. The reason there is usually more than one software package for a problem is because not everyone has the same requirements. Several pysat developers have conveyed requirements for our use case, requirements which have been brushed aside by the PHEP.

5) I started with PyHC, from the very beginning, to promote development of quality software. I didn't join to enforce conformity within the community. Broad based requirements are good, like having a consistent coding standard with user documentation. Specific requirements can easily hold back development. Specific requirements are more likely to interfere with the broad range of requirements within PyHC.

6) There have been claims about future AstroPy support but these claims aren't from AstroPy itself.

7) If the requirements for a scientific time are already satisfied with a pre-existing package, then why do we need to require developers to use it? I'd say it is overwhelmingly likely packages will use it on their own. If developers do decide to invest the time and energy to create a new package then I think it is highly likely they have good reasons to do so.
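For reference, the 6E-8 figure in point 1 is consistent with two leap seconds falling in a 365-day year sampled at one-second cadence. The short sketch below reproduces that arithmetic (the count of two is an assumption for illustration) and also shows that the standard library datetime cannot represent the leap second itself.

```python
from datetime import datetime

SECONDS_PER_YEAR = 86400 * 365  # 31,536,000 one-second samples

# Assumption for illustration: two leap seconds inserted in a single year.
leap_seconds = 2
fraction_affected = leap_seconds / SECONDS_PER_YEAR
print(f"{fraction_affected:.1e}")

# datetime only allows second values 0..59, so 23:59:60 is rejected.
try:
    datetime(2016, 12, 31, 23, 59, 60)
    leap_second_representable = True
except ValueError:
    leap_second_representable = False
print(leap_second_representable)
```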

I think our time would be better spent determining what requirements our community really needs and then addressing those. Getting science time into broader Python packages, like Pandas and Xarray, is more useful than requiring AstroPy or SpacePy. There are a whole range of commercial companies that could benefit from science time if it was supported by packages they already use. On the other hand, AstroPy/SpacePy already exists. Scientists can already use it if they want.

jibarnum commented 2 months ago

@rstoneback

I don't think it's unreasonable to ask that a package that can easily use Astropy do so for the sake of simplifying interoperability. This PHEP as discussed at the last PyHC core tag-up will be updated by @nabobalis to not specifically require astropy, but highly encourage it, so plans for switching over to forcing only astropy aren't there. If the suggested actions above end up coming to fruition (modifications to astropy, datetime, etc.), we can revisit then if the community wants.

As for point #5 specifically, two of PyHC's strategic goals are:

Foster an open-source Python software ecosystem for heliophysics research and education, and
Enable efficient interdisciplinary research

These goals are made significantly easier/more fully realized when packages can interoperate. Thus, the bringing up of finding standards for time, coords, etc. Also, thinking about these kinds of things now can help set up for the future of PyHC, and growth.

eelcodoornbos commented 2 months ago

I have to agree with @rstoneback here. I’m not sure at all that interoperability would be simplified by recommending astropy in its current form.

I don’t think it has been discussed enough that astropy is a very opinionated library. It requires developers to work in a certain way, e.g. make use of its units and reference frame/scale definitions stored in its objects, its non-standard pretty printing of its objects, etc. This has nice advantages if you stick to working only with astropy objects but actually complicates interoperability with pandas (and other libraries) quite a lot.

Its time component supports leap seconds and high accuracy, but it does not support time zones or the broad range of input/output and calendar related options of pd.Timestamp.

Its coordinate handling might work very well for astronomers and seems to be robustly implemented, but I personally find its rich objects very clunky for making conversions in Earth (thermosphere-ionosphere) satellite data processing where input/output is stored in Pandas dataframes. You have to do a lot of seemingly unnecessary extra work to juggle with coordinate components, naming of pandas columns, etc.

This is not astropy's or pandas' fault, just a consequence of the different philosophies involved. I think these should be reconciled first before recommending adoption, even for new developments.

I think adding some flexible to_pandas() and from_pandas() methods (or similar) to astropy, while using pandas as a recommended library for exchange of tabular data in the PyHC ecosystem might be a better place to start, and much more likely to succeed than starting now already to adopt astropy while proposing to modify pandas to adopt its functionality.

nabobalis commented 2 months ago

Hello everyone, I have revised the PHEP more in line with what was discussed in the last PyHC core tag up.

It drops the requirement to use astropy.time but encourages other libraries to accept astropy times as possible inputs in the future. For libraries that interface with pandas and similar packages, that will only be possible once work is done to enable it.

For libraries using datetime, there will be no changes for them to do.

nabobalis commented 1 month ago

I disagree. Recommendations quickly become requirements.

As a community we should be setting ourselves recommendations that might involve doing some work.

First, we haven't even established there is a problem. SpacePy is the only PyHC package to create time support. Other packages use datetime, which is actually fine. In practice, there are only a few leap seconds per year. So not supporting leap seconds affects about 6E-8 of the total samples in a year.

The goal isn't to solve a problem per se but to bring the community towards a common set of tooling that we can expand upon and ideally base our interoperability on in the future.

While AstroPy also has time support, it is not in PyHC.

Is that a problem?

We haven't established requirements. We need requirements to sort out what we need, what success looks like, what resources are required, and how to approach a proposal.

This community is a great place to establish these requirements especially around getting astropy integrated with pandas and xarray.

Ignoring lessons from the history of software development is ill advised. The reason there is usually more than one software package for a problem is because not everyone has the same requirements. Several pysat developers have conveyed requirements for our use case, requirements which have been brushed aside by the PHEP.

The lack of support for astropy objects within pandas and xarray is a known problem. Not only do sunpy developers want it fixed; astropy and xarray developers do as well.

There is a desire from the larger Python ecosystem to integrate but the main stumbling block has been determining the specific requirements. If people are willing to help, we can work on those and bring these communities together instead of being separated.

There have been claims about future AstroPy support but these claims aren't from AstroPy itself.

The claims made about astropy support were focused on tutorials or examples or even maybe a code review. The astropy maintainers have offered some of these before for PyHC at the summer school or at meetings. But yes, there are no claims from astropy themselves but they have been very welcoming.

If the requirements for a scientific time are already satisfied with a pre-existing package, then why do we need to require developers to use it? I'd say it is overwhelmingly likely packages will use it on their own. If developers do decide to invest the time and energy to create a new package then I think it is highly likely they have good reasons to do so.

Some developers are not aware of the larger ecosystem; this PHEP is partly informational as well as being a recommendation.

If developers within PyHC do decide that astropy.time isn't what they need, then there should be a conversation as to why, what can be changed to prevent another time library from being created and have changes contributed to astropy.

Either this is an open-source community angled at improving not only our own projects but the broader ecosystem or we are just out for ourselves and can drop many of PyHC's core objectives.

I think our time would be better spent determining what requirements our community really needs and then addressing those. Getting science time into broader Python packages, like Pandas and Xarray, is more useful than requiring AstroPy or SpacePy. There are a whole range of commercial companies that could benefit from science time if it was supported by packages they already use. On the other hand, AstroPy/SpacePy already exists. Scientists can already use it if they want.

I agree, we should be getting astropy.time integrated within the broader ecosystem but we need to demonstrate that there is a willing community who needs that and would adopt that.

nabobalis commented 1 month ago

What is the size of astropy + all of its dependencies?

The astropy wheel is 6-10 MB depending on the platform; its core dependency is numpy.

What is the performance difference between datetime or datetime64 operations and astropy.time.Time operations?

What is the difference in memory required to store an array of times? For example, an array of datetime64s vs. the equivalent in astropy.time.Time?

astropy's performance is going to be slower and I would expect it to use more memory as well. The hope is that a GSoC project can be put together next year to benchmark astropy so areas of improvement can be identified.
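As a back-of-the-envelope illustration of the memory question, the sketch below compares the raw payload of a datetime64[ns] array with that of two float64 arrays of the same length. The two-array model reflects astropy's documented internal representation of Time as a pair of 64-bit floats; it is an assumption-laden estimate that ignores object overhead and any cached values, not a measurement of astropy itself.

```python
import numpy as np

n = 86400  # one day at 1 Hz, kept small for illustration

# datetime64[ns]: one 64-bit integer per sample.
stamps = np.datetime64("2024-01-01", "ns") + np.arange(n) * np.timedelta64(1, "s")

# Stand-in for a two-float representation: two float64 values per sample.
jd1 = np.zeros(n, dtype=np.float64)
jd2 = np.zeros(n, dtype=np.float64)
two_float_bytes = jd1.nbytes + jd2.nbytes

print(stamps.nbytes, two_float_bytes)
```

On that estimate the payload alone is twice as large; an actual benchmark would still be needed to capture the real overhead.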

How will package maintainers know they must test their package with a new version of astropy.time.Time? What happens if the astropy package version changes but there are no changes in astropy.time.Time (for the case when the package maintainer does not assume that their package will be installed as part of a PyHC bundle, in which case AstroPy will be available)?

Libraries should be testing with their dev/git versions of their upstream dependencies.

The issue of roundoff error ("Since internally Time uses floating point numbers, round-off errors can cause two times to be not strictly equal even if mathematically they should be.") should be studied. I generally avoid representing time as floats if possible.

We can bring this up with the astropy time maintainers. I would hope that we can add features that are missing to astropy time if this community gets behind an initiative like this.
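To show the kind of round-off being discussed, here is a small standard-library sketch using a single float64 Unix time. It illustrates floating point behaviour in general; astropy's Time uses two floats internally to reach much finer resolution, so this is not a demonstration of astropy's actual error.

```python
import math

unix_seconds = 1704067200.0  # 2024-01-01T00:00:00 UTC as a float64

# Spacing between adjacent float64 values near this epoch, in seconds.
resolution = math.ulp(unix_seconds)

# A one-nanosecond offset is below that spacing and is silently dropped.
nanosecond_lost = (unix_seconds + 1e-9) == unix_seconds

# Accumulated decimal steps are not exact either.
ten_steps = sum([0.1] * 10)

# Integer nanosecond counts (the datetime64[ns] approach) stay exact.
t_ns = 1704067200 * 10**9
exact = (t_ns + 1) - t_ns == 1

print(resolution, nanosecond_lost, ten_steps == 1.0, exact)
```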

Mandating that PyHC core packages be rewritten by replacing any use of datetime or custom time objects with astropy.time.Time seems unnecessary. Most standards I am familiar with do not mandate implementation; they mandate representations at the interface level. What is the benefit of mandating how code works internally?

Overall I actually want to make an interface argument instead, i.e., everyone should accept and return astropy times, and leave the internals up to the library. I can rephrase the PHEP more if required.

Could you add examples to the PEP that justify, "This meant that all PyHC packages had to create their own implementations of time." Do all PyHC packages have their own implementation of time, and what is meant by "implementation of time"?

I removed this line as I was overzealous in what I wrote.

rstoneback commented 1 month ago

These goals are made significantly easier/more fully realized when packages can interoperate. Thus, the bringing up of finding standards for time, coords, etc. Also, thinking about these kinds of things now can help set up for the future of PyHC, and growth.

Again, per my point #1, a problem hasn't actually been established yet. Further, it also hasn't been established that packages can't interoperate.

I am thinking about the future. Overly specific standards hinder development. Wes McKinney, when giving early presentations on pandas, stated he had people telling him not to make pandas since numpy already exists. If he was in a group that specified people use numpy then Pandas wouldn't exist.

How can anyone claim that astropy time is the way to go when requirements (point #2) have yet to be established?

This PHEP needs to start back over at zero.

rstoneback commented 1 month ago

I disagree. Recommendations quickly become requirements.

As a community we should be setting ourselves recommendations that might involve doing some work.

I have problems with the PHEP due to the inappropriate construction. I also have problems doing labor for free, especially given NASA's repeated claims to "pay what it costs." That is not the same as an unwillingness to work.

First, we haven't even established there is a problem. SpacePy is the only PyHC package to create time support. Other packages use datetime, which is actually fine. In practice, there are only a few leap seconds per year. So not supporting leap seconds affects about 6E-8 of the total samples in a year.

The goal isn't to solve a problem per se but to bring the community towards a common set of tooling that we can expand upon and ideally base our interoperability on in the future.

If there isn't a problem then there is no reason to force new standards on the community.

While AstroPy also has time support, it is not in PyHC.

Is that a problem?

No, but packages outside of our community aren't a PyHC problem that requires us to impose additional rules.

We haven't established requirements. We need requirements to sort out what we need, what success looks like, what resources are required, and how to approach a proposal.

This community is a great place to establish these requirements especially around getting astropy integrated with pandas and xarray.

The PHEP has been written before getting requirements. Requirements need to be collected first.

Ignoring lessons from the history of software development is ill advised. The reason there is usually more than one software package for a problem is because not everyone has the same requirements. Several pysat developers have conveyed requirements for our use case, requirements which have been brushed aside by the PHEP.

The lack of support for astropy objects within pandas and xarray is a known problem. Not only do sunpy developers want it fixed; astropy and xarray developers do as well.

This response does not address the substance of the criticism.

There is a desire from the larger Python ecosystem to integrate but the main stumbling block has been determining the specific requirements. If people are willing to help, we can work on those and bring these communities together instead of being separated.

There have been claims about future AstroPy support but these claims aren't from AstroPy itself.

The claims made about astropy support were focused on tutorials or examples or even maybe a code review. The astropy maintainers have offered some of these before for PyHC at the summer school or at meetings. But yes, there are no claims from astropy themselves but they have been very welcoming.

There are a large range of requirements within this community. These requirements are different than what astronomers experience. Unless astropy is committed to supporting a broader range of requirements then mandating astropy support will only cause problems.

If the requirements for a scientific time are already satisfied with a pre-existing package, then why do we need to require developers to use it? I'd say it is overwhelmingly likely packages will use it on their own. If developers do decide to invest the time and energy to create a new package then I think it is highly likely they have good reasons to do so.

Some developers are not aware of the larger ecosystem; this PHEP is partly informational as well as being a recommendation.

Then the focus should be on advertising.

If developers within PyHC do decide that astropy.time isn't what they need, then there should be a conversation as to why, what can be changed to prevent another time library from being created and have changes contributed to astropy.

Either this is an open-source community angled at improving not only our own projects but the broader ecosystem or we are just out for ourselves and can drop many of PyHC's core objectives.

We shouldn't adopt new standards just to adopt new standards. To benefit the community, not only now but into the future, the standards we do adopt need to be very well thought out.

The PHEP so far has failed to demonstrate that the PHEP is required. Thus it isn't justified to claim that opposition to the PHEP is being "just out for ourselves" or contrary to core open source principles.

I think our time would be better spent determining what requirements our community really needs and then addressing those. Getting science time into broader Python packages, like Pandas and Xarray, is more useful than requiring AstroPy or SpacePy. There are a whole range of commercial companies that could benefit from science time if it was supported by packages they already use. On the other hand, AstroPy/SpacePy already exists. Scientists can already use it if they want.

I agree, we should be getting astropy.time integrated within the broader ecosystem but we need to demonstrate that there is a willing community who needs that and would adopt that.

If we are going to get science time into the broader community then we should create an independent package to do that. If our requirements are best served by SpacePy, or by AstroPy, then that would be the place to start. An independent package is the setup that is most likely to serve the diverse range of requirements in the community.

jibarnum commented 1 month ago

These goals are made significantly easier/more fully realized when packages can interoperate. Thus, the bringing up of finding standards for time, coords, etc. Also, thinking about these kinds of things now can help set up for the future of PyHC, and growth.

Again, per my point #1, a problem hasn't actually been established yet. Further, it also hasn't been established that packages can't interoperate.

I am thinking about the future. Overly specific standards hinder development. Wes McKinney, when giving early presentations on pandas, stated he had people telling him not to make pandas since numpy already exists. If he was in a group that specified people use numpy then Pandas wouldn't exist.

How can anyone claim that astropy time is the way to go when requirements (point #2) have yet to be established?

This PHEP needs to start back over at zero.

Luckily, @nabobalis did do a very recent re-write of the PHEP that changes things from "you must use astropy's time" to the following:

This PHEP recommends that all projects across the PyHC ecosystem use the standard library datetime module or if the project has more complex requirements, they should use astropy.time.Time instead of creating their own time objects.

Eventually, we want to encourage that all PyHC libraries allow astropy.time.Time as valid time inputs to their libraries. This has some roadblocks currently which will not allow this to happen in the near future, but hopefully with the PyHC community behind this PHEP, we can push for better astropy integration with the broader Scientific Python ecosystem.

Any existing projects that have their own time object are strongly encouraged to replace their custom time objects with astropy.time.Time.

So, no longer are we saying you're required to use astropy time if that doesn't make sense for you. I think this should be sufficient for allowing a couple different options, while discouraging the creation of several new time objects. It also acknowledges the current challenges with integrating astropy more fully.

rstoneback commented 1 month ago

These goals are made significantly easier/more fully realized when packages can interoperate. Thus, the bringing up of finding standards for time, coords, etc. Also, thinking about these kinds of things now can help set up for the future of PyHC, and growth.

Again, per my point #1, a problem hasn't actually been established yet. Further, it also hasn't been established that packages can't interoperate. I am thinking about the future. Overly specific standards hinder development. Wes McKinney, when giving early presentations on pandas, stated he had people telling him not to make pandas since numpy already exists. If he was in a group that specified people use numpy then Pandas wouldn't exist. How can anyone claim that astropy time is the way to go when requirements (point #2) have yet to be established? This PHEP needs to start back over at zero.

Luckily, @nabobalis did do a very recent re-write of the PHEP that changes things from "you must use astropy's time" to the following:

This PHEP recommends that all projects across the PyHC ecosystem use the standard library datetime module or if the project has more complex requirements, they should use astropy.time.Time instead of creating their own time objects. Eventually, we want to encourage that all PyHC libraries allow astropy.time.Time as valid time inputs to their libraries. This has some roadblocks currently which will not allow this to happen in the near future, but hopefully with the PyHC community behind this PHEP, we can push for better astropy integration with the broader Scientific Python ecosystem. Any existing projects that have their own time object are strongly encouraged to replace their custom time objects with astropy.time.Time.

So, no longer are we saying you're required to use astropy time if that doesn't make sense for you. I think this should be sufficient for allowing a couple different options, while discouraging the creation of several new time objects. It also acknowledges the current challenges with integrating astropy more fully.

The PHEP has failed to demonstrate that astropy.time satisfies the requirements for any PyHC packages. So why is it recommended? Plus, the 'strongly recommended' language is sufficient to ensure that no package trying to create time support that meets different requirements will be able to get funding.

Why should we discourage the creation of new time objects if we don't know the existing ones are appropriate? Further, the community has yet to demonstrate why there should be only one of a software package. This is especially relevant for a NASA community as NASA always produces more than one. And again, it has not been established that astropy satisfies PyHC requirements.

The response doesn't actually address the contents of my arguments.

Requirements are required before making decisions
Overly restrictive standards prevent progress
A problem that needs to be solved hasn't actually been established

Ignoring requirements when creating standards is certain to produce standards that don't meet community needs.

eelcodoornbos commented 1 month ago

Just a note that with the current wording:

... use the standard library datetime module or if the project has more complex requirements, they should use astropy.time.Time ...

all packages that accept or return pandas or xarray objects with time-based information (e.g. most models, time series observations, etc.) will simply not be able to comply, since these objects will contain pd.Timestamp and/or np.datetime64 objects.

I think this is a good illustration of @rstoneback's point regarding the need for agreed-upon requirements as a starting point.

nabobalis commented 1 month ago

Why should we discourage the creation of new time objects if we don't know the existing ones are appropriate? Further, the community has yet to demonstrate why there should be only one of a software package. This is especially relevant for a NASA community as NASA always produces more than one. And again, it has not been established that astropy satisfies PyHC requirements.

If someone is motivated to create a new general scientific time package, that should not come under PyHC nor be funded by any NASA Heliophysics B.X call.

all packages that accept or return pandas or xarray objects with time-based information (e.g. most models, time series observations, etc.) will simply not be able to comply, since these objects will contain pd.Timestamp and/or np.datetime64 objects.

I think this is a good illustration of @rstoneback's point regarding the need for agreed-upon requirements as a starting point.

This is why the PHEP now says that if you can use datetime, you do not need to change anything, since it will work for pandas and xarray.

Fundamentally, we have two separate ecosystems: one that is able to use datetime and integrate with pandas and xarray, and another that is based around astropy time. The requirements needed for sunpy are only satisfied by astropy time. This PHEP is aimed at crossing this gap in a way that brings the PyHC community on board so we can move forward together.

eelcodoornbos commented 1 month ago

This is why the PHEP now says that if you can use datetime, you do not need to change anything, since it will work for pandas and xarray.

This seems to ignore the point that pandas and numpy have their own built-in custom time objects. Like astropy.time.Time objects, pd.Timestamp accepts a standard library datetime as input on initialisation, and/or can mimic the input of the standard datetime object initialisation (among other ways). The pd.Timestamp object has a method to convert to a standard library datetime for output. But it stores the time information in its own custom time object, with added functionality (additional conversion/parsing/formatting options, use in indices, array operations, null/NaT values, etc.). The same holds for numpy, but with fewer convenience methods.

So prescribing only standard library datetime and astropy.time.Time would, in my reading of this text, disallow passing numpy, pandas and xarray objects containing their own custom time objects (since these will not be in the prescribed set of datetime/astropy.time.Time), and thereby severely restrict the PyHC community. It is very clear to me that this is not the intent, but it is, in my view, the result of the way it is currently phrased.
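As an illustration of the distinction described above, here is a minimal sketch of the round trip between a standard library datetime and the pandas and numpy time types. It assumes pandas and numpy are installed; the variable names are illustrative only and are not taken from the PHEP.

```python
from datetime import datetime

import numpy as np
import pandas as pd

# A naive standard-library datetime used as the common starting point.
dt = datetime(2024, 5, 10, 17, 0, 30)

# pandas accepts the datetime but stores it in its own Timestamp type.
ts = pd.Timestamp(dt)
# The Timestamp can hand back a plain standard-library datetime on request.
dt_from_pandas = ts.to_pydatetime()

# numpy stores the same instant as datetime64 (microsecond unit here).
np_time = np.datetime64(dt)
# Casting with astype(datetime) returns a standard-library datetime.
dt_from_numpy = np_time.astype(datetime)

# A DatetimeIndex holds datetime64 values, not datetime objects.
index = pd.DatetimeIndex([dt])
index_is_datetime64 = np.issubdtype(index.dtype, np.datetime64)
```

Both conversions are lossless for this value, but the objects that pandas and xarray actually carry around are `pd.Timestamp` and `np.datetime64`, which is the gap in the wording being pointed out.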

nabobalis commented 1 month ago

This is why the PHEP now says that if you can use datetime, you do not need to change anything, since it will work for pandas and xarray.

This seems to ignore the point that pandas and numpy have their own built-in custom time objects. Like astropy.time.Time objects, pd.Timestamp accepts a standard library datetime as input on initialisation, and/or can mimic the input of the standard datetime object initialisation (among other ways). The pd.Timestamp object has a method to convert to a standard library datetime for output. But it stores the time information in its own custom time object, with added functionality (additional conversion/parsing/formatting options, use in indices, array operations, null/NaT values, etc.). The same holds for numpy, but with fewer convenience methods.

So prescribing only standard library datetime and astropy.time.Time would, in my reading of this text, disallow passing numpy, pandas and xarray objects containing their own custom time objects (since these will not be in the prescribed set of datetime/astropy.time.Time), and thereby severely restrict the PyHC community. It is very clear to me that this is not the intent, but it is, in my view, the result of the way it is currently phrased.

Yes, you are correct that I don't mention the pandas objects. I had assumed that would be implicit under the datetime recommendation as I consider them one and the same even if they are not. I am not suggesting the non-use of these objects if you use pandas/xarray, I will have to make that clearer in the PHEP.

I would be very interested in the direct use of numpy datetime64 in PyHC libraries. That was not one I had expected originally.

jibarnum commented 1 month ago

Again, per my point #1, a problem hasn't actually been established yet. Further, it also hasn't been established that packages can't interoperate. I am thinking about the future. Overly specific standards hinder development. Wes McKinney, when giving early presentations on pandas, stated he had people telling him not to make pandas since numpy already exists. If he was in a group that specified people use numpy then Pandas wouldn't exist. How can anyone claim that astropy time is the way to go when requirements (point #2) have yet to be established?

I'm thinking towards the future as well. In my mind, if a ton of different time objects were created, that could lead to issues with interoperability and duplication of effort. So it made sense to me to set some guidelines and get people on the same page early on. Although not everyone uses it, astropy is one of the most widely used packages in PyHC (enough to end up in PyHC summer schools), hence the focus on it. But it's clear that this is not sufficient in and of itself, which led to @nabobalis revising the PHEP. If it makes sense to include a couple of other packages (e.g. the discussion happening above about numpy/pandas time objects), then we can certainly all discuss it and Nabil can get them included in the PHEP.

The PHEP has failed to demonstrate that astropy.time satisfies the requirements for any PyHC packages. So why is it recommended?

Several PyHC packages use astropy.time now, so I'm not sure what you mean when you say it doesn't satisfy the requirements of any PyHC package. Can you elaborate?

Plus, the 'strongly recommended' language is sufficient to ensure that no package trying to create time support that meets different requirements will be able to get funding. Why should we discourage the creation of new time objects if we don't know the existing ones are appropriate?

Hmmmm. My feelings on this are that we should, where we can, try to make current systems work better if they aren't currently meeting the needs of the community. I.e., are there ways we can make the currently-existing software work better (e.g. astropy, datetime), rather than spinning up a new tool? And if there isn't a good way to modify what exists now, I think that would warrant a community discussion around that before proposing for funding to do something totally different. If that discussion showed we truly needed something different, then we could pivot there and create PHEPs to supersede current ideas.

Further, the community has yet to demonstrate why there should be only one of a software package. This is especially relevant for a NASA community as NASA always produces more than one.

Well, with the way the PHEP is now, it's no longer the intention to only use astropy.time. If it were sufficiently modified in the future to meet everyone's needs, that'd be a different story, but we aren't at that point.

rweigel commented 1 month ago

I'd like to see @rstoneback's point #1 addressed:

First, we haven't even established there is a problem. SpacePy is the only PyHC package to create time support. Other packages use datetime, which is actually fine. In practice, there are only a few leap seconds per year. So not supporting leap seconds affects about 6E-8 of the total samples in a year.
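The 6E-8 figure quoted above can be reproduced with a back-of-the-envelope calculation. The sketch below assumes 1 Hz sampling over a 365-day year and two leap seconds in that year, which is an illustrative value chosen to match the quoted order of magnitude rather than a measured rate. It also shows that the standard library datetime has no representation for a leap second (23:59:60).

```python
from datetime import datetime

SECONDS_PER_DAY = 24 * 60 * 60
SECONDS_PER_YEAR = 365 * SECONDS_PER_DAY  # non-leap year


def leap_second_fraction(leap_seconds_per_year: int) -> float:
    """Fraction of 1 Hz samples in a year that fall on a leap second."""
    return leap_seconds_per_year / SECONDS_PER_YEAR


# Assumption: two leap seconds in the year (illustrative only).
fraction = leap_second_fraction(2)


def datetime_accepts_leap_second() -> bool:
    """Check whether datetime can represent 2016-12-31 23:59:60 UTC."""
    try:
        datetime(2016, 12, 31, 23, 59, 60)
    except ValueError:
        # datetime only allows seconds in the range 0..59.
        return False
    return True
```

Whether a fraction of this size matters depends on the use case, which is the requirements question raised throughout this thread.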

The fundamental issue with this PHEP is that it proposes a solution, but the problem it addresses is not concrete or specific. Maybe try framing it as "People do A (with code examples), and this creates problem B, but if they follow the standard, problem B is obviated or ameliorated because of reasons x, y, and z." If the problem is duplication, provide links to lines of code.

We are debating an unknown, unagreed-upon, or hypothetical problem. I move that this PHEP discussion be tabled until the concrete-and-specific problem question is answered and there are specific examples of interoperability issues related to time that this PHEP would address. The next step is to get agreement that a claimed problem is a problem. Finally, we should debate proposed solutions. We are not making progress because we started with the last step and the claimed problem is vague.

My experience with PyHC packages is that time interoperability could be improved without an AstroPy mandate. (Unfortunately, I have not had time to document specific cases or provide research that justifies my claim.) I also think it is important that we all attempt experiments with many PyHC packages and develop our own opinions about interoperability issues.

rweigel commented 1 month ago

Another reason for tabling is that continuing debates when it is clear there is little agreement can harm future collaboration or reduce participation. Let's find and start with points of agreement and then debate implementation issues.

jibarnum commented 1 month ago

Another reason for tabling is that continuing debates when it is clear there is little agreement can harm future collaboration or reduce participation. Let's find and start with points of agreement and then debate implementation issues.

Let’s discuss at Monday’s tag up!

jvandegriff commented 1 month ago

I've been behind on this, but am trying to catch up.

This proposal to switch to a new, standardized approach to time represents a very low-level change to most if not all of the libraries in PyHC. The fundamental ways that astronomers and heliophysicists think about time are different. This adds a large new Python library dependency to basically all PyHC projects. Some of the core projects are not supportive of this large change.

With this much headwind, there is the potential for frustration and discord if this gets passed through quickly. Some people have pointed out that there is a lack of clarity on the exact problems that this is solving. We at least need to back up and address those concerns. The general principles of reducing duplication, leveraging existing codebases, etc., are of course well accepted. However, because so many existing PyHC libraries are now on maintenance-level-only funding, it is possibly a very scary situation to suggest that they now all try to find funding to adapt to a new approach, or be considered deprecated. So a plan to adapt time formats across existing libraries, or a plan to consolidate over time to a set of accepted standards, seems like it could get more traction with a larger number of people.

rebeccaringuette commented 1 month ago

Some very good points here. I suggest that the time currently scheduled in the fall meeting for voting on this PHEP be reassigned to a hackathon where the group works out which interoperability problems between our software packages this PHEP would resolve.

rebeccaringuette commented 1 month ago

It seems we should also add some text about the time methods being interoperable with (rather than identical to) the options in this PHEP, which seem to be astropy.time, np.datetime64, and one or two others.

sapols commented 1 month ago

I'll quickly note that we just decided at the recent PyHC core tag-up to not hold a vote on this PHEP at the Fall meeting, and we'll use that time to discuss things as a community instead.

nabobalis commented 2 days ago

During the Fall 2024 meeting, a tiered approach to interoperability was decided on. The base tier will be that all PyHC packages are installable in the same environment and work without issues. The top tier will be that PyHC will attempt to provide converter functions between libraries.

Therefore, I have closed this PR as it now serves no purpose.
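For readers wondering what a converter function for the top tier might look like, here is a minimal sketch that normalises several common time objects to a standard library datetime using duck typing. The function name `to_stdlib_datetime` is hypothetical and not part of any PyHC package, and the sketch assumes scalar inputs only; it is one possible shape for such a helper, not an agreed design.

```python
from datetime import datetime
from typing import Any


def to_stdlib_datetime(value: Any) -> datetime:
    """Hypothetical converter: normalise a scalar time object to datetime."""
    # pandas.Timestamp subclasses datetime, so check for its conversion
    # method first to get a plain standard-library object back.
    if hasattr(value, "to_pydatetime"):
        return value.to_pydatetime()
    if isinstance(value, datetime):
        return value
    # A scalar astropy.time.Time exposes to_datetime().
    if hasattr(value, "to_datetime"):
        return value.to_datetime()
    # numpy.datetime64: cast to microseconds, the finest unit datetime holds,
    # then to a standard-library datetime.
    if hasattr(value, "astype"):
        return value.astype("datetime64[us]").astype(datetime)
    raise TypeError(f"Cannot convert {type(value).__name__} to datetime")
```

Because the checks rely on method names rather than imports, the helper itself adds no dependencies, though it also means any unrelated object that happens to define one of these methods would be accepted.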

rebeccaringuette commented 1 day ago

https://youtu.be/9FzCWLOHUes?feature=shared could not resist…