Closed calebweinreb closed 5 months ago
@calebweinreb Thanks, this is fair. The path of least resistance for now would be to de-emphasise what is in the JOSS paper. However, I would like feedback on one other intermediate option which I’ll outline here.
I have previously looked into the merits of a basic conversion approach versus a more integrated approach to the xarray/pandas APIs, but I don’t think I documented my findings. I have therefore included some brief notes further below. The very short summary is that using a basic conversion approach mostly works but leaves too many rough edges and results in a poor user experience.
In addition to the options for JOSS publication you identified, something else that’s doable would be:
I would be interested in your thoughts on whether the following changes would be sufficient for the paper: (a) Create a tutorial notebook and helper conversion function as described above (b) Change “expanding support” to “support for pandas is being introduced” (c) Leave the title as is
It is totally reasonable if that’s not seen as sufficient - it’s important pandas users don’t have a negative experience when looking at the package, and the paper shouldn’t overstate things. However I’d appreciate your feedback on that option. If that’s not sufficient, I would be inclined to then remove pandas from the title.
The following are some notes about the merits of a basic conversion approach versus a more integrated approach to the xarray/pandas APIs:
xarray provides some functionality for transforming both to and from pandas, which would form the basis of the pandas support. However, a few things prevented this from meeting all the requirements. None of these are ‘showstoppers’, but would all need to be carefully tested to ensure a quality result.
Examples include: (1) The Pandas series objects which make up the columns in a data frame didn’t always have the “name” set to the header of the column. As a result, sometimes parts of the data structure end up with an unexpected name. This has flow-on effects to accessing, labelling and plotting the data. (2) The default dimension names for new data structures differ between xarray and pandas. This can similarly result in unexpected names for things.
I also have some uncertainty about handling pandas data structures which might have duplicate or semi-duplicate entries when converted to an xarray type. This is unlikely to be a showstopper, but will require use case testing, creating useful error messages, and likely some documentation.
Originally, the hope was to have polymorphic methods in the main API (i.e. you can pass either a data frame or an xarray object into any method) however this became a challenge for type hinting and inheritance. Xarray recommend creating “accessor methods” (https://tutorial.xarray.dev/advanced/accessors/01_accessor_examples.html) rather than creating new classes which inherit from xarray types. Some experimentation revealed that inheritance from xarray types is not straightforward.
As a result, it was a cleaner implementation and also clearer for users to provide a separate API for pandas. This would also allow any different treatments that might be wanted for data frames to be handled there.
I spent some time yesterday looking into the intermediate option that I suggested. I think it would still end up being a bit rushed, so I would prefer to de-emphasise the role of pandas in the paper until we have a more fulsome implementation. I'll update the paper accordingly and let you know when the update is ready.
@calebweinreb we have merged a change to the paper into 528-joss-paper-submission which we believe addresses your feedback. Please let us know whether this seems suitable.
The new draft looks great and totally addresses my concern.
Re JOSS review openjournals/joss-reviews#6889
There currently appears to be a mismatch between the level of pandas support implied by the JOSS manuscript and that which is actually available in the scores package. (Apologies in advance if I have overlooked any pandas options already in the package).
I think this could be addressed in two ways.