scipy / scipy-articles

Publications about Scipy

Scope and structure of SciPy 1.0 paper #9

Closed rgommers closed 5 years ago

rgommers commented 6 years ago

EDIT: tentative structure and authors:

Original description

We need to decide on the topics/structure/scope of this paper.

My suggested list was:

Comment by @pv: I think the harder question then is the story the paper wants to tell. The challenge here is I think that "Scipy 1.0" consists of bits and pieces, and unlike more field-specific software projects the common theme may be more difficult to state. There's also the previous "Scipy paper", and presumably we want to write a "followup" or "update"?

One possible story (which also seems to be what was already suggested) would be to write about what went on with the project in recent years. In this case the focus would be on the new stuff, and we'd just state what existed before that in the introduction or a short background section, and then the body of the paper would proceed through the particular things (e.g. as in the list of topics Ralf gave) that we want to talk about.

rgommers commented 6 years ago

Note that there is no previous paper on SciPy, this will be the first journal article ever. I think we should still focus on the last couple of years, but we also need to sketch the big picture and whole history.

charris commented 6 years ago

Might want to begin by describing the intent of scipy and where it fits in the python scientific stack. A short description of the stack as a whole with an illustration showing links to other projects in the stack might be informative. I've seen several such illustrations floating about, don't recall where.

pv commented 6 years ago

I was thinking about these https://scipy.org/citing.html#scientific-computing-in-python above, but indeed they're more general overview articles.

charris commented 6 years ago

And given that SciPy's role was partly determined by history, maybe a short history is appropriate. I believe that SciPy was originally a supplement to Numeric, and NumPy started there as a Numeric replacement and was later split off for independent development.

insertinterestingnamehere commented 6 years ago

If we're interested in sections about the Cython BLAS/LAPACK stuff I can help there. We may be able to draw on some of the material I put in to the SciPy 2015 proceedings paper on that.

rgommers commented 6 years ago

Okay, here's a proposed structure (sub-bullets can be subsections or topics dealt with in a couple of sentences):

charris commented 6 years ago

The "Key technical issues" section looks a bit out of place, although if SciPy is presented as a work in progress it has a place. The outline LGTM otherwise, but that is going to be a long paper :-)

rgommers commented 6 years ago

The "Key technical issues" section looks a bit out of place,

Yeah, you're probably right. I think it's important to mention, but maybe it can go at the end under "future development" rather than in its own section.

ev-br commented 6 years ago

+1 for the outline.

Two more things I think are worth adding under the Recent improvements rubric:

I was going to suggest that Key technical issues can be merged with future developments, and I'm glad that was said already :-)

ev-br commented 6 years ago

Also I think it's most important now to get the ball rolling. To this end I suggest that we adopt the brainstorming type of attitude: we work on sections and try to postpone reviewing and criticisms to the stage where there is at least a rough draft (using comments if needed, basically what Pauli was saying).

And I suggest we postpone worrying about the size of the paper: if it does end up being too large, we can move some parts to Supplementary material or appendices or something. If it is still too large, the corresponding author decides what to cut off :-).

And certainly, SciPy is work in progress, so it should be presented as such IMO :-)

rgommers commented 6 years ago

interpolate: esp polynomial interpolators, PPoly and friends. [EDIT: and I volunteer to write this]

+1

special saw a lot of work recently

Not sure about this one. Excluding cython_special (which was on the list already), I'm not sure if there's a story - they're many hundreds of small unrelated functions, many of which received bug fixes and accuracy improvement. Did you have something in particular in mind?

rgommers commented 6 years ago

Also I think it's most important now to get the ball rolling.

Agreed. I'll move the structure to the issue description, where people can add their names.

rgommers commented 6 years ago

I'll ping some people that either already volunteered earlier, or may be willing to write a section because they're the experts. Please say yes/no, or just add yourself to the issue by editing the description.

Anyone that I forgot, please feel free to jump in where you'd like!

I suggest aiming for 1 page (small font / 2-column) per topic for each of the technical features, and 1 - 1.5 pages for each of the other sections.

antonior92 commented 6 years ago

Thank you for pinging me, Ralf. Yes, I would like to help.

I just included my name in the issue description.

tylerjereddy commented 6 years ago

What about spatial :/ Maybe I can squeeze in a mention of SphericalVoronoi, which is relatively new in project lifetime terms. More broadly, we're not just wrapping stuff from Qhull anymore, but also starting to implement our own computational geometry algorithms (albeit slowly).
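A minimal sketch of what SphericalVoronoi computes, assuming randomly generated points on the unit sphere (the point count and seed are illustrative choices, not taken from the paper draft):

```python
import numpy as np
from scipy.spatial import SphericalVoronoi

# Illustrative input: 20 random generator points projected onto the unit sphere.
rng = np.random.default_rng(0)
points = rng.standard_normal((20, 3))
points /= np.linalg.norm(points, axis=1, keepdims=True)

# Build the Voronoi diagram on the sphere surface.
sv = SphericalVoronoi(points, radius=1.0, center=np.zeros(3))

# Order the vertices of each region around its polygon, e.g. for plotting.
sv.sort_vertices_of_regions()

# sv.vertices holds the Voronoi vertices (all on the sphere);
# sv.regions holds, per generator, the indices of its region's vertices.
vertex_radii = np.linalg.norm(sv.vertices, axis=1)
```

Each entry of `sv.regions` lists indices into `sv.vertices`, so one spherical polygon per input point can be reconstructed from the two attributes.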

Benchmark suite might also be a useful place for me to contribute a bit, since I've worked on asv itself a little & added a bunch of spatial benchmarks.

rgommers commented 6 years ago

What about spatial :/ Maybe I can squeeze in a mention of SphericalVoronoi, which is relatively new in project lifetime terms. More broadly, we're not just wrapping stuff from Qhull anymore, but also starting to implement our own computational geometry algorithms (albeit slowly).

Hmm. On the one hand it seems a bit small as a technical focus topic compared to the other ones. On the other hand, yes would be nice to mention it. Maybe we should have one section, or a table, with key enhancements that we can't discuss at length but mention briefly (functionality + why it's important)?

The same could be said of interpolate.PPoly perhaps.

Benchmark suite might also be a useful place for me to contribute a bit, since I've worked on asv itself a little & added a bunch of spatial benchmarks.

Cool, adding you there.

rc commented 6 years ago

I am willing to help with the sparse section, although I am not too familiar with the current code base - I am still using it heavily, though. The section should cover both the sparse matrices and the related solvers, right?

perimosocordiae commented 6 years ago

I'm also willing to write about scipy.sparse (and potentially scipy.sparse.csgraph), though I don't know much about the solvers.

rgommers commented 6 years ago

I'm not sure about including the solvers. Over the last few years there were mainly small improvements, no major changes IIRC. Performance and feature improvements in the data structures are more significant imho. And then it can be mentioned there are many users of those data structures - sparse.linalg, sparse.csgraph, scikit-learn, etc.
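To illustrate how the sparse data structures feed their consumers, here is a small sketch, assuming a made-up five-node graph, that builds a CSR matrix and passes it to sparse.csgraph:

```python
import numpy as np
from scipy import sparse
from scipy.sparse import csgraph

# Hypothetical undirected graph with edges 0-1, 1-2 and 3-4.
rows = np.array([0, 1, 3])
cols = np.array([1, 2, 4])
weights = np.ones(3)

# Assemble in COO format (convenient for construction), then convert to
# CSR, which is efficient for arithmetic and row slicing.
adjacency = sparse.coo_matrix((weights, (rows, cols)), shape=(5, 5)).tocsr()

# csgraph routines consume the sparse matrix directly.
n_components, labels = csgraph.connected_components(adjacency, directed=False)
```

Only the three nonzero entries are stored, and the graph routine works on that compressed representation without densifying it.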

mdhaber commented 6 years ago

@antonior92 I'll help with optimize. I am quite busy for the rest of the week, but I will make time Sunday (if need be) or early next week.

mdhaber commented 6 years ago

@rgommers I don't have first-hand knowledge of much other than my little corner of optimize, but if there is some topic that needs a writer and nobody with direct knowledge steps forward, let me know.

rainwoodman commented 6 years ago

Yes. I can certainly add the text for the cKDTree, and help out as the rolling starts.

rgommers commented 6 years ago

@rgommers I don't have first-hand knowledge of much other than my little corner of optimize, but if there is some topic that needs a writer and nobody with direct knowledge steps forward, let me know.

Thanks Matt. I'd say for now let's start the sections we have writers for. The biggest chunk is the Recent technical improvements section, we should have writers for all of those.

tylerjereddy commented 6 years ago

For the benchmark suite -- should we aim to demonstrate (plot?) a commit hash history performance improvement for some particular function that has improved a lot (is there one that comes to mind?).

rgommers commented 6 years ago

For the benchmark suite -- should we aim to demonstrate (plot?) a commit hash history performance improvement for some particular function that has improved a lot (is there one that comes to mind?).

I think so. optimize benchmarks for adding new scalar optimization methods, or cKDTree performance come to mind.
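A benchmark of the kind asv tracks across commits could look roughly like the sketch below. The class name, parameter sizes and query setup are hypothetical and not copied from SciPy's benchmark suite; only the `params`/`setup`/`time_*` conventions are asv's.

```python
import numpy as np
from scipy.spatial import cKDTree


class TimeCKDTreeQuery:
    # asv runs the benchmark once per parameter value.
    params = [1000, 10000]
    param_names = ["n_points"]

    def setup(self, n_points):
        # Setup is excluded from the timing: build the data and the tree.
        rng = np.random.default_rng(1234)
        self.data = rng.random((n_points, 3))
        self.queries = rng.random((100, 3))
        self.tree = cKDTree(self.data)

    def time_query(self, n_points):
        # Methods prefixed with "time_" are what asv measures.
        self.tree.query(self.queries, k=1)
```

Running such a benchmark over a range of commits yields the per-commit timing history that a performance plot for the paper could be drawn from.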

stsievert commented 6 years ago

improvement for some particular function that has improved a lot (is there one that comes to mind?).

Two other functions that have seen recent performance improvements are convolve and correlate, both of which automatically choose the faster of the FFT and direct convolution methods as of 0.19. I can help with this if you'd like.
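A short sketch of that behaviour, assuming arbitrary random input sizes: `method="auto"` lets `scipy.signal.convolve` pick the method, and `choose_conv_method` reports which one the heuristic selects.

```python
import numpy as np
from scipy import signal

# Illustrative inputs; the sizes are arbitrary.
rng = np.random.default_rng(0)
x = rng.standard_normal(2000)
h = rng.standard_normal(300)

# method="auto" (the default) chooses between "direct" and "fft".
y_auto = signal.convolve(x, h, mode="full", method="auto")

# Both explicit methods compute the same result up to floating-point error.
y_direct = signal.convolve(x, h, mode="full", method="direct")
y_fft = signal.convolve(x, h, mode="full", method="fft")

# Ask which method the heuristic would pick for these inputs.
chosen = signal.choose_conv_method(x, h, mode="full")
print(chosen)
```

Which method is chosen depends on the input sizes and dtypes, so the printed value can differ between inputs.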

WarrenWeckesser commented 6 years ago

@rgommers wrote:

Note that there is no previous paper on SciPy, this will be the first journal article ever. I think we should still focus on the last couple of years, but we also need to sketch the big picture and whole history.

Given that, should we have a short summary of the primary scope and capabilities of each subpackage? Something like the summary at https://docs.scipy.org/doc/scipy/reference/tutorial/general.html#scipy-organization, but with more information in the description of each subpackage. E.g.

The description wouldn't necessarily be bulleted like that, but those two examples show the level of detail I am thinking of.

What do you think?

rgommers commented 6 years ago

@WarrenWeckesser that sounds good to me!

rgommers commented 6 years ago

@stsievert that may be useful, especially given that @tylerjereddy found speedups that weren't as spectacular as we'd expected in gh-15.

rgommers commented 6 years ago

@WarrenWeckesser as one of the longest serving core devs, maybe I could tempt you to tackle one of the sections other than "recent technical improvements"? Maybe the short overview you just proposed as part of the introduction?

WarrenWeckesser commented 6 years ago

@rgommers I'll work on the brief summaries of the subpackages that I proposed above.

On the mailing list, you suggested aiming for a first draft of the paper by mid-April. The imminent branching of 1.1 will probably keep some of the core devs busy for the next couple weeks, so that date seems a bit optimistic.

Antonio, Matt and Tyler are making progress on their sections, but there is not much activity in the other areas. If you haven't already, I think it is time to start nagging, prodding and cajoling the rest of the volunteers to start writing now, if we expect to have something finished by, say, the SciPy conference (early July). And because being the nagger, prodder and cajoler can be a thankless task, I'll say it now: thanks for organizing this!

rgommers commented 6 years ago

You're right Warren - we need to start speeding up this process. Most of the decisions are made, it's a matter of producing content now. Then there'll be a review/rework and "sign up authors" phase. I'll do some poking.

hameerabbasi commented 6 years ago

Hello, I'm definitely willing to contribute. I was cc'd in an earlier version, but was apparently removed from the current one. I don't know if I still have the chance?

tylerjereddy commented 6 years ago

@hameerabbasi Yes, I think some content related to scipy.sparse would be welcome. I think @perimosocordiae is pretty swamped, but perhaps he could give a quick read-over of whatever gets drafted. It is a lot easier to move forward if we have even a rough outline of the section rather than starting from nothing.

rgommers commented 6 years ago

Okay time for a status update here. I'll be off-grid for 3 weeks or so, so won't be very responsive. Pinging a few people for help / next steps.

We're nearly there, let's see if we can finalize the manuscript by the end of August and submit in the second half of September.

aarchiba commented 6 years ago

I know it's late to restructure the text substantially, but one of the key things I would want from a paper like this is: "what can scipy do?" That information is currently filed under "Architecture and Implementation Choices", which isn't where I would have looked for it. Maybe just break that section out to something like "what scipy contains"? In its current place could go a few brief general comments about module/submodule arrangement, like whether "import os" also gets you "os.path.*" or whether there's intended to be any point in a plain "import scipy". Of course for full details people can be referred to the docs.

rgommers commented 6 years ago

I know it's late to restructure the text substantially, but one of the key things I would want from a paper like this is: "what can scipy do?"

Yes I agree. I'd expect to read some of that in the introduction already.

hameerabbasi commented 6 years ago

I can review the paper in its current state within the coming week and possibly do some of the stuff written in the checklist.

antonior92 commented 6 years ago

Hi all, it seems that we are very close to finishing the initial version of the manuscript (only the introduction, discussion and future development are missing), but we have not made much progress over the last month. @jarrodmillman, @stefanv, @tylerjereddy, @rgommers is there any way I could help?

stefanv commented 6 years ago

Actually, we are pretty close with the introduction. I need to give it one more pass, but then it should be ready for wider review.

mdhaber commented 5 years ago

@stefanv I will have a few weeks window for SciPy work soon. Any chance you've finished that final pass?

stefanv commented 5 years ago

Excellent news; let me see if I can fit that in tomorrow.

stefanv commented 5 years ago

OK, I'm done.

mdhaber commented 5 years ago

Thank you!

jarrodmillman commented 5 years ago

@rgommers, @tylerjereddy What are our next steps?

tylerjereddy commented 5 years ago

I think the plan is to go through & fix up the paper a bit so that it has a "consistent voice" after so many contributions from different people.

Assuming we're still targeting Scientific Reports I guess I / we should check for missing sections they normally require, etc., though we're still not at the stage where we have a complete authors list.

stefanv commented 5 years ago

On Thu, 10 Jan 2019 11:38:46 -0800, Tyler Reddy wrote:

Assuming we're still targeting Scientific Reports I guess I / we should check for missing sections they normally require, etc., though we're still not at the stage where we have a complete authors list.

Would you mind checking their website, and if the information is not there reach out to them?

I think we can get the authors list sorted out pretty quickly when it is needed.

tylerjereddy commented 5 years ago

Yeah, we have an Editorial contact (Syma) we can ask if we don't find the information, but should be easy to find.

rgommers commented 5 years ago

I think we can get the authors list sorted out pretty quickly when it is needed.

Agreed. I think 7-10 days or so is reasonable. AstroPy gave people 2-3 days IIRC, I think that was definitely too short and that paper probably ended up with fewer authors than wanted to be on it.

@mdhaber has everything lined up to sort out the author info.

I'd suggest a reasonable list of next actions would be:

  1. One complete pass to streamline the content/voice.
  2. Ping authors
  3. Final tweaks
  4. Submission

stefanv commented 5 years ago

Ralf, would you like to be the "single voice", or would you prefer one of us to take care of it?