Define a theme for the paper

We've had some discussion in PR #44, which I am moving to this new thread so we can merge the draft introduction.

moorepants:

This makes me want to step back and talk about what the "theme" of the paper is. Since we are submitting this as an academic publication, it would be worth thinking about what the academic contribution of SymPy is. If the paper is strictly a description of a particular piece of software, I'm not sure typical journals will be that fond of accepting it (have we chosen a journal?). The intro can set the tone for the theme. Another good question I like to ask when writing papers is "What story do we want to tell?".

It isn't yet clear to me what story we are telling. It would be nice if we could show how SymPy solves particular problems better than other CASs, or is more suitable for scientific work, or that community-driven CAS development creates software that better suits scientific computing needs.

Have we had any conversation about this yet? I may have missed it.

This intro could help set the tone of all the other sections.

@asmeurer

Yeah, OK, let's discuss what the introduction should be. I just sat down and wrote what I felt should go here, but I think it could be better.

The point of the paper is to be the paper for SymPy, i.e., this will be the paper that people cite when they use SymPy.

To that end, I feel it should primarily answer two questions: "what is SymPy?" and "why were the various aspects of its design done the way they were?"

So my goal in writing this was to give high-level answers to these questions. The rest of the paper will go into more detail. For example: what is SymPy? It's an open source computer algebra system written in Python. Why is it written in Python? I tried to answer that here. What are the core principles of its design, and why? I also attempted to hit this here, with at least one core principle (usability as a library). I may have gone off on too much of a tangent with some nuances of choosing Python (that probably should be moved to the architecture section).

What are the other core design principles, which deserve discussion in the introduction? I'm not sure. "Using Python" and "being a library" are two that come to mind, but I may have missed some. Perhaps the goal of being "full featured" deserves some further discussion. Also, this can be inspiration.

I do think that we collectively are not really agreed on the theme of the paper, which is clear from the conflicting desires of the various pull requests. So what do others think the theme should be? Do you agree with my assessment above?

@moorepants

Aaron, I like what you say above:

Should be the paper for SymPy.

Should answer "What is SymPy?"

Should answer "Why were the various aspects of design done the way they were?"

I'd add these:

Should answer "Why is SymPy useful to researchers, educators, and companies?"

Should answer "How does SymPy compare to other similar software and why should you use it instead of or alongside them?"

I think these two are a bit more important with respect to the SymPy paper than the design decisions. The design decisions are more interesting to designers of CAS systems and less so to users.

One question to ask ourselves:

Who is the intended audience?

Finally, I was thinking that we could write the abstract first. If we can agree on an abstract it will frame the "mission" of the paper. Then when each person writes their section, they can refer to the mission to judge whether what they are writing fits the theme.

@aktech

@moorepants I cannot agree with you more, here. +1

@ashutoshsaboo

Hi,

I somewhat agree with @moorepants. It would be very useful to mention how SymPy helps educators, researchers, and companies, as Mr. Moore suggested. Writing about this would also give the target audience a reason to use SymPy. So I think adding the two points Mr. Moore suggested above would make our introduction a pretty good one, @asmeurer.

That way, we could also easily build up to explaining SymPy's full, extensive feature list.

But I think we must also not leave out the important architecture details of SymPy as a CAS, and I feel that they should be mentioned only in the Architecture section, not in the Introduction. @asmeurer

I think that we need to draw out the novel contributions of SymPy. CASs have been done, F/OSS has been done, Python has been used in big projects, and open development has been done. However, SymPy is a novel combination of these. I think the really new thing is that it allows the CAS to respond to the needs of the users without the knowledge or consent of the original developers. I am not sure that there is another major CAS that you can get features into (big or small) without a charge code. This is important. Framing the paper as a series of user-driven contributions in this specialized subject seems like it would be of value. Also, we are writing the paper now because the project has achieved some modicum of stability with the v1.0 release. It might be relevant to provide a brief history or timeline, if this were to be adopted as the focus.

I am not sure that there is another major CAS that you can get features into (big or small) without a charge code.

Maxima

Oh right. But Maxima wasn't always open source.

Besides Maxima, there are also Sage and Axiom (plus forks like FriCAS). I think these three are all the major open source CASes. Of these, Sage has lots of innovations (new code) in number theory and other areas of math, but for symbolics (SymPy's domain) it mostly just uses Maxima and SymPy, along with GiNaC. GiNaC is a lot more limited in terms of features, but it's fast; that is the motivation behind SymEngine, which should now be faster than GiNaC, at least from what we've seen so far. The limit algorithm, the integration algorithms, and so on are unique to SymPy and Maxima (and Axiom).

The problem with Axiom is that it is old and hard to extend (it is written in a combination of Lisp and its own language). It is similar with Maxima: it has some good algorithms, but it's hard to extend or fix due to the language choice. SymPy is written in a mainstream language. All this is evidenced by the size of the community, e.g.:

$ git clone http://git.code.sf.net/p/maxima/code maxima-code
$ cd maxima-code
$ git shortlog -ns --since="1 year ago"
   183  Robert Dodier
   164  Gunter Königsmann
    94  Kris Katterjohn
    92  Wolfgang Dautermann
    82  Raymond Toy
    77  Volker van Nek
    43  Mario Rodriguez
    43  Rupert Swarbrick
    15  Andrej Vodopivec
    14  Viktor T. Toth
    10  Dan Gildea
    10  Jaime Villate
     9  David Scherfgen
     9  Sergey Litvinov
     7  Leo Butler
     6  David Billinghurst
     2  Barton Willis
     2  Ingo Feinerer
     2  Litvinov Sergey
     1  Yasuaki Honda

vs.

$ git clone https://github.com/sympy/sympy
$ cd sympy
$ git shortlog -ns --since="1 year ago"
   319  Aaron Meurer
   259  Chris Smith
   234  AMiT Kumar
   184  Sartaj Singh
   137  Gaurav Dhingra
   135  Sudhanshu Mishra
   111  Shivam Vats
   108  Kalevi Suominen
    90  Jason Moore
    90  Ondřej Čertík
    76  Harsh Gupta
    61  Francesco Bonazzi
    55  Colin B. Macdonald
    54  Juha Remes
    39  Björn Dahlgren
    36  Thomas Baruchel
    35  Jim Crist
    35  Tanu Hari Dixit
    33  Meghana Madhyastha
    31  Dustin Gadal
    29  Jatin Yadav
    28  Keval Shah
    27  Arafat Dad Khan
    26  Thomas Hisch
    24  Alkiviadis G. Akritas
    23  Dzhelil Rufat
    23  Jason Siefken
    21  Kshitij Saraogi
    20  Ralf Stephan
    19  Aravind Reddy
    19  Mark Dewing
    19  YiDing Jiang
    17  Ashutosh Saboo
    16  Pablo Zubieta
    14  Anish Shah
    14  Kyle McDaniel
    14  Rishabh Daal
    14  Shubham Tibra
    13  Sachin Joglekar
    12  Harshil Goel
    12  Joachim Durchholz
    11  Akshay Siramdas
    11  Chaitanya Sai Alaparthi
    11  Mario Pernici
     9  Curious72
     9  Sahil Shekhawat
     9  Sampad Kumar Saha
     9  Sean Vig
     8  Moo VI
     7  Abhishek Verma
     7  Alex Argunov
     7  Chai Wah Wu
     7  David T
     7  Devyani Kota
     7  Shekhar Prasad Rajak
     6  Eva Charlotte Mayer
     6  Kumar Krishna Agrawal
     6  Nitin Chaudhary
     5  Adam Bloomston
     5  Bhautik Mavani
     5  Isuru Fernando
     5  Matthew Thomas
     5  Timothy Reluga
     4  Akshay Nagar
     4  Alexander Bentkamp
     4  Aman Deep
     4  Archit Verma
     4  Boris Atamanovskiy
     4  Chak-Pong Chung
     4  Haruki Moriguchi
     4  Jai Luthra
     4  James Brandon Milam
     4  Longqi Wang
     4  Michał Radwański
     4  Min Ragan-Kelley
     4  Nguyen Truong Duy
     4  Oliver Lee
     4  Peter Brady
     4  Richard Otis
     4  Sanya Khurana
     4  Tom Gijselinck
     3  Abhinav Agarwal
     3  Akash Trehan
     3  Jiaxing Liang
     3  Mathew Chong
     3  Matthew Parnell
     3  Tschijnmo TSCHAU
     3  operte
     2  Abhishek Garg
     2  Alex Lindsay
     2  Anton Akhmerov
     2  Juan Felipe Osorio
     2  Justin Blythe
     2  Kevin Ventullo
     2  Matthew Davis
     2  Michael Mueller
     2  Michael S. Hansen
     2  Peleg Michaeli
     2  Phil Ruffwind
     2  Rehas Sachdeva
     2  Sergey B Kirpichev
     2  Shivam Tyagi
     2  Vladimir Poluhsin
     2  Yu Kobayashi
     2  hm
     1  Aaditya Nair
     1  Aqnouch Mohammed
     1  GolimarOurHero
     1  Guillaume Jacquenot
     1  Guo Xingjian
     1  Jack Kemp
     1  Jacob Garber
     1  Jens Jørgen Mortensen
     1  Jerry Li
     1  Matthew Brett
     1  Michael Zingale
     1  Nathan Musoke
     1  Nicolás Guarín-Zapata
     1  Nishant Nikhil
     1  Oscar Benjamin
     1  Pastafarianist
     1  Prabhjot Singh
     1  Prashant Tyagi
     1  Rich LaSota
     1  Ruslan Pisarev
     1  Sam Tygier
     1  Sandeep Veethu
     1  Shashank Kumar
     1  Sourav Singh
     1  Srajan Garg
     1  Thomas Hickman
     1  Timothy Cyrus
     1  Vasiliy Dommes
     1  Vinay
     1  Yury G. Kudryashov

And I am sure we could plot some graphs etc., but I think it's clear that more people contribute to SymPy than to Maxima: the listings above show 135 names for SymPy versus 20 for Maxima over the past year.
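To make the comparison above easier to repeat (or to feed into a plot), here is a minimal sketch that parses `git shortlog -ns` output and tallies contributors and commits. The function names `parse_shortlog` and `summarize` are hypothetical helpers written for this illustration, and the sample data is only the first three lines of each listing quoted above, not the full output.

```python
import re

# One line of `git shortlog -ns` output: a commit count, whitespace, an author name.
SHORTLOG_LINE = re.compile(r"^\s*(\d+)\s+(.+?)\s*$")


def parse_shortlog(text):
    """Return a list of (commits, author) tuples; non-matching lines are skipped."""
    entries = []
    for line in text.splitlines():
        match = SHORTLOG_LINE.match(line)
        if match:
            entries.append((int(match.group(1)), match.group(2)))
    return entries


def summarize(text):
    """Count distinct listed names and total commits in a shortlog listing."""
    entries = parse_shortlog(text)
    return {
        "contributors": len(entries),
        "commits": sum(commits for commits, _ in entries),
    }


# Excerpts only: the first three lines of each listing in this thread.
maxima_excerpt = """
   183  Robert Dodier
   164  Gunter Königsmann
    94  Kris Katterjohn
"""
sympy_excerpt = """
   319  Aaron Meurer
   259  Chris Smith
   234  AMiT Kumar
"""

print("maxima:", summarize(maxima_excerpt))
print("sympy:", summarize(sympy_excerpt))
```

In practice the text would come from running the command in each clone. Note that counting names can overcount people: the Maxima listing contains both "Sergey Litvinov" and "Litvinov Sergey", which may well be the same person, so a mapping of author aliases would be needed for exact numbers.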

Some thoughts:

I think the paper should focus on the architecture of SymPy, and also what (high level) design aspects of SymPy make it unique.
To the question of "Why is SymPy useful to researchers, educators, and companies?", I don't really feel very well qualified to answer this. Very little of the existing content for the paper addresses this. Most of the use-cases I know of revolve around code generation (which we already agreed should be a separate paper).
To the question "How does SymPy compare to other similar software and why should you use it instead of or alongside them?", I see two sides to this. One is how SymPy compares to other computer algebra systems. An issue here is that we don't really have expertise in other systems (other than Mathematica, and probably Sage).

The other part of this is how SymPy fits into the rest of the SciPy ecosystem. I think we already have touched on this a bit, although maybe it should be expanded.

Define a theme for the paper #55