sunpy / sunpy-1.0-paper

The SunPy 1.0 Paper Repo

Add a 'Why' Section motivating the need for SunPy in the first place #117

Closed mbobra closed 5 years ago

mbobra commented 5 years ago

@nabobalis suggested we add a 'Why' Section motivating the need for SunPy in the first place. We all agreed during our 17 July meeting that this is a good idea.

Let's collect ideas in this issue rather than assigning this section to any given person.

ehsteve commented 5 years ago

Because Python is free, it is much easier for people outside the traditional solar community, who likely do not have an IDL license, to participate.

wafels commented 5 years ago

- Uses modern software development methodologies to improve code reliability
- Takes advantage of a much wider range of data analysis tools in Python
- Allows for a wider contributor base because the code is free

ehsteve commented 5 years ago

Provides access to world-class functionality in the Python ecosystem (e.g. machine learning, website code).

ehsteve commented 5 years ago

Because Python is so widely supported, it provides access to world-class (and often free) infrastructure for builds.

ehsteve commented 5 years ago

PEP 8, Python's community style guide, leads to more readable and maintainable code.

ehsteve commented 5 years ago

Astropy is only available in Python!

ehsteve commented 5 years ago

Lots of support and software is available for developers, such as IDEs and Stack Overflow.

ehsteve commented 5 years ago

The community is much more organized and therefore more supportive, with events such as the SciPy and pyastro conferences. There is no such thing for IDL.

ehsteve commented 5 years ago

The Python language is more actively evolving to fill current needs.

ehsteve commented 5 years ago

Better integration with other languages such as C or FORTRAN code. This provides an avenue to make functionality more efficient if needed.

hayesla commented 5 years ago

A more general need (less about why Python and more about why SunPy):

Solar physics in particular is data heavy, with more observations than many other fields, taken by a whole suite of instruments. Developed software is required to analyze that data and turn these observations into scientific advances.

ehsteve commented 5 years ago

A community-owned project like Python provides more long-term stability than one owned by a company that might sell it or shut it down.

ehsteve commented 5 years ago

Python is now being used to teach computer programming, so students have already learned it, which enables a faster ramp-up to productive research.

ehsteve commented 5 years ago

Python experience is a much more marketable skill that students can use to get good jobs if they need to leave solar physics.

hayesla commented 5 years ago

An easy way to convert complex data formats into more generic outputs: CSV, HDF5, binary, etc.

ehsteve commented 5 years ago

Package structure of Python makes it easy to add functionality as well as build on top of others. Packaging systems like conda do a great job at maintaining and upgrading that system.

bsipocz commented 5 years ago

Non-black-box tools are better for ensuring that science results are reproducible 🤷‍♀

bsipocz commented 5 years ago

Not sure whether talks can be cited, but if so, Jake's PyCon 2017 keynote would be a good one: https://www.youtube.com/watch?v=ZyjCqQEUa8o https://speakerdeck.com/jakevdp/the-unexpected-effectiveness-of-python-in-science

mbobra commented 5 years ago

Here are a few good resources:

bsipocz commented 5 years ago

@mbobra - wow, these are very good links, thanks!

wtbarnes commented 5 years ago

Python is a more transferable skill for students both into and out of solar physics: incoming graduate students new to solar physics/astronomy are likely to know Python, and for students/postdocs who ultimately leave the field, Python is a more generally transferable skill set than IDL. (The former is arguably the more important point, as one could argue we shouldn't make decisions based on what provides the best "job training".)

bsipocz commented 5 years ago

You can use the latter as well. In other countries the training aspect gets more emphasis than here (it's definitely an element with e.g. EU grants, afaik also important in Canada, etc.)

wtbarnes commented 5 years ago

Related to the infrastructure comments, the free aspect of Python more easily allows for analysis of "big data" (of which solar physics has quite a bit) because of both existing tooling (e.g. Dask, scikit-learn, Keras) and the ability to spin up many instances of Python across many machines (e.g. on cloud, HPC systems) with no licensing restrictions.

I think this is a prime example of where the field has been substantially held back due to software choice. Notice that the recent big data/ML/NN papers on solar data are all using Python.

wtbarnes commented 5 years ago

> You can use the latter as well. In other countries the training aspect gets more emphasis than here (it's definitely an element with e.g. EU grants, afaik also important in Canada, etc.)

Fair point. Lots of universities care about this too presumably.

wafels commented 5 years ago

Python is currently the language taught in a lot of universities. That may change - it used to be Pascal and Java. Python is now a job skill, but it might not be in the future (see Perl). Learning good programming habits is more important than the language. Python provides object-oriented and imperative programming styles, but does not provide functional programming naturally.

Permissive free licenses, such as those used by the commonly used packages in the Python science ecosystem, remove a barrier when thinking about applications, so that is an advantage.

ehsteve commented 5 years ago

I've tried to boil down all of these great suggestions into the following rough draft:

The solar community is relatively small compared to other scientific communities. In order to keep up with increasingly difficult scientific challenges and produce world-class science, it must leverage all relevant and available resources. This drove the past transition from FORTRAN to IDL in the 1980s. Two driving factors for this change were that IDL is an interpreted language, which allowed for faster (and therefore cheaper) development compared to FORTRAN, a compiled language, and that IDL includes a large number of powerful libraries that support scientific data analysis. In parallel, the astronomy community was going through the same transition, which further motivated the solar community with new and complementary software released as part of the IDL Astronomy User's Library. Comparatively, the transition from IDL to Python will provide a much more significant improvement. The generic strengths of Python have been enumerated in a number of other papers (cite cite cite). Specific to solar physics are:

1] access to students who have already learned Python, so there is no need to train them

2] access to a whole new worldwide community of developers, since Python is free

3] access to a wide range of specialized tools

4] access to (free) compute services

mbobra commented 5 years ago

Let's keep this open for now -- sorry I've been late to address this issue :snail:

hayesla commented 5 years ago

sorry yes didn't mean to close it! 🤦‍♂

Cadair commented 5 years ago

Also worth pointing out that the open development model of sunpy and the surrounding ecosystem is a big plus irrespective of the choice of language.