yhat / rodeo

A data science IDE for Python

What is the future of Rodeo? #655

Open ghost opened 6 years ago

ghost commented 6 years ago

This is an issue currently faced by many users of Rodeo, so please don't close this.

I have been a user of Rodeo for a year or more. Even though Rodeo has many bugs and problems, I stayed with it because of my love for RStudio, and I believed Rodeo could become the "RStudio" of Python. The biggest problem for me in Rodeo is its memory usage and overheating, and the culprit, I think, is Electron.

For a year there seems to have been no update or development happening in this project. Everything is breaking in the Rodeo IDE, the forum has been removed, and there are problems with the Rodeo website itself. As a user of this software, I want to know whether this project is abandoned. Can someone from yhat give some details about Rodeo's development? What is happening there?

abalter commented 5 years ago

AFAIK Jupyter is basically getting superseded by JupyterLab. There are JupyterLab extensions being developed for a variable explorer. I think I tried them and they work well enough. Jupyter, I think, wants to be a full-fledged online IDE eventually, but they are slow getting there.

I respect everything @CAM-Gerlach says about Spyder. I used Spyder for years. There has been nothing like it for doing science with Python, and the team has worked very hard to keep developing it.

That being said, I'm pretty sure Theia is going to be the way to go. For better or for worse, JS based apps that work on any platform from mobile to web server are the way of the future. Let's look at who has been working in this direction:

- Adobe: Brackets
- GitHub: Atom
- Microsoft: VS Code (which Theia is based upon)

Also, Microsoft now owns GitHub, so I predict we will eventually see Atom retired in favor of VS Code.

Again, Spyder and the whole PythonXY ecosystem was just what we needed over the years. But I think the future is going to lie in a different direction.

OldGuyInTheClub commented 5 years ago

I think it was the JupyterLab widget I tried and it didn't do very much. The whole Jupyter/Lab hype cycle has gotten too much for me. I don't know what Theia is. When I learned Python a few years ago it was billed as a Matlab killer for scientific and technical computing. It isn't. Every year the target market changes: Data science, machine learning, AI, now digital transformation, who knows what tomorrow. Each one of these requires different tools and the scientific/technical computing piece of it isn't well served and is unlikely to be given the funding and time required to do it right.

abalter commented 5 years ago

@OldGuyInTheClub I wouldn't frame it as a battle royale.

I don't think I've seen Python billed as a Matlab killer. Also, what makes Matlab is not just the language, but the IDE. It also has Mathworks behind it creating tons of fantastic packages. But Matlab licenses are crazy expensive. Honestly, and arrest me if you will, I pirated it for many years when I used it in research.

Python is open source and WAY better than Octave or SciLab. The syntax can be a bit more heavy with needing to import libraries and then refer to them. However, if you are brave, you can just from mylib import *. With the memory today's computers have, I think it's worth it. Plus, when you code in Python you are contributing to the open source community, which is good karma.

For stats-heavy data science, I've had to admit that R/RStudio lets me be extremely productive. More so than I think Python could ever be unless APIs are simplified and functionality increased.

So, it usually just boils down to the right tool for the job, and being able to afford that tool.

OldGuyInTheClub commented 5 years ago

I agree with you on matching tools to the job, but I guess we see/read different things. Many sites/people say that Python supersedes Matlab, and for free.

I wholeheartedly agree that the Matlab IDE is excellent. I've been looking for years for something comparable in Python (hence being on this thread) and haven't found anything that comes close to it. And, yes, with Mathworks one gets what one pays for. I have access to Matlab and Simulink at work and am very grateful for them.

Not to quibble too much but import * is strongly not recommended by just about every Python site I've encountered. I don't think programming in Python is automatically a contribution to open source or a virtuous act.

If I were to do statistical work, I'd make the time to learn R. The packages seem to be written by heavyweights in that community. The ggplot approach to data display is also very intriguing.

CAM-Gerlach commented 5 years ago

N.B. I am (at least for now) retired as a Spyder core dev, so I'm (somewhat...) less biased than before, though also less informed on a number of exciting things happening with Spyder lately.

> However, if you are brave, you can just from mylib import *

As @OldGuyInTheClub correctly states, this is strongly ill-advised by every authoritative resource and is virtually never used outside of very specific circumstances by Python users who have been educated on the pitfalls involved (or have already fallen into them). With any decent modern editor that has autocomplete, it takes virtually no more time to use the properly qualified names for things (or to assign just the names you're using to shorter forms). Doing so avoids actively breaking said autocompletion, introspection, docstring retrieval, etc. tools; making it much harder to find instances of specific names being used in your codebase; creating confusion and ambiguity when reading source files; and, most importantly, introducing pernicious "namespace hell" issues (common in R and other languages without proper namespaces) that can be devilish to track down.

In short, don't ever use from spam import *; always use import spam, import spam.eggs, or (if you prefer shorter names) the standard abbreviations import numpy as np, import pandas as pd, import matplotlib.pyplot as plt, import os.path as osp, etc. I used to miss the R way in terms of simpler APIs and shorter name/spaces, but now that I'm properly used to Python, I really see the benefits and don't miss the inferno that is R's syntax and semantics so much.
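To make the namespace pitfall above concrete, here is a minimal sketch using only the standard library. The choice of math and cmath is mine, purely for illustration, and the helper name public_names and the scratch namespace are hypothetical; the star imports are run through exec only so the demo doesn't pollute whatever module you paste it into.

```python
import cmath
import math


def public_names(module):
    """Names a star import would copy: __all__ if defined, else non-underscore names."""
    exported = getattr(module, "__all__", None)
    if exported is not None:
        return set(exported)
    return {name for name in vars(module) if not name.startswith("_")}


# Names exported by both modules; a second star import silently rebinds these.
collisions = public_names(math) & public_names(cmath)

# Simulate a script that star-imports both modules, using a scratch namespace
# (an illustrative stand-in for a script's globals).
scratch = {}
exec("from math import *\nfrom cmath import *", scratch)

# The last star import wins: sqrt is now cmath.sqrt, so a real input
# quietly produces a complex result.
star_result = scratch["sqrt"](4)

# Qualified names stay unambiguous regardless of what else is imported.
real_result = math.sqrt(4)
complex_result = cmath.sqrt(-4)

print(sorted(collisions))
print(type(star_result).__name__, type(real_result).__name__, complex_result)
```

Swapping the order of the two star imports changes which sqrt you end up with, and nothing in the calling code tells you so; that is the kind of bug that is devilish to track down in a larger codebase.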

> That being said, I'm pretty sure Theia is going to be the way to go.

I just don't understand this suggestion. It's like saying that a Tesla Model 3 is going to be the way to go over a pickup truck for hauling loads, just because the Tesla is newer and more hip to the latest trends, rather than better-suited for the task at hand. Theia is designed to be a lightweight software development IDE; it has none of the tools designed for data science that Spyder, RStudio, the Matlab IDE, etc. have, and is actively worse than other options like VSCode (on which it is based) or even PyCharm in that regard, which at least include some basic scientific functionality (variable viewers, etc).

Furthermore, as much as people love to hype up "web-based", "JS-based", etc., I've never been able to get a solid explanation for what advantages that offers for this use case. You're not going to be developing your scientific code on a smartphone, and with the stagnation of Moore's law there is no reason to believe that will change within the next decade. Jupyter Notebook and JupyterLab already highlight some of the key limitations of the web-based approach, and most of the advantages (being able to easily run code on remote servers) either already exist in IDEs like Spyder without the same compromises or are planned to be added for Spyder 5.

And, of course, the fact that it's all JS/TS-based rather than written in the same (or any) language actually used in scientific computing (in fact, in a language whose sole virtue and staying power is due to being the only one shipped in web browsers), makes it much less accessible to users wishing to modify, extend (e.g. with plugins) or improve it, and less easy to integrate with the rest of the PyData stack.

> The Spyder Reports plugin while still shown on the Spyder website is no longer under development and is undocumented.

Development will hopefully pick up again soon once Spyder 4 is out. I offered to fund it myself, there just wasn't the dev bandwidth.

fkromer commented 5 years ago

@abalter @oldguyintheclub Thanks. I'll give Spyder and Theia a try.

abalter commented 5 years ago

@CAM-Gerlach BTW, this is fun and informative for me, I hope for you too. It's good to hash out ideas in a debate to expose foregone conclusions etc.

> It's like saying that a Tesla Model 3 is going to be the way to go over a pickup truck for hauling loads,

I think it's like saying an electric or hybrid pickup is going to be the way to go over an internal combustion engine.

My point is also that the JS apps appear to not just be a fad. They are clearly here to stay. They are also much more amenable to community development since you can add to it using JS which is both interpreted and much more widely known. Also, because JS based apps use web technology, there is just soooooooo much out there in terms of tools and solutions pre-built for adding features.

Finally, a comprehensive data science environment is being developed for VS Studio. This would also work in Theia.

As for the web app aspect, I can only say that there are applications where it is really essential. For instance, I could open up the web app on server A, start a notebook running, open up the web app on server B and start a different notebook running, etc. It's not a use case everyone needs, but if you do need it, a remote connection from a desktop application just won't fit the bill.

Finally, while you may not do development on a cell phone, a person may very well want to check on a job. More likely, they may work on something on a tablet or chromebook. I think 5 years ago this may have seemed like a fad. But given the continued development behind this format, and the big guns behind it, it's probably for us to learn the advantages and create our own use cases rather than doubt that it's useful.

CAM-Gerlach commented 5 years ago

> I hope for you too.

Sure. I just genuinely want to understand why a substantial fraction of people seem to see "web-based" or "mobile-ready" as a meaningful advantage for a workhorse data science IDE, as opposed to some end-user "app".

> I think it's like saying an electric or hybrid pickup is going to be the way to go over an internal combustion engine.

But my point is that Theia isn't a pickup (or a Tesla Semi), or even close to one. It's a Model 3: great for driving a few people a few hundred km at a time within the supercharger network, with cheaper running costs and a much cooler aesthetic, but not at all designed for hauling loads (and it does a rather poor job at the same). It may eventually be developed into something resembling one, but even then it would only match the existing feature set of the alternatives (Spyder, RStudio, etc.), possibly with some additional limitations (range, superchargers, cost, etc.) inherent to its fundamental operating principles (JS-based/web-ready). Also, unlike the climate change and fossil fuel depletion of the analogy, there isn't necessarily some overarching externality that I'm aware of pushing the drive toward a different paradigm for Theia, at least none that has been concretely explained to me.

> They are clearly here to stay

So long as JS remains entrenched in the browser space, then yes. As I mentioned, because it is the sole scripting language of the web, hordes of web developers learn just enough of it to be dangerous and thus want to apply this skillset to other domains, and browser developers pour massive effort into optimized JS engines. Without this, there is virtually no motivation to use it to actually build applications, due to its poor design and being demonstrably inferior to many other readily available options that are properly designed for such tasks and widely used for them.

> They are also much more amenable to community development since you can add to it using JS which is both interpreted and much more widely known.

As I state above, this is valid for web developers, but simply not true for the scientific/engineering/data analysis community. Essentially no one does their actual data analysis in JS (as opposed to Python, R, Matlab or even C++), and thus very few in said communities are familiar with the language compared to those latter ones, nor is integration with their actual stack nearly as easy, for the same reasons. Ergo, my points above. Furthermore, even overall, Python is only modestly behind JS and continues to grow relative to it; in terms of traction within the data science community at large, it isn't even close.

> Finally, a comprehensive data science environment is being developed for VS Studio.

Neuron looks quite interesting indeed, but it is far from a comprehensive environment; it is a tool to generate several different types of interactive output, and there's nothing that couldn't be done just as well in a desktop IDE like Spyder (which already does some of it, with more planned for Spyder's Viewer plugin in Spyder 5). Furthermore, it makes my very point: as VSCode already offers a number of data science features, what does Theia offer as a platform for data science above and beyond that? This is the part that I just don't understand.

> As for the web app aspect, I can only say that there are applications where it is really essential.

Okay, but could you provide a specific, real-world example? Presumably, since you're interested in Theia, you would have one from your own work. Furthermore, these would need to outweigh the fundamental limitations of being a web app in all other contexts where it is not essential.

> For instance, I could open up the web app on server A, start a notebook running, open up the web app on server B and start a different notebook running, etc.

Okay, but what does this accomplish? What's the practical purpose being served here? What's the real-world use case?

> A remote connection from a desktop application just won't fit the bill.

Why not? What specific things in this scenario can a desktop application not accomplish? With Spyder, for instance, you can automatically start or connect to Jupyter kernels running on many different servers at once, switching between them with ease, and in Spyder 5 the plan is to add remote file editing and manipulation as well as connecting to and interacting with full-on Jupyter notebook servers to do anything you could locally or from a web-based UI. Furthermore, relying on Jupyter notebooks for the backend and the frontend also locks you into all of its limitations and pitfalls, as opposed to working with proper portable, interoperable, re-usable Python modules.

> Finally, while you may not do development on a cell phone, a person may very well want to check on a job.

Checking on a job is a radically different use case than working in a full IDE; all that's needed for the former is a means of notifying the user as to its status, which can be done through something as simple as email or a webpage displaying status output. It's a huge stretch to go from this to porting your entire IDE into a cross-platform, mobile-first framework just for something so simple and tailored.
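For a sense of how little the "webpage displaying status output" option takes, here is a rough sketch using only the Python standard library. Everything in it is illustrative rather than taken from any existing tool: the job_status dictionary, the StatusHandler class and the start_status_server and fetch_status helpers are hypothetical names, and a real job would update the status as it runs and would need authentication before being exposed beyond localhost.

```python
import json
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer

# Hypothetical shared state; a real long-running job would update this as it goes.
job_status = {"name": "example-job", "state": "running", "progress": 0.42}


class StatusHandler(BaseHTTPRequestHandler):
    """Answers every GET request with the current job status as JSON."""

    def do_GET(self):
        body = json.dumps(job_status).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep request logs out of the job's own output


def start_status_server(host="127.0.0.1", port=0):
    """Serve the status in a background thread; port 0 lets the OS pick a free port."""
    server = ThreadingHTTPServer((host, port), StatusHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


def fetch_status(url):
    """Read the status back, as a phone browser or a small script would."""
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    with opener.open(url, timeout=5) as response:
        return json.loads(response.read().decode("utf-8"))
```

Calling start_status_server() at the top of a job script and opening the resulting address from any browser is enough to check progress; server.shutdown() stops it again. None of this needs the IDE itself to be a web app.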

> It's probably for us to learn the advantages and create our own use cases rather than doubt that it's useful.

I don't see it as particularly wise to try each and every new whiz-bang workflow idea that comes along without an obvious practical benefit to a mainstream workflow, nor clearly illustrated use cases and applications that offer demonstrated advantages over current methods. This is exactly why I'm asking you, an advocate for some sort of web- or mobile-based data science IDE, to illuminate these very things regarding your proposal, so I may understand why indeed you see it to be such an attractive option.

abalter commented 5 years ago

https://github.com/abalter/theia-data-science-ide/blob/master/README.md

Project Proposal:

Open-Source, Platform Agnostic, IDE Based on the Theia Framework for (Data) Scientific Computing

TL/DR

I propose that "the community" use the Theia framework to build an open-source IDE that combines the best of RStudio, Spyder, and Jupyter (etc.) into a data science IDE that is cloud/desktop agnostic and language agnostic.

Introduction

As far as I can tell, the Data Science / Science community primarily uses three coding environments:

Each has its strengths and weaknesses, pros and cons, adherents and detractors. I personally avoided R and RStudio for a long time in favor of Python, Spyder, and Jupyter. My most recent position is in an RStudio shop. I have discovered that RStudio is a magical world that seamlessly integrates script files, notebooks, exploring variables, maintaining history, accessing files, loading data, and an interactive command line (both R and bash). Fantastically, the RStudio IDE has both a cloud and desktop version.

Taking into account what I have seen in academia, and extending this to my perceptions of industry as well, RStudio reigns predominant in terms of daily usage among these disciplines. RStudio is supported by a large, profitable organization which does a fantastic job with this product. The RStudio company does release open-sourced versions, but stripped of some important functionality. While technically open source, these versions are not community maintained. Consequently, or by design, the roadmap and development move forward at the sole command of the RStudio company.

I propose developing a free, open-source, data science IDE that combines the best features of the existing commercial and open source options out there.

Chart below.

The Good News

We would not need to build a one-off project from the ground up, as was done with Spyder, Jupyter Notebook, Architect, or Rodeo (which was eventually abandoned; see discussion). Quite the opposite. We would build on an existing framework and get professional support for our own development issues.

The Eclipse Foundation hosts and develops an IDE framework called Theia for building platform-agnostic (cloud/desktop) IDEs (more below). It is modern, flexible, extensible, and uses the latest build technologies. Importantly, Theia was intentionally designed from the ground up to work on the desktop and in the cloud without needing to create a parallel code base.

Theia IS being very actively developed.

Theia development activity on GitHub.

Not only can we get support directly from Eclipse, the very act of building our IDE would likely contribute new ideas and code to the Theia project, creating the sort of positive feedback loop that is one of the shining hallmarks of open-source development.

The Bad News

While I am idealistic, passionate, and a very good scientific programmer, I am not a developer. I'm also old-ish, have a lot of family obligations, and am trying to carve out a career for myself in a new field (namely biomedical data science). I neither have the skill nor the bandwidth to LEAD this project. However, I swear upon all that is good and true that if some person or group would come forward to lead the software development, I would take on a strong supporting role by responding to issues, coding bits and pieces (menu here, UI tweak there), writing documentation, testing, fixing small problems, looking for sponsors, etc.

More about Theia

Wikipedia tells us that

Theia was developed by TypeFox and Ericsson, with additional contributions from Red Hat, IBM, Google and Arm Holdings. It was first launched in March 2017. Since May 2018, Theia has been a project of the Eclipse Foundation.

If you search the 'net, many people refer to Theia as an IDE. However, Theia developers from Eclipse try to emphasize that Theia is a framework to build your own IDE, just as they did with their own Che editor and GitPod (which by the way is awesome). (differences between Che and Theia).

Another common misconception is that Theia is a VS Code clone. This stems from the fact that in addition to having some of VS Code's look and feel, Theia can actually use VS Code plugins. However, Theia is a completely independent code base.

Other real-life examples are Microclimate, potential GitLab integration, the new Arduino Pro IDE, and Hyperexponential's infrastructure.

Why not use Plugins?

Why not just use existing technology (Atom, VS Code, Theia, Jupyter) and build it out with plugins?

The plugin model seems like fun: everyone gets to contribute and users get lots of options. But for serious tools this model fails. Essential functionality (code linting, markdown preview, variable explorer, kernel integration, ...) becomes dependent on individuals in the community implementing versions of these features AND dedicating themselves to supporting them for eternity. Instead, plugins tend to stall out or become totally abandoned as quickly as they are created.

On the other side of things, the pluginverse becomes flooded with options, making it hard to know which to use. There are currently a multitude of markdown previewers for VS Code. Some have more features than others. Some work better than others. How do you pick which to use? You have to try them all first and/or read many reviews, and then hope that development continues and bugs are fixed. If things don't work out, you need to find another plugin.

Consider the data science plugins for Atom and VS Code. There are multiple ones for R with non-overlapping feature sets, and some of the most robust have already been abandoned.

(In writing this in VS Code, I tried one markdown previewer that improperly added line breaks at each newline character in the source and rendered text inside escaped square brackets \[...\] as math. I switched to another that is ok with that, but this one uses a markdown flavor that requires me to use an explicit \— rather than ---.)

That is why I believe the project needs to be curated at the top level by a group of people who will take input from the community and make wise decisions. This has largely been how the Jupyter project has gone. However there are a growing number of unofficial extensions. It will remain to be seen how well this works out.

Theia does have a plugin interface, can use existing VS Code plugins, and is designed to be extended with more deeply rooted extensions. Thus, the community is welcome to add new functionality. Plugins that have shown themselves to be popular, useful, and stable could be curated (i.e. incorporated) into the main code base and be maintained by others even if the original plugin author moves on to other things.

Feature Comparison Chart (proposed)

I have filled this table in to the best of my knowledge. I do not currently use Spyder or Hydrogen, and have not fully explored data science options for VS Code. PLEASE help make this chart more complete and accurate with your suggestions and input!

Jupyter RStudio Spyder VS Code Hydrogen Proposal
IDE-like environment No Yes Yes Yes Yes
Real-time notebook rendering Yes No No Yes Yes
Visual notebook editing Yes No ?? Possibly with plugin Yes Yes
Plain text notebook editing No Yes ?? Possibly with Plugin Yes Yes
Multiple notebook formats No No No Possibly with plugin No Yes
Notebook-Focused Yes No No No Yes Yes
Development-Focused No Yes Yes Yes No Yes
Data science focused Yes Yes Somewhat Poor plugin options Yes Yes
Edit code and notebooks side by side Somewhat Yes Yes Possibly with plugins To the extent that Hydrogen runs in an IDE... Yes
Notebook linked to command line Awkward choice of console format. Yes ?? ?? N/A Yes
Shared environment for notebooks, scripts, and command line Partial--can create individual console for each notebook. Yes Yes ?? N/A Yes
Each notebook needs/gets its own console. Yes No ?? ?? N/A Yes, if wanted.
Multiple parallel execution environments (kernels) Yes No ?? ?? No Yes
Multi-language support in notebooks Yes Yes ?? ?? Yes Yes
Multi-language support in IDE Yes No No ?? N/A Yes
Variable Explorer Primitive Yes Yes No Yes Yes
Robust file browser No Yes Yes Yes ?? Yes
Easily import data in to computational environment. No Yes ?? ?? ?? Yes
Delegates important functionality to community supported plugins/extensions Yes No No Yes ?? No
Curates and includes solid implementations of important features. No Yes Yes Not for data science. Yes Yes
Maintains command history for console. No Yes Yes ?? N/A Yes
Designed for browser and Desktop No Yes No No No Yes
Works in browser Yes Yes No There is a separate browser version. No Yes
Works on desktop No Yes Yes Yes Yes Yes
Integrated Git support With extension Yes Yes Yes No Yes
Integrated Conda Support Yes with extension No No No No Yes
Robust enterprise support No Yes No Yes No No
Robust community support Yes Yes Somewhat Yes Somewhat Hopefully.
Completely free and open source Yes No Yes No Yes Yes

CAM-Gerlach commented 5 years ago

@abalter

Overall, your goal is laudable, but as an actual former developer of a major data science IDE (Spyder) I'm not sure how practical it is to develop this project compared to dedicating your and the community's time and effort to any one of the existing tools instead. JupyterLab is also under active development (more active than Theia; note the 10x difference in scale on the plot between Theia and Jupyterlab) and has a goal very similar to what you want: building a hybrid web and desktop based, notebook centric, data-science focused IDE with a full suite of tools and plugins. It also has the backing of some of the biggest names and largest communities in the data science field, and is used by millions of people around the world. You're going to have to make an extremely compelling case, more so than even the above (which is quite the effort) for why all that effort should be thrown out and duplicated in a different framework rather than building on everything the community has poured a massive amount of time and effort into.

Unlike with Spyder, which by design is a proper desktop program written in the same language that users code in and that it is designed to work with, and which is not easily adaptable into a web app, there is nothing fundamental preventing the features you are asking for from being added, either in JupyterLab core, as one or more plugins, or at the very worst in a fork, turning those "No"s into "Yes"es. There have been literally millions of dollars in corporate donations, sponsorships and dev time, hundreds of thousands of lines of code, tens of thousands of commits, tens to hundreds of person-years, and numerous paid, full-time developers on the Jupyter team who have been working on the project for years to develop JupyterLab to this point, plus contributions from hundreds of community members. Furthermore, this was all off the back of a large amount of work already done, an already strong position in mindshare and credibility, and a history of over a decade spanning back to Jupyter Notebook, IPython notebook and IPython itself. Ergo, it defies belief that, especially now that these tools already exist, trying to create a new community from scratch would attract enough user, developer and fiscal interest to start over with something new, at least without a very compelling marketing pitch.

Impossible? No. Highly improbable? Yes. I wish you the best of luck, but I again urge you to more thoroughly consider putting your efforts toward making the current alternatives "good enough" rather than striving for perfection and falling short of making a substantial impact at all.

It brings to mind a relevant XKCD:

image

Spyder development history, for comparison (note scale):

image

Similarly, JupyterLab development history:

image

To fill in the chart for Spyder (as I've mentioned I used to be a Spyder developer, but I'll try to be as unbiased as possible):

The notebook stuff requires the first-party "Spyder-Notebook" plugin that is developed by the Spyder core team. While it is a plugin, it integrates with Spyder as fully as any other aspect of the UI, since basically every other UI pane is also implemented using essentially the same core plugin system. The Spyder team is still updating it for the forthcoming Spyder 4 release, but hopefully that should be done in the next couple of months.

- Real-time notebook rendering: Yes, w/1st party plugin
- Visual notebook editing: Yes, w/1st party plugin
- Plain text notebook editing: Yes, w/1st party plugin
- Multiple notebook formats: Possible with plugin
- Data science focused: 100% (It's literally what Spyder is built for, every bit as much as RStudio, which it was in fact originally inspired by)
- Edit code and notebooks side by side: Yes, w/1st party plugin
- Notebook linked to command line: Yes, w/1st party plugin
- Each notebook needs/gets its own console: Optional
- Multiple kernels: Yes
- Multi-language support in notebooks: No
- Multi-language support in IDE: Partial
- Easily import data: Yes [Spyder has a built-in import wizard that can import a variety of file types to lists, numpy arrays and pandas dataframes as well as save and restore individual variables or full sessions]
- Integrated Conda Support: Partial, further in development

Let me know if you'd like further clarification on any of those aspects.

OldGuyInTheClub commented 5 years ago

I don't think Spyder will ever be ready. v4 has been around the corner for over a year. The other projects such as Jupyter go long on PR but short on delivery of an environment with the kind of debugging, inspection, and visualization tools necessary for exploratory technical computing/research. Tools for developers (PyCharm, SublimeText, etc.) are different from researchers (this is what Matlab gets bang on) and there is just too much churn in FOSS community to think about usability and need before releasing the next great whatsit.

Perhaps if enough people start running Python within RStudio, there will be enough of a clamor for them to do tighter integration with Python kernels that would allow the variable inspector, dockable plots, and debugging that work well for the native R.

CAM-Gerlach commented 5 years ago

I'm no longer a core dev with Spyder, but my main focus was UX, UI text, documentation, support, and addressing common user annoyances. While I agree Spyder 4 has taken much longer than we hoped in the beginning, and in hindsight we would likely have gone with a more incremental, modular plan, Spyder did release 7 betas over that time (each with multiple significant new features) and it's now on release candidate 1, with release candidate 2 coming out within a week or so and 4.0 final due to follow. If you look at the development history above, you can see the large and increasing amount of work that has gone into it, and most of the time was spent not adding shiny new stuff, but on improving stability and fixing bugs and UX issues, along with the features most requested by the user community (e.g. the new debugger and LSP, Spyder 4's two biggest features that took most of the effort, were the two most commonly asked-for capabilities).

Given it's an open-source community project, there was no PR budget, and every dollar of donations was spent on actually paying developers to implement the features requested by the community (both donations and expenses are fully documented on the OpenCollective). Keep in mind that unlike JupyterLab, which has big corporate sponsors to the tune of millions of dollars, Spyder 4 was funded at less than a tenth of that level and made up most of the slack with volunteers like me (I never got paid a cent, nor did I want to be). We aren't charging users hundreds or thousands of dollars for a revocable license to use Spyder, which has opened up data science to hundreds of thousands of people all around the world who could never afford Matlab or similar proprietary tools, and they've given back by contributing their own time and effort to making Spyder even better.

At the end of the day, the Python ecosystem is moving fast to keep up with the state of the art of data science in general right now. Lack of long-term stability is the price to pay for keeping up with a fast-paced field; once things settle down, the ecosystem will stabilize to match. However, if rapid change and iteration aren't your cup of tea right now, you are free to settle on a particular version and use that for as long as you wish, with the understanding that the wider world may have moved on in the meantime, the same as if you stuck with any given version of a proprietary package, or even with R these days, as even it is getting outstripped by Python in the long run (I was previously a diehard R/RStudio user myself). But unlike with proprietary software, no one can take that freedom away from you.

OldGuyInTheClub commented 5 years ago

Yes, I remember having several good discussions with you on Gitter. I kludged a mix of Spyder and Jupyter via the Notebook plugin with your assistance although you said that working within Spyder/Markdown (unsupported) would be better in the long term as the external interfaces might not always be available.

That being said, taking this into the realm of freedoms is overstating the case by quite a bit. No one, least of all me, says Spyder was squandering funds or was even remotely close to being funded to achieve its goals. Clearly, Spyder is not a donation magnet but one has to look at what it will take to meet the objective and ask how the small team can achieve all of that when it is getting pennies on the dollar of other projects who have the same ambitions and are themselves still not delivering. I continue to be surprised by and often appalled by how much money Jupyter has received for the future-ware it keeps putting out. At minimum there is no reason for Spyder to claim on its website that third party plugins provide Reports and Notebook capabilities when the former doesn't exist and the latter may not be stable from one Spyder rev to the next.

I don't know what "DataScience" actually is nor do I care. It is a moneymaker for the Universities scrambling to offer degrees in it so bully for them. There are large sets of technical problems that require exploring data and algorithms and taking answers from there to code that does something especially when interfaced to the outside world and double-especially when it will have to be mission critical. Yes, this also means talking with (ugh) hardware. Python and its "ecosystem" promised the world on that and underdelivered on multiple fronts. There are advantages to proprietary tools. If one's alternative can take them on and honestly win, that's fine. Reverting to "we're volunteers doing good works" in the face of evidence is a cop out.

Matlab is expensive for a reason. It hires people, pays them, tests before release, and provides professional support. I was asked to do some light programming at work involving playing with datasets and implementing some algorithms. Python/(Jupyter/Spyder) would have taken 2x-3x as long as it took me in Matlab where a) the tools work and b) I could get fast, knowledgeable support when I needed it. Additionally, LiveScripts are a serious challenge to Notebooks, Tables challenge Pandas, and that is on top of things that the free "ecosystem" will never do like align with certifications/industry standards. e.g. https://www.mathworks.com/solutions/aerospace-defense/standards/do-178.html

If you were going to fly on a plane, do you want one designed with traceable and proven tools or one by some guy on Github?

CAM-Gerlach commented 5 years ago

That being said, taking this into the realm of freedoms is overstating the case by quite a bit.

I got a bit too ideological about that. Ultimately Spyder and Matlab, while similar in some ways, serve different groups of people, and each has its use cases as you illustrate. Many people regard the distinction between libre and proprietary software with various degrees of importance, while many others take a more pragmatic approach and just see them as tools and means to an end. There's no "right" answer or approach in this regard; it's a personal choice.

Clearly, Spyder is not a donation magnet but one has to look at what it will take to meet the objective and ask how the small team can achieve all of that when it is getting pennies on the dollar of other projects who have the same ambitions and are themselves still not delivering.

What we proposed for Spyder 4 was adding a relatively modest number of significant, oft-requested features within an existing IDE that already had the great majority of the expected functionality, whereas JupyterLab was essentially creating an entire IDE from scratch, all inside a web browser and in a language few data scientists were familiar with. The size of their core team is about the same as ours; e.g. over the past year, JupyterLab had 6 people with over 100 commits while Spyder had 8.

As explained previously, the great majority of the difficulties with Spyder 4 were not in implementing the features themselves, which were not conceptually that difficult, but in working through and resolving a large number of bugs and deficiencies, particularly with LSP, that we didn't know about a priori. Obviously, one always plans for some difficulty, but in this case it was much more than expected. I for one generally pushed for later target dates than were initially announced, though at least in the position I was in, even I didn't think it would take this long.

At minimum there is no reason for Spyder to claim on its website that third party plugins provide Reports and Notebook capabilities when the former doesn't exist and the latter may not be stable from one Spyder rev to the next.

These are first-party, not third-party plugins (the site is mistaken on that point); they have existed since 2016, worked fine, and were actively supported and developed around the time we originally put up the website. However, right around that time, Anaconda cut the funding they had given us to help develop them, so we eventually had to pause full support for them (except for Spyder Unittest and partially Spyder-Terminal) while focusing all our resources on Spyder 4. Given that, as mentioned, we have no real PR budget, evidently no one ended up adding something to the site itself stating that, although we did have a disclaimer in each of their readmes if users were to actually click the links on the site. Spyder-Reports' main problem is its incompatibility with the latest versions of pweave, not Spyder itself, while last I checked Spyder-Notebook does appear to still work with some bugs and has been minimally maintained, and Spyder 4 support has been merged.

I don't know what "DataScience" actually is nor do I care.

No one really knows for sure, but everyone sure loves to use it. It's an umbrella term, rather nebulous and certainly overused, but it describes what tools like Jupyter, Spyder, RStudio, and Rodeo, the very object of this discussion, are designed to do (the headline of RStudio's website is "Open source and enterprise-ready professional software for data science," and Rodeo's is "A Native Python IDE for Data Science"). You are welcome to use them for other things, but it shouldn't come as a huge surprise that they may not be the most suitable for something completely different, like the aerospace engineering or safety-critical applications that you mention (the vast majority of which don't, won't and probably shouldn't ever use Python over e.g. C, Ada, or other more appropriate choices).

Python and its "ecosystem" promised the world on that and underdelivered on multiple fronts.

Could you point me toward where "Python" or the core PyData stack makes the promise of being "double-especially...mission critical"? Where has it overpromised and underdelivered on "talking with (ugh) hardware"? I haven't heard many claims at all about the latter, and while I certainly wouldn't use it in the embedded space, I've had good success using it to build a system of a dozen networked lightning sensors that can reliably send commands to and log data from multiple devices (charge controller, sensor hardware, control computer, etc) via various low-level protocols (serial, Modbus, Ethernet, GPIO) and send it back to a central server from anywhere in the world, along with automated alerting, remote access, command and control, and displaying all of this in a dynamic, interactive web dashboard. Not to say I couldn't have done it in C or another language, but I found it a good fit for the application.

There are advantages to proprietary tools.

Sure there are. Different jobs, different tools. And to be honest, the single biggest weakness throughout much of the Python ecosystem, Spyder especially so, is documentation. Developers love to write code and are generally pretty good at it, but documentation? Not so much...that was another of my focuses with Spyder, but while I did do a full port and rewrite of the existing docs, I didn't end up having the time to do much in the way of very sorely needed expansion. They can also make some pretty poor UI/UX decisions sometimes, which was something I found myself arguing with the others about all too frequently.

Reverting to "we're volunteers doing good works" in the face of evidence is a cop out.

A cop out for what, sorry? I'm not sure I understand the point here, since I brought this up in the context of not expending funds to run a big PR operation like you were saying Jupyter did, and in explaining how Spyder serves a different niche than Matlab: the vast majority of our users simply couldn't afford Matlab licenses for each of the varied devices they plan to run their code on. I should have made it more clear that I didn't mean to imply you should use XXX open source tool just because it's open source if it's clearly unsuited to your application; I assumed we were discussing something resembling data science here (as is the designed application of all the tools that had theretofore been discussed: Jupyter, Spyder, Rodeo, RStudio, etc), which has a robust open-source ecosystem and where open source itself is a distinct domain-specific advantage.

Matlab is expensive for a reason.

Sure, I didn't say they were purposely bilking people just because they could. Just that both their expense and the non-free nature of their ecosystem result in a significant niche to be filled by open-source tools even if Matlab were strictly superior in every way. Regardless, I certainly don't mean to imply that Matlab doesn't still have a substantial niche, particularly for applications like you describe (engineering, aerospace, etc), for which it is surely well worth the money.

I could get fast, knowledgeable support when I needed it.

While e.g. Quansight and Anaconda offer something in the way of paid support for Spyder and other open source tools, it isn't exactly the same sort of thing that Matlab offers. However, if a clear market exists, it seems likely that companies will step up to fill this need at some point, since with regard to open source in general, providing paid consulting and support for open source tools is the entire business model of numerous companies and worth tens if not hundreds of billions. For example, Red Hat (recently purchased by IBM for $34 billion) built its whole company around providing paid support, validation/certification, consulting etc for a free and open source product (RHEL) that anyone can freely distribute, use and modify (as CentOS). Anaconda, Quansight etc. do the same for data science. Of course, you pay a good deal of money, reducing the cost advantage of open source, though it's important to remember that free as in beer (gratis) isn't what the "free" in "Free and Open Source" is about...but I digress, I don't want to turn this into some ideological debate.

that is on top of things that the free "ecosystem" will never do like align with certifications/industry standards.

Not sure why the scare quotes around "ecosystem", but open source is typically at the forefront of implementing open standards over proprietary solutions, when those standards are widely applicable enough to matter to a meaningful fraction of the userbase. However, general-purpose open source cannot be expected to always implement rigorous, highly specialized regulatory requirements and certification for one specific country, which is why niche proprietary products and companies providing paid validation and certification of open-source tools will always exist. I'm not sure where I recommended using a tool (Python) for an application where it is clearly not very appropriate or accepted (aerospace design and modelling), at least for mission-critical code, an area where conversely Matlab is very well suited.

To note, there is no technical reason why it couldn't. For example, the R language provides considerable documentation to support FDA regulatory conformance, as do RStudio and other open source R tools, and third party companies provide fully validated builds of R with an extensive set of documentation, test suites and regulatory documents for various industries, as a result of R being increasingly widely used in the medical field and other such industries, including by numerous major Fortune 500 companies.

As far as I'm aware, there isn't anything exactly similar for Python, since it is not as heavily used in these specific areas, but companies like the aforementioned Red Hat do offer testing and validation services for the Python builds and packages included in their distributions (and others, for a fee), and companies like Anaconda and Quansight offer a level of guaranteed support, validation and targeted development aligning with corporate priorities for clients using the PyData stack, including Spyder, that could be used to implement such.

If you were going to fly on a plane, do you want one designed with traceable and proven tools or one by some guy on Github?

I have no idea why you're bringing this up. What does some unknown CAD/validation software tool designed by "some guy on Github" and flying on a plane have to do with using Spyder, JupyterLab and RStudio for data science, each a tool developed and supported by dozens of active developers and used by hundreds of thousands? Why do you regard "traceable and proven" as contrary to open source? Software tools, open or closed source, can never fully prevent people or companies from making poor high-level decisions, particularly those that compromise long-term safety for short-term gain (cough, 737 Max MCAS, cough). However, by definition open source provides traceability by allowing anyone to examine the source for themselves, and verifiability by opening the source (and the validation test suite) to independent, community examination, testing and validation by experts around the world and the open sharing of any deficiencies found. Does this automatically mean that open source code will be rock solid simply because it's open source? Of course not, but if the software has a significant expert userbase, particularly with such demanding requirements, then open source certainly increases the opportunity for this (as e.g. R and RStudio's compliance documentation explain and justify in detail).

t-wojciech commented 3 years ago

RStudio released another major version, which introduced a few things mentioned by @abalter as missing. I think some of you will be interested. RStudio 1.4 introduces a visual markdown editor and strict Python support.

I used Rodeo for a long time as an IDE for Python, then changed to Spyder, but I was not satisfied. Now RStudio is the best choice for me, but I may be biased, as I previously used RStudio for R and Rodeo for Python. Anyway, I think it's worth testing.