microsoft / CNTK

Microsoft Cognitive Toolkit (CNTK), an open source deep-learning toolkit
https://docs.microsoft.com/cognitive-toolkit/

BrainScript extension for Visual Studio Code #1962

Closed vermorel closed 7 years ago

vermorel commented 7 years ago

Visual Studio Code is gaining huge momentum. It would be extremely nice if BrainScript had its own VS Code extension. BrainScript is amazing; however, it still lacks support in the IDEs that are actually used by the broader Microsoft dev community. A VS Code extension would significantly boost adoption of BrainScript, and the software effort would be modest.

cha-zhang commented 7 years ago

We need to understand better how many people are still on BrainScript, and why. In general CNTK is moving towards Python, and we are not adding more support to BrainScript. Our primary assumption is BrainScript is yet another script language and people don't want to learn it. This, of course, can be changed if there is huge demand to keep BrainScript up-to-date.

Could those who are using BrainScript explain why you are not switching to Python? Simplicity? Performance? Legacy code? Or something else?

ghost commented 7 years ago

We use BrainScript because we require simplicity and performance over richness of features or the extensive research environment.

We need a clean execution engine with minimum scripting (#ifdef and the like are high on our internal list of BS features) to pump and train on our data. We do not use python for data extraction and processing, and have no plans to switch to it, so it has zero value to us in production.

cha-zhang commented 7 years ago

AFAIK, CNTK's Python binding today has less than a 1% performance hit in the worst case, and in many cases it is much faster due to better sparse data handling. It's very simple to use as well.

Can you expand on the clean execution argument?

ghost commented 7 years ago

@cha-zhang we run a business. python is not suited for us - from the software engineering point of view.

1% is relevant to the workloads we run, and is why we chose to invest time and investigate CNTK more closely to start with. personally i do not understand why CNTK should be concerned in any way with data-specific optimization. there is no such thing as "sparse data" - you simply want to use the platform for feature engineering while we need a distributed compute engine. if you start getting involved in data specifics you will lose the focus on performance - and us in the process.

PS: on the clean execution engine. our potential use case is brainscript for training and the C++/C# API for execution. we need none of the python features (apart, maybe, from RDD/hadoop connectivity) and our ETL is in-house. we are concerned with keeping things simple. a 300 gigs release is not simple. do you remember the size of the first jdk? it is much easier to port a brainscript model to verilog, and things, in real life, do not need to be more complicated than that.

cha-zhang commented 7 years ago

Thanks!

If anyone else has strong reasons to use BrainScript, please explain your use case by commenting here or sending me email directly. We will aggregate the feedback and plan the future of CNTK accordingly.

ghost commented 7 years ago

@cha-zhang i would like to make an extra comment regarding the "speed" aspect.

some examples under "python is on the order of between 10 and 100 times slower than C++ when doing any serious number crunching.":

C vs python (rather old)
Stackoverflow - C vs Python
some other links:
https://news.ycombinator.com/item?id=9753366
https://www.quora.com/How-fast-is-Python-compared-to-C-C++
https://benchmarksgame.alioth.debian.org/u64q/compare.php?lang=python3&lang2=gpp

JohnCraigPublic commented 7 years ago

Never used Python, and while I hear it's 'easy' -- I have found BrainScript to be very easy -- so it was a simpler path for me to take, as opposed to setting up and learning Python. 3.4? 3.5? What is Anaconda? etc etc. I also live in a "product world" of C, C++, and some C#. No Python in anything we do.

cha-zhang commented 7 years ago

The machine learning community has chosen Python, and that's why all toolkits have Python bindings. In the case of CNTK, we started with C++, and Python is just a very thin wrapper on it. That's why CNTK's Python binding can run as efficiently as BrainScript.

We will continue optimizing the underlying C++ code, and Python will subsequently benefit from that.

skynode commented 7 years ago

@cha-zhang Correct, the machine learning community has chosen Python but most other toolkits like CNTK have also used C++ at their cores.

Now the extent to which each toolkit has supported the community-beloved Python bindings (I actually love Python) is where the toolkits differ, especially in performance. CNTK has a thin Python wrapper and so retains much of the superior performance afforded by C++, while TensorFlow, for instance, has extended Python support beyond its core C++ codebase and subsequently underperforms CNTK.

Maintaining that superior performance at a higher level of abstraction is something that BrainScript, due to its cut-to-the-chase, high (CNTK?) performance design philosophy, is more likely than Python to achieve.

cha-zhang commented 7 years ago

There is a cost to maintaining and updating BrainScript. Currently, for the same layer, Python and BrainScript go through two paths inside C++, one via the V2 library, and one via the V1 library. We need to merge them to simplify, and it's not straightforward. An alternative (and simpler) solution is to factor the V1 library code out and separate it from the V2 library. However, it's unlikely we will update the V1 code as frequently.

Your input is important to us, and we will discuss and see if there can be a compromise.

ghost commented 7 years ago

@cha-zhang so you made a major break and reset in your process. given that python is the development target - it is understandable you have to divert resources and set up unit test etc environment from that. the logical way would have been to maintain brainscript for unit tests and API development and waterfall functionality into python. not the inverse. imho that is the longer term "correct" engineering route.

what is very ugly to see, from where we are, is the introduction of unnecessary complexity into the model serialization that breaks the API compatibility. and we think this is a major mistake. instead of using something simple and readable - serializing to XML or binary struct - you chose the stateful objects. that is not a new concept - and it has failed every time somebody tried to build something on top. That issue is key in giving users tools to work with models (brainscript/python) and model weights (currently lacking) independently. Only then you can have clear functionality break with clear boundaries for support - node functionality and associated tensors could be easily transferred between models based on different API levels.

To summarize, what migration path are you offering to your NDL and brainscript folks that invested so much into model training?

PS: side remark on python "ML community". there was cobol community, matlab community, java community, R community. Once you step into the industry things change.

alternatives we look for - which do not lock us into platforms, but are maintained to scale with technology - are the likes of OSU-Caffe. And before the 2.0 release we viewed CNTK as being closer to that offering, with the advantage of larger support potential (in terms of cross-platform API and the convenience of incorporating model based inference into existing processes).

DHOFM commented 7 years ago

From the point of view of a developer who has used C# since .net 1.0, BrainScript was very clear to me and one of the reasons i used CNTK, instead of TensorFlow or others, for a new prototype we're building. Now i did all evaluation and further work in Python. Python is ok, i still miss the brackets {} but it's ok. For training, BrainScript looks more structured, you have a better overview than in Python, so i would recommend keeping it alive. For syntax highlighting and so on i renamed a copy of the BrainScript files to .c so VS Code can show it better. For the future, full support of BrainScript in VS or VS Code with debugging would be nice, or full support of C#, but anyway if the decision is Python only i can live with it...

ebarsoumMS commented 7 years ago

First of all, thanks everyone for the feedback.

@avader906 regarding

"so you made a major break and reset in your process. given that python is the development target - it is understandable you have to divert resources and setup unit test etc environment from that. the logical way would of been to maintain braiscript for unit tests and API development and waterfall functionality into python. not the inverse. imho that is the longer term "correct" engineering route."

The above is not always possible. The goal of v2 was to provide the flexibility needed to write any custom computation graph in addition to high-level abstraction, which wasn't possible in the v1 graph. We utilized as much as possible from v1; however, when v1 was designed, the needs of the DNN community were different than now. Also, if you look at the CNTK Python Layer API, it matches the BrainScript version very closely. The performance should always match; if not, it is a bug.

As for adding BrainScript on top of the V2 graph, that is a possibility that depends on the demand.

Regarding:

"To summarize, what migration path are you offering to your NDL and brainscript folks that invested so much into model training?"

You can load and evaluate your model in C++ or python API, so that isn't going away and BS isn't going to be deprecated.

@DHOFM For "From the point of a developer who used C# since .net 1.0 BrainScript was very clear to me", can you explain why? I would have thought that Python is closer and at least you have the debuggability of Python. If we expose training through .NET, would you prefer that over BrainScript?

vermorel commented 7 years ago

@ebarsoumMS @cha-zhang @JimSEOW Thank you very much for your follow-up. There has been significant discussion on #960. As the original poster here, I would like to share my perspective, as maybe the logical conclusion might be to not push further for BrainScript.

The case is more extensively discussed at http://blog.vermorel.com/journal/2017/6/6/details-on-the-net-first-strategy-for-cntk.html

In short, I submitted this issue while I was under the impression that BrainScript was a core strategy for CNTK. Apparently, it isn't; and this looks fine to me, as long as a .NET-friendly alternative is provided to offer the same degree of correctness by design.

If the CNTK team clearly states that the goal is to make .NET a first-class citizen, then this present issue should probably be dismissed entirely. No need to divert resources from the .NET initiative.

JimSEOW commented 7 years ago

@vermorel I think it is too premature to just dismiss Brainscript at this stage. I agree one has to be very careful not to dilute the CNTK limited time and energy.

Perhaps some other people could just implement the Brainscript Visual studio extension. Not the CNTK team.

I do agree that Brainscript could serve as an OVERVIEW or CONFIG FILE, especially if we could capture the tensor network information for visualization.

It is still too early to decide at this stage. There is ONLY one common goal, I hope: Q4 2017. We need to show CNTK as an important component of the .NET Front End AI deep learning strategy. Period! How? Let us explore.

vermorel commented 7 years ago

@JimSEOW Thank you very much for your follow-up. I am not suggesting to immediately dismiss BrainScript, but as others have hinted, BrainScript has already been deprioritized by the CNTK team.

I am very skeptical that CNTK can make any progress in the .NET community by focusing on the .NET front-end. If the front end is all you need, then, using TensorFlow is trivial in C#. The bulk of the complexity lies on the back end, and that's where the .NET bindings really matter.

JimSEOW commented 7 years ago

@vermorel there are different stakeholders interested in seeing CNTK succeed. The case for back-end production code instead of Python code is solid. This will happen as part of the process of dealing with the Front End strategy.

CNTK needs every bit of support from all the different .NET communities to make it a credible .NET deep learning option. By also including the Front End, we get the cross-platform Unity and Mobile people to join.

=> I do not know where this discussion will go. I just see people getting frustrated that the .NET Deep Learning path is NOT Clear.

vermorel commented 7 years ago

@JimSEOW Just bouncing on your remark:

I think it is too premature to just dismiss Brainscript at this stage. I agree one has to be very careful not to dilute the CNTK limited time and energy.

We might have a slight misunderstanding here, and it might be important.

From my perspective, as a .NET entrepreneur, I am building business software that my team will realistically need to support for 5+ years minimum (we are already a 10-year-old company). Thus,

I am not specifically attached to BrainScript: what matters is picking the option that represents the future of CNTK, the one option that will realistically offer us support 5 years from now if CNTK is a success.

Various stakeholders might have different perspectives, but the .NET ecosystem is very driven by business apps, and those apps are long lived. CNTK won't win the .NET ecosystem without the longer view.

JimSEOW commented 7 years ago

@vermorel Let us see if the Xamarin team has any interest in BrainScript. I have read your argument. Let us see who else supports it. We are just at the beginning of gathering people together. Perhaps more people are interested in your VISION. It is hard to tell now.

ghost commented 7 years ago

@ebarsoumMS you owe it to your users to explain API changes beforehand, as well as a clear development strategy. telling us to go use python is unacceptable, and so are deferential "ml community" remarks. we've been doing computer vision since before you were born.

vermorel commented 7 years ago

@JimSEOW As @gamemachine pointed out, from a developer perspective, the notebooks (Xamarin Notebooks or Jupyter) are cool but fully non-critical. I wish Xamarin/Microsoft all the success they can get. It will only make .NET stronger, a big plus for me and my business. Yet, this is an R&D / demoing / student use case, not a production use case. What CNTK urgently needs for .NET is a strong production vision.

cha-zhang commented 7 years ago

We are in transition to a new, more open project management scheme, where we will be publishing our plans on GitHub monthly or every 6 weeks. That way it will be clearer where the team is heading, and you can also give us timely feedback on future directions.

We will also try to do better in our release notes: first we will try to minimize API changes, and if there are any, we will explain them better.

@ghost, we recommend people switch to Python because nowadays most development is done in C++ and Python. We update BrainScript from time to time when some new nodes are added, but it won't be as up-to-date as the Python APIs are. Unless you have a very strong reason to stay with BrainScript, we recommend you switch to Python. On the other hand, if you have existing BrainScript config files that are working, expect them to work for at least another 1-2 years. (We still support NDL scripts, if you know what they are.)

@JimSEOW, regarding Xamarin: for multi-platform support, the main obstacle for CNTK is not the UI, but the native C++ and GPU support. Xamarin does not help us make the large C++ part of our code run on Android/iOS etc. We are still evaluating different options.

I just read the thread https://github.com/Microsoft/CNTK/issues/960. Very inspiring. We will carefully think about it.

n17s commented 7 years ago

Sorry if I missed something (this is a very long thread) but in my opinion the CNTK team should not be in the business of creating and maintaining a language. Brainscript was a big improvement over NDL, but there are many fine languages out there that people already know, including .NET languages. Between C# bindings and upgrading Brainscript to support many of the features we added only in V2, I would choose the former any day.

There's also a misconception about speed. With appropriate support from the toolkit, the time spent outside the CNTK C++ libraries should be minimal. So the "Python is slower than C++" argument is irrelevant as all the heavy lifting is done in C++.
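To make the "heavy lifting" point concrete, here is a minimal sketch in plain Python. It does not use CNTK at all and says nothing about CNTK's actual numbers; it only contrasts a loop executed by the interpreter with the same computation pushed into C-implemented built-ins, which is the general pattern a thin binding relies on. The function names (`dot_interpreted`, `dot_native`, `time_call`) and the problem size are made up for illustration.

```python
import operator
import time


def dot_interpreted(xs, ys):
    """Dot product with the loop executed by the Python interpreter."""
    total = 0
    for x, y in zip(xs, ys):
        total += x * y
    return total


def dot_native(xs, ys):
    """Same result, but the loop runs inside C-implemented built-ins."""
    return sum(map(operator.mul, xs, ys))


def time_call(fn, *args):
    """Return (result, elapsed_seconds) for a single call of fn."""
    start = time.perf_counter()
    result = fn(*args)
    return result, time.perf_counter() - start


if __name__ == "__main__":
    n = 200_000  # arbitrary size, chosen only for illustration
    xs = list(range(n))
    ys = list(range(n))
    slow, t_slow = time_call(dot_interpreted, xs, ys)
    fast, t_fast = time_call(dot_native, xs, ys)
    assert slow == fast  # integer arithmetic, so both paths agree exactly
    print(f"interpreted loop: {t_slow:.4f} s")
    print(f"built-in loop:    {t_fast:.4f} s")
```

Timings vary by machine and Python version, so none are quoted here. The only point is that what matters is where the inner loop runs, not which language issues the call.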

DHOFM commented 7 years ago

@ebarsoumMS Sorry I am late, due to being very busy developing a prototype.

"I would have thought that Python is closer and at least you have the debuggability of Python. If we expose training through .NET, would you prefer that over BrainScript?"

Yes, the debuggability of Python is an argument. But BrainScript with its C-like syntax is straight to the point for me. Python is a little bit "hacky" coming from C# but it's ok - i wish it would use brackets instead of spacing, but with a good IDE like PyCharm it's ok. .NET C# support for training would be great for me and other developers in the company, who are also familiar with C#.

JimSEOW commented 7 years ago

@DHOFM Please follow .NET here

cha-zhang commented 7 years ago

I'm closing this issue for now. Thanks for all the suggestions!