posit-dev / positron

Positron, a next-generation data science IDE

console needs to more clearly indicate that no input is possible when code is running #970

Closed jjallaire closed 1 year ago

jjallaire commented 1 year ago

To run the code snippets below, start by downloading the dataset from this Kaggle competition: https://oreil.ly/B9wfd. Then make sure you are working within the directory where the dataset was extracted.

Consider the following Python code:

import pandas as pd
import numpy as np
from pathlib import Path
from fastai.tabular.all import add_datepart, cont_cat_split, Categorify, FillMissing, TabularPandas
from fastbook import draw_tree
from sklearn.tree import DecisionTreeRegressor

df = pd.read_csv("TrainAndValid.csv", low_memory=False)
sizes = ['Large', 'Large / Medium', 'Medium', 'Small', 'Mini', 'Compact']
df['ProductSize'] = df['ProductSize'].astype('category')
# Assign the result back; recent pandas versions do not accept inplace= here.
df['ProductSize'] = df['ProductSize'].cat.set_categories(sizes, ordered=True)
dep_var = 'SalePrice'
df[dep_var] = np.log(df[dep_var])
df = add_datepart(df, 'saledate')
df_test = pd.read_csv('Test.csv', low_memory=False)
df_test = add_datepart(df_test, 'saledate')

procs = [Categorify, FillMissing]
cond = (df.saleYear<2011) | (df.saleMonth<10)
train_idx = np.where(cond)[0]
valid_idx = np.where(~cond)[0]
splits = (list(train_idx),list(valid_idx))
cont, cat = cont_cat_split(df, 1, dep_var=dep_var)
to = TabularPandas(df, procs, cat, cont, y_names=dep_var, splits=splits)

xs,y = to.train.xs,to.train.y
valid_xs,valid_y = to.valid.xs,to.valid.y
m = DecisionTreeRegressor(max_leaf_nodes=4)
m.fit(xs, y)
draw_tree(m, xs, size=7, leaves_parallel=True, precision=2)

If you press Cmd+A to Select All and then Cmd+Enter to run, you'll notice that all of the code is correctly submitted to the Console. However, the Console prompt is also printed (implying that it is "ready for input") while the code is still running.

seeM commented 1 year ago

I personally find this very annoying and would like to consider bumping it to private alpha.

EDIT: To clarify, in data science contexts we are very often executing long-running code, and I've become used to seeing the cursor on the next line without a prompt indicator (>>>) as the main indication that something is running, more so than any stop button or other indicator. That said, I only realised this after seeing the current Positron behaviour.

EDIT2: While the green bar indicator on the left of the input is really nice, in data science contexts we also often have lots of info printed out (e.g. when training a model or processing data), which pushes that indicator out of view.
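
The behaviour described above (no prompt while code is running) can be sketched as a simple rule. This is only an illustration of the expectation, not Positron's implementation; `RuntimeState` and `render_input_line` are hypothetical names made up for this sketch.

```python
from enum import Enum


class RuntimeState(Enum):
    """Hypothetical runtime states; the names are assumptions for this sketch."""
    IDLE = "idle"
    BUSY = "busy"


def render_input_line(state: RuntimeState, prompt: str = ">>> ") -> str:
    """Return the text shown at the start of the console input line."""
    # Show the prompt only when the runtime can accept input. While code is
    # running, return an empty string so the cursor sits on a line with no prompt.
    return prompt if state is RuntimeState.IDLE else ""


print(repr(render_input_line(RuntimeState.IDLE)))
print(repr(render_input_line(RuntimeState.BUSY)))
```

Because the prompt sits on the last line of the console, a rule like this stays visible even when a lot of output has scrolled past, which an indicator next to the submitted input does not.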

jjallaire commented 1 year ago

I agree (will do that now)

softwarenerd commented 1 year ago

This appears to have been fixed by work that was done recently. @jjallaire, could you try this again using a recent build (on / after August 24th)?

jjallaire commented 1 year ago

I've verified that the console doesn't show the prompt, but in doing so I discovered a new issue. If you follow the instructions above and then edit the code block to remove the import and invocation of the draw_tree() function:

import pandas as pd
import numpy as np
from pathlib import Path
from fastai.tabular.all import add_datepart, cont_cat_split, Categorify, FillMissing, TabularPandas
from sklearn.tree import DecisionTreeRegressor

df = pd.read_csv("bluebook/TrainAndValid.csv", low_memory=False)
sizes = ['Large', 'Large / Medium', 'Medium', 'Small', 'Mini', 'Compact']
df['ProductSize'] = df['ProductSize'].astype('category')
# Assign the result back; recent pandas versions do not accept inplace= here.
df['ProductSize'] = df['ProductSize'].cat.set_categories(sizes, ordered=True)
dep_var = 'SalePrice'
df[dep_var] = np.log(df[dep_var])
df = add_datepart(df, 'saledate')
df_test = pd.read_csv('bluebook/Test.csv', low_memory=False)
df_test = add_datepart(df_test, 'saledate')

procs = [Categorify, FillMissing]
cond = (df.saleYear<2011) | (df.saleMonth<10)
train_idx = np.where(cond)[0]
valid_idx = np.where(~cond)[0]
splits = (list(train_idx),list(valid_idx))
cont, cat = cont_cat_split(df, 1, dep_var=dep_var)
to = TabularPandas(df, procs, cat, cont, y_names=dep_var, splits=splits)

xs,y = to.train.xs,to.train.y
valid_xs,valid_y = to.valid.xs,to.valid.y
m = DecisionTreeRegressor(max_leaf_nodes=4)
m.fit(xs, y)

The call to m.fit() completely blows up the console (it actually disappears entirely; note the JS errors at right):

[Screenshot: Screen Shot 2023-08-25 at 9 09 46 AM]

@jmcphers This line of code might be trying to show HTML, so I suspect the recent HTML output work we did is involved?
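
For context on why a bare m.fit(xs, y) could produce HTML: fit() returns the estimator itself, so as the last expression its value gets displayed, and scikit-learn estimators can offer an HTML representation through the IPython display hooks. The sketch below is a rough illustration under those assumptions. `FakeEstimator` and `pick_representation` are hypothetical stand-ins, not scikit-learn's or Positron's code.

```python
class FakeEstimator:
    """Hypothetical stand-in for an estimator; not scikit-learn's real class."""

    def fit(self, X, y):
        # Return self, as scikit-learn estimators do, so that the call is an
        # expression whose value a console will try to display.
        self.n_samples_ = len(X)
        return self

    def __repr__(self):
        return "FakeEstimator()"

    def _repr_mimebundle_(self, include=None, exclude=None):
        # IPython-style display hook: offer both plain text and HTML.
        return {
            "text/plain": repr(self),
            "text/html": "<pre>FakeEstimator()</pre>",
        }


def pick_representation(value):
    """Choose what a rich console might render for a result value."""
    bundle_fn = getattr(value, "_repr_mimebundle_", None)
    bundle = bundle_fn() if bundle_fn else {"text/plain": repr(value)}
    # Prefer HTML when the object offers it, otherwise fall back to text.
    mime = "text/html" if "text/html" in bundle else "text/plain"
    return mime, bundle[mime]


result = FakeEstimator().fit([[0.0], [1.0]], [0.0, 1.0])
print(pick_representation(result))
print(pick_representation(42))
```

If this is what is happening, assigning the result (for example `_ = m.fit(xs, y)`) should avoid displaying it, which might help narrow down whether the HTML output path is responsible.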

petetronic commented 1 year ago

I can confirm that I still see this issue in the Dev Tools Console. I've split it out into its own issue, https://github.com/posit-dev/positron/issues/1195, and am closing this original issue as fixed.