ticoneva / pystata-kernel

Jupyter kernel for Stata based on pystata
GNU General Public License v3.0
13 stars 6 forks source link

Feature request: align %browse magic syntax with Stata #12

Closed ideabucket closed 2 years ago

ideabucket commented 2 years ago

At the moment the %browse magic syntax to show the first 7 records of a dataset is:

*%browse 7 [varlist…] [if…]

The equivalent command in the Stata app is:

browse [varlist…] [if…] in 1/7

It would be great if the browse magic followed stata syntax for two reasons:

  1. Do-files from one source could be trivially exchanged with the other;
  2. Explicit specification of the start record would allow records beyond the first 200 to be viewed.

I'm happy to have a go at writing a PR for this if you're not already working on it.

ticoneva commented 2 years ago

I agree that aligning syntax with Stata is a good idea, but for magics I also want to maintain compatibility with stata-kernel. One possibility is to allow both N and in, but outputs an error when the user tries to include both.

While we are at this topic, we could also consider using browse instead of %browse or *%browse. The reason is, while browse is a valid command in Stata GUI—it opens the data editor in browse mode—it is not valid in console mode, and by extension, in pystata. There is thus no risk of overwriting any default behavior, because there is none to begin with.

ideabucket commented 2 years ago

One option might be to select syntax based on how the magic is invoked:

I love the idea of taking over browse, but it feels like it would be a lot of work: what happens if there's a browse command somewhere in the middle of a multi-line cell? If the kernel only picks up browse as the first/only line of a cell, you have a situation where the exact same command produces different behaviour depending on where it is in the code; but to get consistent behaviour you'd have to essentially write a Stata language parser so you could pick out the commands that need to be taken over by the kernel, which is like the issue from #4 but worse.

ticoneva commented 2 years ago

Release 0.2.3 adds support for [in]. Syntax is the same as in Stata. Please check if it is working as intended.

ideabucket commented 2 years ago

I've submitted a PR with a small behind-the-scenes update but from a functional point of view the stata-esque syntax seems to work properly.

One UI issue, though: the row numbering in the browse table is always 0…N. This is a problem in two ways:

Ideally the row numbers should reflect the observation's index in Stata. Failing that, it is probably better not to display them at all.

ticoneva commented 2 years ago

We could generate a new Stata variable that equals to _n everytime *%browse is called and display that instead of pandas' built-in index. What do you think?

ideabucket commented 2 years ago

Row numbers implemented as of #14.

gaksaray commented 7 months ago

I appreciate the *% browse [in] implementation. However, as of version 0.3.1, I come across some issues.

*% browse [in] by itself doesn't finalize running and gets stuck on [*]:

1

One would need to add an N argument (any number) to it for it to work:

2

... or a valid [if] expression:

if

Another issue is that when *% browse produces an error, an extraneous variable is permanently created:

3

In this example, [2] produces an out-of-range error, and [3] shows that a variable named __000000 is added to the dataset, which is not the expected behavior.

ticoneva commented 7 months ago

Fixed the issue of *%browse getting stuck when using in alone in version 0.3.2.

Leaving behind temporary variables after a crash is the default behavior of Stata, and I do not think it is imperative to delete them through the kernel.

gaksaray commented 7 months ago

I have found another issue. When negative numbers are used in the in range identifier, *%browse behaves inconsistently with its native Stata counterpart.

In Stata, browse in -1 shows the last observation:

Screenshot 2024-02-29 at 09 12 56

Whereas in pystata-kernel, *%browse in -1 shows the whole dataset:

Screenshot 2024-02-29 at 09 10 59

In Stata, browse in -2/l shows the last two observations:

Screenshot 2024-02-29 at 09 13 10

Whereas in pystata-kernel, *%browse in -2/l shows the last three observations appended to the whole dataset:

Screenshot 2024-02-29 at 09 12 03

In Stata, browse in -1/l shows the last observation:

Screenshot 2024-02-29 at 09 13 20

Whereas in pystata-kernel, *%browse in -1/l shows the last two observations appended to the whole dataset:

Screenshot 2024-02-29 at 09 12 23
hugetim commented 7 months ago

These examples work in nbstata, somewhat to my surprise. The code is here: https://github.com/hugetim/nbstata/blob/master/nbs/07_browse.ipynb