Closed ideabucket closed 2 years ago
I agree that aligning syntax with Stata is a good idea, but for magics I also want to maintain compatibility with stata-kernel
. One possibility is to allow both N
and in
, but outputs an error when the user tries to include both.
While we are at this topic, we could also consider using browse
instead of %browse
or *%browse
. The reason is, while browse
is a valid command in Stata GUI—it opens the data editor in browse mode—it is not valid in console mode, and by extension, in pystata
. There is thus no risk of overwriting any default behavior, because there is none to begin with.
One option might be to select syntax based on how the magic is invoked:
%browse
, follow stata-kernel
syntax*%browse
, follow Stata-aligned syntax (since stata-kernel
wouldn't recognise that anyway and so backwards-compatibility is not required)I love the idea of taking over browse
, but it feels like it would be a lot of work: what happens if there's a browse
command somewhere in the middle of a multi-line cell? If the kernel only picks up browse
as the first/only line of a cell, you have a situation where the exact same command produces different behaviour depending on where it is in the code; but to get consistent behaviour you'd have to essentially write a Stata language parser so you could pick out the commands that need to be taken over by the kernel, which is like the issue from #4 but worse.
Release 0.2.3
adds support for [in]
. Syntax is the same as in Stata. Please check if it is working as intended.
I've submitted a PR with a small behind-the-scenes update but from a functional point of view the stata-esque syntax seems to work properly.
One UI issue, though: the row numbering in the browse table is always 0…N. This is a problem in two ways:
browse in 1/10
you will get a table numbered from 0–9.browse in 15/20
the row numbers will be 0–5. Ideally the row numbers should reflect the observation's index in Stata. Failing that, it is probably better not to display them at all.
We could generate a new Stata variable that equals to _n
everytime *%browse
is called and display that instead of pandas
' built-in index. What do you think?
Row numbers implemented as of #14.
I appreciate the *% browse [in]
implementation. However, as of version 0.3.1, I come across some issues.
*% browse [in]
by itself doesn't finalize running and gets stuck on [*]
:
One would need to add an N
argument (any number) to it for it to work:
... or a valid [if]
expression:
Another issue is that when *% browse
produces an error, an extraneous variable is permanently created:
In this example, [2]
produces an out-of-range error, and [3]
shows that a variable named __000000
is added to the dataset, which is not the expected behavior.
Fixed the issue of *%browse
getting stuck when using in
alone in version 0.3.2
.
Leaving behind temporary variables after a crash is the default behavior of Stata, and I do not think it is imperative to delete them through the kernel.
I have found another issue. When negative numbers are used in the in
range identifier, *%browse
behaves inconsistently with its native Stata counterpart.
In Stata, browse in -1
shows the last observation:
Whereas in pystata-kernel
, *%browse in -1
shows the whole dataset:
In Stata, browse in -2/l
shows the last two observations:
Whereas in pystata-kernel
, *%browse in -2/l
shows the last three observations appended to the whole dataset:
In Stata, browse in -1/l
shows the last observation:
Whereas in pystata-kernel
, *%browse in -1/l
shows the last two observations appended to the whole dataset:
These examples work in nbstata
, somewhat to my surprise. The code is here: https://github.com/hugetim/nbstata/blob/master/nbs/07_browse.ipynb
At the moment the
%browse
magic syntax to show the first 7 records of a dataset is:The equivalent command in the Stata app is:
It would be great if the browse magic followed stata syntax for two reasons:
I'm happy to have a go at writing a PR for this if you're not already working on it.