posit-dev / positron

Positron, a next-generation data science IDE

Data Explorer: emptiness in column names #3084

Open EmilHvitfeldt opened 4 months ago

EmilHvitfeldt commented 4 months ago

Positron Version:

Positron Version: 2024.05.0 (Universal) build 1157
Code - OSS Version: 1.89.0
Commit: ed7ad00efad489a0d5de1b4551e70f3cfa78f681
Date: 2024-05-07T08:14:42.800Z
Electron: 28.2.8
Chromium: 120.0.6099.291
Node.js: 18.18.2
V8: 12.0.267.19-electron.0
OS: Darwin arm64 23.4.0

Steps to reproduce the issue:

  1. Run the following code:
example <- data.frame(1:10, 1:10, 1:10)

names(example) <- c("", "age", "age ")

View(example)

Screenshot 2024-05-09 at 11 13 42 AM

What did you expect to happen?

I don't have a good solution, but I still have nightmares from the time I had a dataset with column names padded with spaces.
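
One possible direction, sketched here in Python using only the standard library: label empty names explicitly and quote names that carry leading or trailing whitespace, so that "age" and "age " can be told apart at a glance. The helper name `describe_column_name` and the `<empty>` label are hypothetical choices for illustration, not anything Positron implements.

```python
def describe_column_name(name: str) -> str:
    """Return a display label that makes empty or padded names visible.

    Hypothetical helper for illustration only; not part of Positron.
    """
    if name == "":
        # An empty name would otherwise render as a blank header cell.
        return "<empty>"
    if name != name.strip():
        # repr() keeps the quotes, so padding is visible, e.g. 'age '.
        return repr(name)
    return name


# Column names from the reproduction example above.
names = ["", "age", "age "]
labels = [describe_column_name(n) for n in names]
print(labels)
```

Quoting only the padded names keeps ordinary headers uncluttered; whether an explicit marker or some other visual cue works better in the Data Explorer is an open design question.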

Were there any error messages in the output or Developer Tools console?

Nope

jthomasmock commented 4 months ago

A) It's kind of good that it "works" right now, but I don't think we are safe in this example.
B) I bet this will break our eventual summary statistics.
C) In pandas, this is also a problem:

image

jthomasmock commented 4 months ago

Although, to be fair, this is not really a valid data.frame name:

> names(example)
[1] ""     "age"  "age "

> example[""]
Error in `[.data.frame`:
! undefined columns selected

> tibble::tibble(example)
Error in `env[[name]] <- x`:
! attempt to use zero-length variable name

But it does work in pandas:

# blank names
import pandas as pd
num_print = pd.DataFrame({'Name':['Ashika', 'Tanu', 'Ashwin', 'Mohit', 'Sourabh'],
        '': [100000000, 100025000, 210000000, 190000000, 0.100000000115151]})

>>> num_print[""]
0    100000000.0
1    100025000.0
2    210000000.0
3    190000000.0
4            0.1
Name: , dtype: float64

It even works for summary statistics, here with both columns renamed to an empty name:

# blank names
import pandas as pd
num_print = pd.DataFrame({'Name':['Ashika', 'Tanu', 'Ashwin', 'Mohit', 'Sourabh'],
        'age': [100000000, 100025000, 210000000, 190000000, 0.100000000115151]})

num_print.rename(columns = {'age': '', 'Name': ''}, inplace=True)

image

jennybc commented 4 months ago

Several years ago I spent a lot of time ruminating on names in the R world, most especially in the context of data.frames. I'll link the write-up of where all of that ended up, in case ideas or vocabulary are helpful in working out what we're going to support here:

https://design.tidyverse.org/names.html
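
To make the vocabulary a bit more concrete, here is a rough Python sketch of what "unique" name repair could look like: empty or duplicated names get a positional `...j` suffix. This is a simplified illustration loosely modeled on the idea described at that link, not the tidyverse implementation, and the function name `repair_unique` is hypothetical.

```python
import re
from collections import Counter


def repair_unique(names):
    """Make names non-empty and unique by appending a positional suffix.

    Simplified, hypothetical sketch; real name repair handles more cases.
    """
    # Drop any existing positional suffix such as "...3" before counting.
    stripped = [re.sub(r"(\.\.\.\d+)+$", "", n) for n in names]
    counts = Counter(stripped)
    repaired = []
    for position, name in enumerate(stripped, start=1):
        if name == "" or counts[name] > 1:
            # Positions are 1-based, matching R column indexing.
            name = f"{name}...{position}"
        repaired.append(name)
    return repaired


print(repair_unique(["", "age", "age "]))
print(repair_unique(["", ""]))
```

Note that under this sketch "age" and "age " are both left alone, because they are already distinct strings; uniqueness repair by itself does not address whitespace padding.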

jthomasmock commented 4 months ago

@EmilHvitfeldt I split off your other example of leading/trailing whitespace into a separate issue: https://github.com/posit-dev/positron/issues/3089