javascriptdata / danfojs

Danfo.js is an open source, JavaScript library providing high performance, intuitive, and easy to use data structures for manipulating and processing structured data.
https://danfo.jsdata.org/
MIT License
4.81k stars, 209 forks

Not able to get valueCounts() from JSON file #526

Open Svein-Tore opened 2 years ago

Svein-Tore commented 2 years ago

Hi!

I have a JSON file, and I want to get the counts of a column.

The JSON file has these columns:

{
"started_at": "2022-05-01 00:55:46.521000+00:00",   string
"ended_at": "2022-05-01 01:02:05.964000+00:00",   string
"duration": 379,   int32 OK
"start_station_id": "450",   int32 OK
"start_station_name": "Elisenberg",
"start_station_description": "ved holdeplassen",   string OK
"start_station_latitude": 59.919524,   float32
"start_station_longitude": 10.70884,   float32
"end_station_id": "429",   int32 OK
"end_station_name": "Thune",   string
"end_station_description": "ved bomringen",   string OK
"end_station_latitude": 59.92208,   float32
"end_station_longitude": 10.68588   float32
}

NB: the type annotations (string, int32, and so on) were added by me and are not in the file. For the columns marked OK, valueCounts works. I want to get valueCounts for the column start_station_name, but I get the error:

[screenshot of the error message]

To Reproduce

The HTML file is as follows.

<!DOCTYPE html>
<html lang="en">
<head>
    <meta charset="UTF-8">
    <meta name="viewport" content="width=device-width, initial-scale=1.0">
    <script src="https://cdn.jsdelivr.net/npm/danfojs@1.1.2/lib/bundle.min.js"></script>
    <title>Document</title>
</head>
<body>
<script>
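    // Load the JSON file (replace ./jsondata.json with your own JSON feed)
    fetch('./jsondata.json').then(response => {
        return response.json();
    }).then(data => {
        // Build a DataFrame from the parsed JSON records
        var df = new dfd.DataFrame(data);
        // start_station_id works; 'start_station_name' is the column that fails
        var s=df['start_station_id'].valueCounts()
        // Keep the first three rows of the counts
        var mestpop=s.head(3)
    }).catch(err => {
        // Log the error instead of silently discarding it
        console.error(err);
    });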
</script>
</body>
</html>

Expected behavior

I want to get valueCounts for the column start_station_name. A colleague tested this with pandas, and there it worked OK: [screenshot]

It is the line

s=df.start_station_name.value_counts()

that stops the program.
I know the syntax in Danfo is different; I have tried

s=df['start_station_name'].valueCounts()

Maybe that is not correct?
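While valueCounts fails on this column, the counts can also be computed from the parsed JSON records directly, without going through a DataFrame. The following is only a minimal sketch of such a stopgap: `countValues` is a hypothetical helper (not part of Danfo.js), and the sample rows are illustrative stand-ins for the real file.

```javascript
// Hypothetical helper, not part of Danfo.js: counts how often each value of
// `column` occurs in an array of plain record objects.
function countValues(records, column) {
    const counts = new Map();
    for (const row of records) {
        const key = row[column];
        counts.set(key, (counts.get(key) ?? 0) + 1);
    }
    // Return [value, count] pairs, most frequent first
    return [...counts.entries()].sort((a, b) => b[1] - a[1]);
}

// Illustrative rows shaped like the records described above
const sampleRows = [
    { start_station_name: "Elisenberg" },
    { start_station_name: "Thune" },
    { start_station_name: "Elisenberg" },
];

// Three most frequent start stations
const topStations = countValues(sampleRows, "start_station_name").slice(0, 3);
console.log(topStations);
```

In the page above, `data` from `response.json()` could be passed in place of `sampleRows`, assuming it is an array of record objects.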

ryan-williams commented 1 year ago

This looks like the same issue as https://github.com/javascriptdata/danfojs/issues/484#issuecomment-1368282864.

Strings that begin with the same integer (but have a different suffix) break Series.valueCounts:

new Series(["2021-01", "2021-02"]).valueCounts()
// Error: IndexError: Row index must contain unique values

new Series(["1", "1_"]).valueCounts()
// Error: IndexError: Row index must contain unique values
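One possible reading of these two failures, offered as an assumption rather than something confirmed in this thread: if the distinct values are converted to integers somewhere while the result index is built, strings that share a leading integer collapse to the same label, and the index is no longer unique. The sketch below only demonstrates the standard `parseInt` behavior that would produce such a collision; `hasDuplicates` is a hypothetical helper, and none of this is taken from the Danfo.js source.

```javascript
// parseInt reads leading digits and stops at the first non-digit character,
// so different strings can map to the same integer.
const labels = ["2021-01", "2021-02", "1", "1_"];
const coerced = labels.map((label) => parseInt(label, 10));
console.log(coerced); // [2021, 2021, 1, 1]

// Hypothetical helper: true if the array contains any repeated value
function hasDuplicates(values) {
    return new Set(values).size !== values.length;
}

console.log(hasDuplicates(labels));  // false: the original strings are unique
console.log(hasDuplicates(coerced)); // true: the coerced labels collide
```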

#484's title seems to misdiagnose the issue a bit, and it is closed anyway, so maybe this is a better place to track it.