javascriptdata / danfojs

Danfo.js is an open source, JavaScript library providing high performance, intuitive, and easy to use data structures for manipulating and processing structured data.
https://danfo.jsdata.org/
MIT License
4.79k stars 209 forks

[read_csv] Support relative file paths #24

Closed gbroques closed 4 years ago

gbroques commented 4 years ago

With pandas, we can pass a relative filepath into read_csv.

For example:

pd.read_csv('data.csv')  

Or:

pd.read_csv("../data_folder/data.csv")

The paths are assumed to be relative to the script where read_csv is called.

Are there plans to support this, and would a pull-request be welcomed?

risenW commented 4 years ago

This is a good request, and a pull request would definitely be welcome.

Some thoughts on this:

I think this will only work in danfojs-node, as you can't access local paths in the browser. Also, we currently load CSV files using the TensorFlow.js data API, which specifically needs the full file path, although this can be abstracted away.

gbroques commented 4 years ago

Acquiring the path of the module that required danfojs doesn't seem straightforward in Node.js.

See the following Stack Overflow question, which recommends a few solutions like using module.parent.filename (has caveats), or a library called caller-callsite: https://stackoverflow.com/questions/13227489/how-can-one-get-the-file-path-of-the-caller-function-in-node-js


We also need to know if the user inputted a relative filepath as the source for read_csv.

The following function may do the trick:

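function isRelativeLocalSource(source) {
    // Browsers cannot read local paths, and file:// URLs are already absolute.
    if (utils.__is_browser_env() || source.startsWith('file://')) {
        return false;
    }
    const remoteProtocols = [
        'https://',
        'http://',
        'ftp://',
        's3://',
        'gs://'
    ];
    // Anything that is not a remote URL is treated as a local path.
    // Note: absolute local paths (e.g. /data/file.csv) also pass this check.
    return remoteProtocols.every(remoteProtocol => !source.startsWith(remoteProtocol));
}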

@risenW Do you have thoughts on this?

risenW commented 4 years ago

I have fixed this here. It turns out you can get the PWD with Node's built-in process module.
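
For readers following along, here is a minimal sketch of what resolving a source against the working directory could look like. It is not the actual fix linked above; the helper name resolveFromCwd is hypothetical, and only Node's built-in path and process modules are used.

```javascript
const path = require('path');

// Hypothetical helper (not part of danfojs): resolve a user-supplied
// source against a base directory, defaulting to the process's cwd.
function resolveFromCwd(source, cwd = process.cwd()) {
  // path.resolve returns an absolute, normalized path. If `source`
  // is already absolute, `cwd` is ignored.
  return path.resolve(cwd, source);
}

// Resolved against wherever `node` was started from.
console.log(resolveFromCwd('data.csv'));

// On POSIX systems this prints /path/to/data_folder/data.csv
console.log(resolveFromCwd('../data_folder/data.csv', '/path/to/dir'));
```

The result depends on the directory the process was started from, not on the file that contains the read_csv call.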

gbroques commented 4 years ago

@risenW Using process.cwd() will lead to bugs and unintended behavior.

For example, let's say you have a setup like this:

/path/to/index.js
/path/to/dir/someScript.js
/path/to/dir/data.csv

Inside /path/to/index.js:

// index.js
const example = require('./dir/someScript');

// other stuff

If you cd /path/to and run node index.js, and index.js requires dir/someScript.js which has the following code:

// someScript.js
pd.read_csv('data.csv')  // will try and load '/path/to/data.csv' which doesn't exist!

Then it'll try to locate data.csv relative to the current working directory of the Node process, which is the directory you ran node from (here, /path/to).

It won't look relative to someScript.js, where the file actually lives (/path/to/dir/data.csv).

Does this make sense?

risenW commented 4 years ago

It does make sense; I didn't think of that. I'll look into it again. Have you had any success so far?

gbroques commented 4 years ago

I haven't got my hands dirty with a solution yet, but I have done research on the problem.

Did you read my earlier post?

We really want the path to whatever module required danfojs:

// we want the path to wherever this file is
const dfd = require("danfojs-node")

Acquiring the path of the module that required danfojs doesn't seem straightforward in Node.js.

See the following Stack Overflow question, which recommends a few solutions like:

  1. module.parent.filename (has caveats ⚠️)
  2. or a library called caller-callsite

https://stackoverflow.com/questions/13227489/how-can-one-get-the-file-path-of-the-caller-function-in-node-js

For a robust solution, I might suggest bringing in a third party library like caller-callsite. More information and other library suggestions are found in this StackOverflow answer.

It might be overkill just for this?

I'm not sure what solution you'd be more comfortable with, and how dependency-free or light-weight you want to keep danfojs.
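
For reference, here is a rough, dependency-free sketch of the call-site idea, using V8's Error.prepareStackTrace hook to inspect the stack. It is an illustration under assumptions, not danfojs code and not the caller-callsite implementation; the names getCallSites and resolveFromCaller are hypothetical.

```javascript
const path = require('path');

// Returns the V8 call sites for the current stack (index 0 is this function).
function getCallSites() {
  const original = Error.prepareStackTrace;
  try {
    // Temporarily ask V8 for the structured call sites instead of a string.
    Error.prepareStackTrace = (_error, callSites) => callSites;
    const sites = new Error().stack;
    return Array.isArray(sites) ? sites : [];
  } finally {
    // Always restore the previous hook so other code is unaffected.
    Error.prepareStackTrace = original;
  }
}

// Hypothetical helper: resolve `source` relative to the file that
// called this function, falling back to the cwd when that is unknown.
function resolveFromCaller(source) {
  // [0] getCallSites, [1] resolveFromCaller, [2] the calling code
  const caller = getCallSites()[2];
  const callerPath = caller ? caller.getFileName() : null;
  const baseDir = callerPath && path.isAbsolute(callerPath)
    ? path.dirname(callerPath)
    : process.cwd();
  return path.resolve(baseDir, source);
}

console.log(resolveFromCaller('data.csv'));
```

The stack index is fragile: any wrapper function between the user's code and the helper shifts it. That is one reason a maintained library may be preferable to a hand-rolled version.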

risenW commented 4 years ago

Using an external package looks like overkill. We'll monitor interest in the feature going forward, and might invest time in it if there are more requests.

gbroques commented 4 years ago

Sounds good to me. We can close for now or leave open.

Up to you.