Closed GantMan closed 3 years ago
I don't see why it's loading the files twice. Is it calling the function twice?
@GantMan This is really strange to me as well. The code basically uses the tfjs.data module to load the file.
I'll investigate with tf.data.csv separately first.
@GantMan Just found out this is coming from the tensorflow csv function. Tried this:
let data = [];
const csvDataset = tf.data.csv("https://s3.amazonaws.com/ir_public/temp/chess_labels.csv");
const column_names = await csvDataset.columnNames();
const sample = csvDataset.take(10);
await sample.forEachAsync((row) => data.push(Object.values(row)));
console.log(data);
and in the Network tab, I got this:
A possible solution is to load and parse the CSV using another library, like papaparse.
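To make the "load once, parse separately" idea concrete, here is a minimal sketch that fetches the CSV text with a single request and parses it in memory. The names loadCsvOnce and parseSimpleCsv are hypothetical helpers, not part of Danfo or TFJS, and the parser is deliberately naive (no quoted fields or embedded newlines); a library like papaparse would replace it in practice.

```javascript
// Naive in-memory CSV parser (hypothetical helper, for illustration only).
// It does NOT handle quoted fields or embedded newlines.
function parseSimpleCsv(text) {
  const lines = text.split(/\r?\n/).filter((line) => line.length > 0);
  if (lines.length === 0) {
    return { header: [], rows: [] };
  }
  const header = lines[0].split(",");
  const rows = lines.slice(1).map((line) => line.split(","));
  return { header, rows };
}

// Fetches the file exactly once, then parses the text locally.
// fetchImpl is injectable so the request count can be checked with a stub.
async function loadCsvOnce(url, fetchImpl = globalThis.fetch) {
  const response = await fetchImpl(url);
  if (!response.ok) {
    throw new Error(`Request failed with status ${response.status}`);
  }
  const text = await response.text();
  return parseSimpleCsv(text);
}
```

Because the whole body is read from one response, the number of network requests does not depend on how many times the parsed rows are iterated afterwards. The trade-off is that the full file is held in memory, which may matter for very large CSVs.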
I'll go file a ticket with TFJS on this and see if I can fix it.
Hey bud, this issue is definitely with Danfo and not TFJS.
See proof of concept here: https://codepen.io/gantman/pen/abBRObO
@GantMan Adding this line to the tensorflow csv function causes it to load twice:
await sample.forEachAsync((row) => data.push(Object.values(row)));
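One way to check which call triggers the extra request, without relying on the Network tab, is to wrap fetch and count requests per URL. The sketch below uses a hypothetical helper, makeCountingFetch. It assumes that the browser build of TFJS sends its requests through the global fetch, which is not confirmed in this thread.

```javascript
// Wraps a fetch-like function and records how many times each URL is requested.
// makeCountingFetch is a hypothetical debugging helper, not a Danfo or TFJS API.
function makeCountingFetch(baseFetch) {
  const counts = new Map();
  const countingFetch = (input, init) => {
    // input may be a string, a URL object, or a Request-like object with a url property.
    const url = typeof input === "string" ? input : (input.url ?? String(input));
    counts.set(url, (counts.get(url) ?? 0) + 1);
    return baseFetch(input, init);
  };
  return { countingFetch, counts };
}

// Possible browser usage (assumption: the library under test calls window.fetch):
//   const { countingFetch, counts } = makeCountingFetch(window.fetch.bind(window));
//   window.fetch = countingFetch;
//   ...run the tf.data.csv snippet, then inspect counts.get(csvUrl)
```

Logging the count after columnNames() and again after forEachAsync() would show which step issues each request.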
This code:
const csvDataset = tf.data.csv(
"https://raw.githubusercontent.com/plotly/datasets/master/finance-charts-apple.csv"
);
const column_names = await csvDataset.columnNames();
const sample = csvDataset.take(10);
await sample.forEachAsync((row) => data.push(Object.values(row)));
console.log(data);
dfd
.read_csv(
"https://raw.githubusercontent.com/plotly/datasets/master/finance-charts-apple.csv"
)
currently gives me:
And we are currently calling this function in danfo's read_csv function. I'm still investigating how to solve this.
This is a very weird bug. It only happens in the browser version of tfjs, and I also noticed that the second load is served from the cache.
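The cache observation can also be checked from script using the browser's Resource Timing entries. The sketch below uses a hypothetical helper, summarizeRequests, that counts entries for one URL and how many of them report a transferSize of 0. A zero transferSize usually indicates a local cache hit, but cross-origin resources served without a Timing-Allow-Origin header also report 0, so treat the result as a hint rather than proof.

```javascript
// Summarizes Resource Timing entries for a single URL.
// entries: array of objects shaped like PerformanceResourceTiming (name, transferSize).
// summarizeRequests is a hypothetical helper, not a Danfo or TFJS API.
function summarizeRequests(entries, url) {
  const matching = entries.filter((entry) => entry.name === url);
  // transferSize === 0 suggests a cache hit, but is also reported for
  // cross-origin resources that do not expose timing details.
  const zeroTransfer = matching.filter((entry) => entry.transferSize === 0).length;
  return { total: matching.length, zeroTransfer };
}

// Possible browser usage:
//   summarizeRequests(performance.getEntriesByType("resource"), csvUrl);
```

If a single read_csv call produces a total of 2 for the CSV URL, that matches the double load reported here.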
Right now read_csv loads twice.
Code example (using yours, pointing at latest):
https://codepen.io/risingodegua/pen/bGwPGMG
The issue
This has been happening for a few versions. When the CSV is hundreds of MB, this basically doubles the load time.