r-lib / rray

Simple Arrays
https://rray.r-lib.org
GNU General Public License v3.0

Loading files as rrays #232

Closed by kanishkamisra 5 years ago

kanishkamisra commented 5 years ago

Hi!

I was wondering whether an I/O system like np.load() is planned for rray in the near future. It would be cool to have an rray_load() function to load matrices into memory. It would definitely be useful in my field (NLP), where we have word embeddings: every row is an n-dimensional vector that encodes some form of semantic information about a word. In NLP systems these are often the inputs to models that perform various tasks.

Let me know your thoughts on this!

Thanks

DavisVaughan commented 5 years ago

What are you trying to load data from? If it is an RDS file, you don't have to do anything special. CSV? You might be able to use tseries::read.matrix(), or just use readr or vroom and then convert to a matrix. Even though xtensor has methods to load from CSV (where you have to declare the type of the data ahead of time, double/integer/etc.), it feels a bit outside of rray's wheelhouse. I'd like to keep it as much about array manipulation as possible.
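As a rough illustration of the two routes mentioned here, the following base R sketch round-trips a small matrix through an RDS file and through a CSV file. The matrix `m` and the temporary file paths are made up for the example, and it uses `read.csv()` rather than readr or vroom so that it runs without extra packages.

```r
# Hypothetical example data: a small 2 x 3 numeric matrix.
m <- matrix(c(0.1, 0.2, 0.3, 0.4, 0.5, 0.6), nrow = 2, byrow = TRUE)

# RDS route: the object is serialized as-is, so it comes back as a matrix.
rds_path <- tempfile(fileext = ".rds")
saveRDS(m, rds_path)
m_rds <- readRDS(rds_path)

# CSV route: read into a data frame, then convert to a matrix.
csv_path <- tempfile(fileext = ".csv")
write.csv(m, csv_path, row.names = FALSE)
m_csv <- unname(as.matrix(read.csv(csv_path)))  # drop the V1, V2, ... column names
```

If rray is installed, either result could presumably be handed to `rray::as_rray()` afterwards; that step is left out so the sketch stays dependency-free.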

kanishkamisra commented 5 years ago

In some cases it's a .bin file, in others it's either a NumPy-specific format or a custom .vec file (I will have to check how it's defined or whether there is any documentation on it), and many times it's a .txt file with the first column as the row names. Column names don't matter, since these vectors have arbitrary dimensions. But I do understand you not wanting rray to venture into I/O as well. I will try to get back to you with something useful if I find it!

Thanks!
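For the .txt case described above, a minimal base R sketch might look like the following. It assumes a whitespace-separated file with no header line, one word per row, and unique words (duplicate words would make `row.names = 1` fail). The file contents and the name `embeddings` are invented for illustration; the .bin, .vec, and NumPy formats are not covered.

```r
# Hypothetical embedding file: a word followed by its vector components.
txt_path <- tempfile(fileext = ".txt")
writeLines(c("cat 0.1 0.2 0.3", "dog 0.4 0.5 0.6"), txt_path)

# Use the first column as row names; disable quote and comment handling
# so that tokens such as "'" or "#" are read as ordinary words.
embeddings <- as.matrix(
  read.table(txt_path, header = FALSE, row.names = 1,
             quote = "", comment.char = "")
)
colnames(embeddings) <- NULL  # column names carry no meaning here

# Look up one word's vector by its row name.
dog_vector <- embeddings["dog", ]
```

The result is an ordinary numeric matrix with the words as row names, so any further array manipulation can start from there.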