Open g3ortega opened 8 years ago
Hi! I'll be glad to help you, but I'd rather not just port Vector from Ruby's stdlib. When starting Crystalla, I was trying to borrow ideas and APIs mainly from Matlab and Numpy, which I think are much more mature for this particular domain.
So a couple of goals for Crystalla would be:
I guess a resulting non-goal is: to support existing Ruby scripts out of the box.
It's a bit analogous to one of Crystal's own goals: "Have a syntax similar to Ruby (but compatibility with it is not a goal)".
Let's do this, if you are still on board with that direction, why don't you share one of the Ruby code snippets you're trying to port here? We can explore together organic ways of extending Crystalla to provide all the necessary features.
WDYT?
Sure, thanks for your response. Here's a simple sample of k-Nearest Neighbors using vectors to represent coordinate minutes, and there are others applications for Vector class, but I totally understood what you said, I will take a look to Matlab and numpy implementations/alternatives for some libs and try to figure out how to work on a similar API in Crystal.
require "matrix"
class HouseValidator
def initialize(data, position)
@data = data
@position = position
end
def find_nearest
@data.sort_by {|point,v|
distance(point, @position)
}.first
end
def find_nearest_with_k k
@data.sort_by {|point, v|
distance(point, @position)
}.first(k)
end
private
def distance v1, v2
(v1 - v2).magnitude
end
end
# Distance in minutes for lng and lat
neighborhoods_data = {
Vector[56, 13] => "Safe",
Vector[23, 9] => "Unsafe",
Vector[45, 28] => "Safe",
Vector[12, 3] => "Unsafe",
Vector[15, 7] => "Unsafe",
Vector[7, 3] => "Unsafe",
Vector[49, 31] => "Safe"
}
house_1 = Vector[10, 10]
house_2 = Vector[40,20]
house_3 = Vector[32,10]
h1 = HouseValidator.new(neighborhoods_data, house_1)
p h1.find_nearest
h2 = HouseValidator.new(neighborhoods_data, house_2)
p h2.find_nearest
h2 = HouseValidator.new(neighborhoods_data, house_3)
p h2.find_nearest
Thanks for sharing! Actually, we don't have a Vector abstraction but we do have some convenience methods to create row and column vectors in Matrix. For example, see:
https://github.com/mverzilli/crystalla/blob/master/spec/matrix_spec.cr#L71
That spec file should provide a lot of hints on what's currently possible with the lib.
Also, it seems you're only using two Vector operations: subtraction and "magnitude". I don't have time to test it right now, but I think my first attempt would be to:
Vector[32,10]
with Matrix.row_vector(32,10)
.If that works, the only pending issue would be to decide how to implement "magnitude". Maybe we could add a "norm" method to Matrix... I think the norm of a vector is just a special case of the norm of a matrix (but I should revisit my Linear Algebra bibliography :P).
Edit: of course, you may also need to introduce some other changes so it compiles in Crystal :)
This seems to exhibit the same behaviour as your script:
require "crystalla"
include Crystalla
class HouseValidator
def initialize(data : Hash(Matrix, String), position : Matrix)
@data = data
@position = position
end
def find_nearest
point, v, distance = @data.map { |point, v| {point, v, distance(point, @position)} }
.sort_by { |(point, v, distance)| distance }.first
{point, v}
end
private def norm(m : Matrix)
raise "norm is only defined for row vectors" unless m.row_vector?
square_norm = 0
m.each_row do |r|
square_norm = r.map { |x| x * x }.sum
end
Math.sqrt(square_norm)
end
private def distance(v1, v2)
norm(v1 - v2)
end
end
# Distance in minutes for lng and lat
neighborhoods_data = {
Matrix.row_vector [56, 13] => "Safe",
Matrix.row_vector [23, 9] => "Unsafe",
Matrix.row_vector [45, 28] => "Safe",
Matrix.row_vector [12, 3] => "Unsafe",
Matrix.row_vector [15, 7] => "Unsafe",
Matrix.row_vector [7, 3] => "Unsafe",
Matrix.row_vector [49, 31] => "Safe",
}
house_1 = Matrix.row_vector [10, 10]
house_2 = Matrix.row_vector [40, 20]
house_3 = Matrix.row_vector [32, 10]
h1 = HouseValidator.new(neighborhoods_data, house_1)
p h1.find_nearest
h2 = HouseValidator.new(neighborhoods_data, house_2)
p h2.find_nearest
h2 = HouseValidator.new(neighborhoods_data, house_3)
p h2.find_nearest
Note the implementation of norm
is far from elegant :P. I guess that's because we lack an Enumerable implementation over rows or cols... We'll need to think about that and seek some inspiration from Numpy :)
I guess sooner than later we'll need to implement something like Numpy's n-dimensional array. Being able to operate with matrices and slice them into submatrices (as you generally would in Matlab) is really nice, but it's kind of limited to 2-dimensional matrices. Typical ML libs and algorithms need to work in n-dimensional structures, so matrices will only get us so far. I guess it's time to roll up our sleeves and take Crystalla in that direction!
Wow, thank you @mverzilli. Now I have a crystal version running and a better idea about how to migrate other scripts from ruby to crystal 🎆
time ruby ~/vector_sample.rb 0.07s user 0.05s system 52% cpu 0.240 total
time ./vector_sample 0.00s user 0.00s system 76% cpu 0.006 total
I'm falling in love with Crystal and the community 😃 and I want contribute in the future to this library and crystal itself. For now I am going to report any problem during some experiments while migrating more advanced algorithms, and take a closer look to Numpy and Matlab for ideas.
What would you recommend me to read, do or take a look, to be able to contribute in the future? Do you have a kind of roadmap for crystalla?
Thanks!
Hey, thanks for the kind words! I started Crystalla because I'm a pathological procrastinator. I had decided to follow this tutorial: http://deeplearning.stanford.edu/tutorial/, but when I started it I said "you know what, it'd be nice if you could do all the exercises in pure Crystal, so I'll take my time and try to implement what's necessary". And the first obvious need was to have a nice and efficient implementation of matrices.
@spalladino, another guy from Manas, suggested that we used LAPACK and BLAS so I wrote the bindings and some basic operations. Then we benchmarked against Numpy and we got these results: http://manas.com.ar/blog/2015/10/30/linear-algebra-in-crystal-from-lapack.html. It turned out to be a good idea!
As you can see, there's no roadmap. When I have some spare time and the mental energy to program, I go back to the tutorial and check what's the next thing to implement in Crystal, and just do that.
I guess one interesting next step would be to port all the functionality we already have in Matrix to an n-dimensional array implementation
That said, definitely keep on bringing roadblocks and concrete use cases! It's a lot nicer to shape the library based on real world problems.
This is super relevant to understand some of the implications of using LAPACK et al: https://scottlocklin.wordpress.com/2016/05/26/numeric-linear-algebra-code-an-appreciation/
That said, definitely keep on bringing roadblocks and concrete use cases! It's a lot nicer to shape the library based on real world problems.
Absolutely! I'd suggest trying to pick a problem at a time, and implement the required functionality to support it. MNIST using PCA was my first (and only, so far) test subject, but it could be interesting to pick other solutions and implement them as well.
@g3ortega Not sure if you're still interested, but a quick update on this: @MatthiasRMS is working on an initial NdArray implementation. We're discussing the details here: https://github.com/mverzilli/crystalla/pull/15
Feel free to chip in!
Hi! Is work on this package continuing, or is there a replacement package? Thanks!
Hi! At the moment I'm not putting time into this. You can check the discussions here to get some candidate replacements: https://github.com/crystal-community/crystal-libraries-needed/issues/15
Hi, I have started to play with Crystal. For now, I'm trying to migrate some scripts I had in ruby with some algorithms implementations (about all ML algorithms). I could migrate some matrices using this library, but I couldn't find support for Vector class in Crystal and neither crystalla.
http://ruby-doc.org/stdlib-2.2.1/libdoc/matrix/rdoc/Vector.html
Can you help me? I am willing to contribute with these kind of libraries when I have enough knowledge and experience with Crystal , I think it has an enormous potential for Data Science, and I eventually want to use it in some real data oriented projects 😄