tdebatty / spark-knn-graphs

Spark algorithms for building k-nn graphs
MIT License

Providing initial neighbors list #13

Closed fvictorio closed 6 years ago

fvictorio commented 6 years ago

Hi, is there some way to provide the initial list of neighbors (specifically for the NNDescent implementation)? There doesn't seem to be a method for this, but maybe I'm missing something.

If it's not possible, would you be willing to accept a PR implementing it? As far as I can tell, it would just mean adding an overloaded method computeGraph(JavaRDD<Node<T>> nodes, JavaPairRDD<Node<T>, NeighborList> initialGraph) and skipping the random initialization. That holds at least for NNDescent; I'm not sure about the other subclasses of AbstractPartitioningBuilder.
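To make the idea concrete, here is a minimal sketch of the overload pattern in plain Java, without Spark. Everything in it is an assumption for illustration: the class name InitialGraphSketch, the use of String node ids and Map/List in place of JavaRDD, Node<T> and NeighborList, and the helper randomGraph are all hypothetical, and the actual NNDescent refinement is left out. It only shows how the existing entry point could delegate to a new one that accepts a caller-supplied starting graph.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Random;

// Hypothetical stand-in for a graph builder; not the library's actual classes.
class InitialGraphSketch {

    // Random initialization: up to k distinct random neighbors per node,
    // never the node itself.
    static Map<String, List<String>> randomGraph(List<String> nodes, int k, Random rnd) {
        Map<String, List<String>> graph = new HashMap<>();
        for (String node : nodes) {
            List<String> candidates = new ArrayList<>(nodes);
            candidates.remove(node);
            Collections.shuffle(candidates, rnd);
            int size = Math.min(k, candidates.size());
            graph.put(node, new ArrayList<>(candidates.subList(0, size)));
        }
        return graph;
    }

    // Existing-style entry point: builds a random graph, then delegates.
    static Map<String, List<String>> computeGraph(List<String> nodes, int k, Random rnd) {
        return computeGraph(nodes, randomGraph(nodes, k, rnd));
    }

    // Proposed overload: starts from the caller-supplied graph instead of
    // a random one.
    static Map<String, List<String>> computeGraph(
            List<String> nodes, Map<String, List<String>> initialGraph) {
        for (String node : nodes) {
            if (!initialGraph.containsKey(node)) {
                throw new IllegalArgumentException("No initial neighbors for node " + node);
            }
        }
        Map<String, List<String>> graph = new HashMap<>(initialGraph);
        // The iterative refinement would run here; it is omitted in this sketch.
        return graph;
    }

    public static void main(String[] args) {
        List<String> nodes = Arrays.asList("a", "b", "c");
        Map<String, List<String>> initial = new HashMap<>();
        initial.put("a", Arrays.asList("b"));
        initial.put("b", Arrays.asList("c"));
        initial.put("c", Arrays.asList("a"));
        System.out.println(computeGraph(nodes, initial));
    }
}
```

With this shape, the random start and the provided start share the same code path after initialization, so callers that do not pass an initial graph see no change in behavior.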

Thanks!

tdebatty commented 6 years ago

Hi,

Thank you for your interest!

Indeed, there is no such method currently, but PRs are welcome...

fvictorio commented 6 years ago

Hi, could you give me some pointers on how to build the project?

From what I see, there are two different sub-projects, spark-knn-graphs and spark-knn-graphs-eval, and the second one depends on the first.

Inside spark-knn-graphs I can do mvn package and the JAR is properly built. But if I try to package the spark-knn-graphs-eval project, I get an error:

[ERROR] Failed to execute goal on project spark-knn-graphs-eval: 
Could not resolve dependencies for project 
info.debatty:spark-knn-graphs-eval:jar:0.1-SNAPSHOT: Could not 
find artifact info.debatty:spark-knn-graphs:jar:0.16-SNAPSHOT -> [Help 1]

I would appreciate it if you could tell me what your workflow is for this. Thanks!

tdebatty commented 6 years ago

Hi,

spark-knn-graphs is the main project. spark-knn-graphs-eval contains the tests I run for some papers I published, so you shouldn't need it...

If you still want to build spark-knn-graphs-eval you have to:

cd spark-knn-graphs
mvn clean install
cd ../spark-knn-graphs-eval
mvn clean package

mvn clean install will install the .jar in your local Maven repository, so it's available to other Maven projects...

fvictorio commented 6 years ago

Hi, thanks for the answer. You were right, of course: I didn't need spark-knn-graphs-eval. My issue arose because I built the JAR with the latest version (0.16-SNAPSHOT) and included it in another project, but I wasn't including the transitive dependencies (java-graphs, etc.).

I adapted my code to the new interfaces as a preliminary step before taking a shot at implementing the initial neighbor list. But I found that the results I got were different from the ones I got with version 0.15 (I know that NNDescent is not a deterministic algorithm, but even taking that into account, the results were way off).
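One way to put a number on how far apart two runs are is to count how many neighbor edges they share. The sketch below is only an illustration of that idea, not something from the library: the class GraphAgreementSketch, the agreement method, and the Map/List representation with String ids are all hypothetical.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical helper for comparing two k-nn graphs; not part of the library.
class GraphAgreementSketch {

    // Fraction of the edges in `reference` that also appear in `candidate`.
    // Neighbor order within a list is ignored.
    static double agreement(Map<String, List<String>> reference,
                            Map<String, List<String>> candidate) {
        long total = 0;
        long shared = 0;
        for (Map.Entry<String, List<String>> entry : reference.entrySet()) {
            List<String> otherList =
                candidate.getOrDefault(entry.getKey(), Collections.<String>emptyList());
            Set<String> other = new HashSet<>(otherList);
            for (String neighbor : entry.getValue()) {
                total++;
                if (other.contains(neighbor)) {
                    shared++;
                }
            }
        }
        // Two empty graphs are treated as fully agreeing.
        return total == 0 ? 1.0 : (double) shared / total;
    }

    public static void main(String[] args) {
        Map<String, List<String>> v015 = new HashMap<>();
        v015.put("a", Arrays.asList("b", "c"));
        v015.put("b", Arrays.asList("a", "c"));
        Map<String, List<String>> v016 = new HashMap<>();
        v016.put("a", Arrays.asList("b", "d"));
        v016.put("b", Arrays.asList("c", "a"));
        System.out.println(agreement(v015, v016)); // prints 0.75
    }
}
```

Running the same version twice gives a baseline for the variation due to randomness alone; an agreement between versions that falls well below that baseline would point to a real behavioral change.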

So my question is: how "ready" is this 0.16 version (I mean, the code in the master branch)? Are there some significant changes I should take into account?

My other option is just to fork from 0.15 with the feature I need, but I'd prefer to contribute to the project rather than make a throwaway fork.

Thanks!

tdebatty commented 6 years ago

Hi,

I would recommend you implement your changes against version 0.15. The next one is still a work in progress...

T.

fvictorio commented 6 years ago

Cool, thanks for all the help!