Open ritwik12 opened 5 years ago
@omerdagan84 Please have a look
i'm not sure we can avoid some of these loops since all categories must be scanned, But i can suggest moving the scoring internal loop to a function and creating threads to score each class, this will definitely lower the time complexity (However this will not improve the code complexity)
We need to figure out some way or change the approach, That's tricky here
Hey there, I just landed on this page from up-for-grabs.net, so I don't know much about this project. From what I understood from the code snipet, you are searching words across categories, right. Wouldn't a tree structure help with the time complexity. You could use a Radix tree, and the node could be something like this:
typedef struct {
Node* left; // left child
Node* right; // right child
char * key;
Category* categories; // let's assume that this is the name of the class/type
} Node;
I hope this comment helped a bit.
@mcavazotti Hey, thanks for taking interest. good to hear you landed from up-for-grabs.net, I am there too :P
Radix tree would just be a DS to store those words instead of values. While iterating through them it will still be the same.
Can you please share an example for this. You can see here for analysis part with which we assign categories to the input sentence. Categories are defined here in define.h
Radix tree might be of use for defining categories I suppose, is there a way to help in reducing complexity with that?
While working on it, I thought about other algorithms. The main difference comes down to memory usage. Being said that, where do you intend to run this assistant (Raspberry PI, personal computer, etc)? And how many phrases/words do you expect to classify (thousands, 10k, 1M)?
And will there be insertions of new words/phrases at runtime?
@mcavazotti For now this is intended to Systems only mainly PC, Laptops. Also, about the phrases it should not be more than 50-100 words even if you are searching something on the internet. We can always restrict the max. use of phrases as this project is intented towards being a Virtual assistant and performaning the tasks which should not have that much keywords.
At runtime it is just the insertion of user input/ query/ command to give the Virtual assistant and perform the task. If by phrases you meant the Knowledge base/ Corpus then that is limited to 5 for now I guess and it is not being inserted at run time.
I think you can close this issue
@mcavazotti The code is pretty huge and there are still many Complexity improvements and a lot of which are not yet been discovered. Don't think we can close this.
@ritwik12 Well then, for the code mentioned in this issue, the time complexity was optimized.
I can look further into the code, but do you have any other leads to where there might be other performance bottlenecks?
There are certain cases where we can work on Reducing the time complexity of code.
One such instance is: https://github.com/ritwik12/Virtual-Assistant/blob/2712a8917b4480e96350af33e58fd1e59c9192f2/analysis.c#L39