quinngroup / dr1dl-pyspark

Dictionary Learning in PySpark
Apache License 2.0

the STD:: #7

Closed MOJTABAFA closed 8 years ago

MOJTABAFA commented 8 years ago

Dear Xiang:

Thanks for your previous comment. I now see that in the function op_selectTopR we are trying to create a new vector whose elements have values greater than N. Am I right?

If so, could you please explain the role of the following instruction? std::nth_element(tmp.begin(), tmp.begin()+R, tmp.end(), std::greater());

magsol commented 8 years ago

http://en.cppreference.com/w/cpp/algorithm/nth_element

MOJTABAFA commented 8 years ago

Thanks, I understand now that this instruction is used to sort the vector. In NumPy we have "numpy.argsort". Should I use that function with quicksort for this purpose?

XiangLi-Shaun commented 8 years ago

No, the new vector contains the indices of the elements in v whose values are greater than some value derived from R (not N; N is the length of the input vector v), not the values themselves.

For std::nth_element(tmp.begin(), tmp.begin()+R, tmp.end(), std::greater()), tmp is initialized as a copy of v; after nth_element runs, the Rth element of tmp is the same as the Rth element of v sorted in descending order. Thus we can find the Rth largest value in v.

Also, please don't use sort; it's algorithmically much slower than nth_element.
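
The threshold-based selection described above can be sketched in NumPy as follows. This is an illustration under stated assumptions, not the project's code: the helper name `indices_above_rth` is hypothetical, the threshold is taken to be the element at 0-based position R of v in descending order (mirroring `tmp.begin()+R`), the comparison is assumed to be strict, and R must be smaller than the length of v.

```python
import numpy as np

def indices_above_rth(v, R):
    """Hypothetical helper: indices of the entries of v that are strictly
    greater than the element at 0-based position R of v sorted descending."""
    v = np.asarray(v)
    # np.partition puts at position R the element a full sort would put there,
    # without ordering the rest (the NumPy counterpart of std::nth_element).
    threshold = -np.partition(-v, R)[R]
    # Assumption: strict comparison, so ties with the threshold are excluded.
    return np.flatnonzero(v > threshold)

v = np.array([19, 1, 0, 0, 7, 6, 2, 5, 6, 0, 4, 3])
print(indices_above_rth(v, 5))
```

Like nth_element, np.partition relies on a selection algorithm rather than a full sort, which is why it fits the advice to avoid sort.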

magsol commented 8 years ago

This may be more appropriate: http://docs.scipy.org/doc/numpy/reference/generated/numpy.argpartition.html


MOJTABAFA commented 8 years ago

So first you find the nth greatest value with std::nth_element, and then you find the indices of the greater values and save them in the idxs_n vector? Then I should find an equivalent of nth_element in NumPy; in Python that should probably be "numpy.argpartition".

MOJTABAFA commented 8 years ago

Thanks Dr. Quinn

MOJTABAFA commented 8 years ago

Thanks Xiang

MOJTABAFA commented 8 years ago

Actually, I have already tested the following code, and the result seems plausible:

    import numpy as np

    test1 = np.array([19,1,0,0,7,6,2,5,6,0,4,3])
    temp = np.argpartition(-test1, 5)
    resultargs = temp[:5]
    temp = np.partition(-test1, 5)
    resultvalues = -temp[:5]
    print(resultargs)
    print(resultvalues)

    ==================={ Output }=======
    [4 0 8 5 7]
    [ 7 19 6 6 5]
    [Finished in 0.3s]

MOJTABAFA commented 8 years ago

Dear Xiang :

Given R and vct_input as follows, do you think the output of that function is correct? (Note that the first index in Python is 0.)

    R = 5
    vct_input = np.array([19,1,0,0,7,6,2,5,6,0,4,3])

    temp = np.argpartition(-vct_input, R)
    idxs_n = temp[:R]
    print(idxs_n)

    ================{ Output }======
    [4 0 8 5 7]
    [Finished in 0.3s]
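
One way to check a result like this without reading the C++ code is to compare it against a full sort. The sketch below assumes that only the set of selected indices matters, not their order, since np.argpartition gives no ordering guarantee among the first R entries; the variable names `top_unordered`, `top_sorted`, and `same_selection` are illustrative.

```python
import numpy as np

vct_input = np.array([19, 1, 0, 0, 7, 6, 2, 5, 6, 0, 4, 3])
R = 5

# Indices of the R largest values, in no guaranteed order.
top_unordered = np.argpartition(-vct_input, R)[:R]

# Reference answer from a full, stable sort of the negated values.
top_sorted = np.argsort(-vct_input, kind="stable")[:R]

# Compare as sets because the two results may list the indices differently.
same_selection = set(top_unordered.tolist()) == set(top_sorted.tolist())
print(same_selection)
```

If several elements tie with the Rth largest value, the two approaches may legitimately pick different indices among the tied ones; this input has no such tie at the boundary.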

XiangLi-Shaun commented 8 years ago

Sorry, I need more information to answer your question. Also, the Python part is not part of my job.

MOJTABAFA commented 8 years ago

Yes, you're right. But you don't need to consider the code; consider only the input vector, R, and the output vector, which holds the indices of the R greatest values: [4 0 8 5 7]. I want to know whether, based on the logic of your program and the given values, this output is the desired one. Sorry for the misunderstanding.

XiangLi-Shaun commented 8 years ago

Yes the output is right.

MOJTABAFA commented 8 years ago

Thanks Xiang

MOJTABAFA commented 8 years ago

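This function has also been converted as follows, so this issue will be closed. Thanks for all your patience and support.

    import numpy as np

    def op_selectTopR(vct_input, idxs_n, R):
        # Partition the negated input so the R largest values come first (unordered).
        temp = np.argpartition(-vct_input, R)
        # The idxs_n argument is overwritten; it is kept only to preserve the signature.
        idxs_n = temp[:R]
        return idxs_n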
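
np.argpartition raises an error when the partition position is not smaller than the array length, so the function above fails if R equals or exceeds the input size. A guarded variant might look like the sketch below; the name `select_top_r` and the chosen behaviour for out-of-range R are assumptions for illustration, not something decided in this thread.

```python
import numpy as np

def select_top_r(vct_input, R):
    """Hypothetical variant of op_selectTopR that tolerates any integer R."""
    vct_input = np.asarray(vct_input)
    n = vct_input.size
    if R <= 0:
        # Nothing requested: return an empty index array.
        return np.empty(0, dtype=np.intp)
    if R >= n:
        # argpartition needs kth < n, so return every index instead.
        return np.arange(n)
    # Indices of the R largest values, in no guaranteed order.
    return np.argpartition(-vct_input, R)[:R]

vct_input = np.array([19, 1, 0, 0, 7, 6, 2, 5, 6, 0, 4, 3])
print(sorted(select_top_r(vct_input, 5).tolist()))
```

Dropping the unused idxs_n parameter is a design choice of this sketch; callers that rely on the three-argument signature would need the original form.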