mickeysjm / TaxoExpan

The source code used for self-supervised taxonomy expansion method TaxoExpan, published in WWW 2020
Apache License 2.0

A model with high validation accuracy gives very poor test results with test_fast.py #4

Closed HAOYUANJIE123 closed 3 years ago

HAOYUANJIE123 commented 3 years ago

Thanks for sharing the novel idea and the code. While running experiments, we ran into a problem. We used both the datasets from the paper and our own small datasets, and set the test set to be the same as the validation set. In every case the result is wrong in the same way: the accuracy on the validation set is high, but the accuracy on the test set reported by test_fast.py is low.

HAOYUANJIE123 commented 3 years ago

Question

mickeysjm commented 3 years ago

Thanks for your interest in this work.

The reason for this observation is that the candidate set of potential insertion places (for a new concept) is much smaller during the validation stage than during the test stage. Specifically, if you use the default config file, the negative size is set to 256 in the validation stage. That means the number of potential insertion places for each new concept is 1 + 256 = 257. However, during the test stage, the potential insertion places are all the existing nodes in the taxonomy, which is usually a much larger number than 257.
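To make the difference in candidate set size concrete, here is a minimal sketch. It is not the repository's actual code: the function `build_candidates`, the node IDs, and the taxonomy size of 20000 are hypothetical and chosen only for illustration.

```python
import random

def build_candidates(true_parent, all_nodes, negative_size=None, rng=None):
    """Return the candidate insertion positions for one new concept.

    negative_size=None mimics the test stage: every existing node is a candidate.
    An integer mimics the validation stage: the true position plus that many
    randomly sampled negative positions.
    (Hypothetical helper, not a function from the TaxoExpan code base.)
    """
    if negative_size is None:
        return list(all_nodes)
    rng = rng or random.Random(0)
    negatives = [n for n in all_nodes if n != true_parent]
    sampled = rng.sample(negatives, min(negative_size, len(negatives)))
    return [true_parent] + sampled

all_nodes = list(range(20000))  # assumed taxonomy size, for illustration only
val_candidates = build_candidates(42, all_nodes, negative_size=256)
test_candidates = build_candidates(42, all_nodes)
print(len(val_candidates), len(test_candidates))  # 257 20000
```

With these assumed numbers, the model has to pick the correct position out of 257 candidates in validation but out of 20000 in testing, so the same model naturally scores lower on the test metric.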

This strategy (in the validation stage) essentially uses the performance of "select 1 correct position out of 257 candidate positions (the correct one plus 256 randomly sampled ones)" to approximate the performance of "select 1 correct position out of all nodes in the existing taxonomy". By doing so, we can get the validation results faster, and empirically we find that the validation metric is highly correlated with the final testing performance.

Hope this answers your question. If you have any further questions, please comment here. Thanks.
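The effect of this approximation can be illustrated with a small simulation. This is only a sketch under stated assumptions: the scores are synthetic Gaussian draws rather than outputs of the TaxoExpan model, and the names `simulate`, `rank_of_true`, `hit_at_1`, and `mrr` are hypothetical.

```python
import random

def rank_of_true(true_score, negative_scores):
    """Rank of the true position (1 = best) among the given negatives."""
    return 1 + sum(s > true_score for s in negative_scores)

def simulate(num_queries=200, num_nodes=2000, negative_size=256, seed=0):
    """Rank each query's true position against sampled vs. all negatives.

    Assumption: true positions score around 2.0 and negatives around 0.0,
    both with unit variance. These are synthetic scores, not model outputs.
    """
    rng = random.Random(seed)
    sampled_ranks, full_ranks = [], []
    for _ in range(num_queries):
        true_score = rng.gauss(2.0, 1.0)
        negatives = [rng.gauss(0.0, 1.0) for _ in range(num_nodes - 1)]
        # Test-style evaluation: rank against every other node.
        full_ranks.append(rank_of_true(true_score, negatives))
        # Validation-style evaluation: rank against a random subset of negatives.
        subset = rng.sample(negatives, negative_size)
        sampled_ranks.append(rank_of_true(true_score, subset))
    return sampled_ranks, full_ranks

def hit_at_1(ranks):
    return sum(r == 1 for r in ranks) / len(ranks)

def mrr(ranks):
    return sum(1.0 / r for r in ranks) / len(ranks)

sampled_ranks, full_ranks = simulate()
print("validation-style Hit@1:", hit_at_1(sampled_ranks), "MRR:", mrr(sampled_ranks))
print("test-style       Hit@1:", hit_at_1(full_ranks), "MRR:", mrr(full_ranks))
```

Because the sampled negatives are a subset of all negatives, the rank of the true position under sampling can never be worse than its rank against the full set. Sampled metrics are therefore optimistic in absolute terms, even when they track the full-ranking metrics closely enough to be useful for model selection.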

mickeysjm commented 3 years ago

Hi Yuanjie,

The numbers reported in the paper are the average results over 5 runs. The results in your screenshot are actually even better than the average results reported in the paper.

Thanks, --Jiaming

On Mon, Jun 7, 2021 at 1:38 AM HAOYUANJIE123 @.***> wrote:

Thanks for your reply. I still have a question. I ran the code from your paper on its dataset with config.mag.json, and I still observe the same behavior. How should I set the parameters to get test accuracy as good as that reported in your paper? [image: capture] https://user-images.githubusercontent.com/49578663/120970406-fb02f500-c79d-11eb-8236-7fad408e27cf.PNG


Sincerely,


Jiaming Shen, Ph.D. candidate

Office: Room 2119B, Thomas M. Siebel Center, UIUC.

Homepage: http://mickeystroller.github.io