Improve portability of PACRR code, clarify and update documentation

AbhinavMadahar commented 6 years ago

Hello Dr Andrew Yates and Kai Hui,

I changed the code to avoid filenames longer than 256 characters, which caused it to crash on Linux, macOS, and Windows. With these changes, I was able to run train_model, pred_per_epoch, and evals successfully.
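Here is a minimal sketch of one non-interactive way to keep generated names under the limit. It assumes the problem is a single path component exceeding the common 255-byte per-name limit (NAME_MAX on typical Linux file systems such as ext4; macOS and Windows file systems have similar 255-unit limits). The helper name `shorten_filename` and the truncate-plus-hash scheme are illustrative assumptions, not what this PR implements:

```python
import hashlib

# Hypothetical helper (not from the PACRR code): keep a filename component
# under a byte limit by truncating it and appending a short hash of the full
# name, so that distinct long names still map to distinct short names.
MAX_NAME_BYTES = 255  # assumed per-component limit, e.g. NAME_MAX on Linux


def shorten_filename(name: str, max_bytes: int = MAX_NAME_BYTES) -> str:
    encoded = name.encode("utf-8")
    if len(encoded) <= max_bytes:
        return name  # already short enough, leave unchanged
    suffix = "-" + hashlib.sha1(encoded).hexdigest()[:12]
    keep = max_bytes - len(suffix)
    # Cut on a byte boundary, then drop any partial multi-byte character.
    prefix = encoded[:keep].decode("utf-8", errors="ignore")
    return prefix + suffix
```

Hashing the full name (rather than only truncating) avoids collisions between experiment names that share a long common prefix, such as configurations that differ only in their last few parameters.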

In addition, I clarified some of the documentation and updated it to include DE- and CO-PACRR.

I also included an install script that installs most of the required packages.

This is the output I got from evals:

parentdir: output
Directory of the similarity matrices: simmat
WARNING - metrics - No observers have been added to this run
INFO - metrics - Running command 'main'
INFO - metrics - Started
Using TensorFlow backend.
Enter pred_dir or leave blank to use default: output/train_wt09_10/pacrrpub/predict_per_epoch/test_wt10/exp2
Enter val_dir or leave blank to use default: output/train_wt09_10/pacrrpub/predict_per_epoch/test_wt10/exp2
INFO - main - evaluate PACRR_simdim-800_epochs-10_nsamples-2048_maxqlen-16_binmat-False_numneg-6_batch-32_distill-firstk_winlen-3_nfilter-32_kmaxpool-3_combine-16_qproximity-0_context-False_shuffle-False_xfilters-_cascade- on wt10 based on val wt14                     over docpairs benchmark and output to output/train_wt09_11_12_13/pacrrpub/evaluations/statdocpair/wt09-11-12-13_v-14_t-10/PACRR_simdim-800_epochs-10_nsamples-2048_maxqlen-16_binmat-False_numneg-6_batch-32_distill-firstk_winlen-3_nfilter-32_kmaxpool-3_combine-16_qproximity-0_context-False_shuffle-False_xfilters-_cascade-
INFO - main -
     HRel-NRel  HRel-Rel      Rel-NRel
 51        0.0       0.0  0.000000e+00
 52        0.0       0.0  0.000000e+00
 53        0.0       0.0  0.000000e+00
 54        0.0       0.0  0.000000e+00
 55        0.0       0.0  0.000000e+00
 56        0.0       0.0  0.000000e+00
 57        0.0       0.0  0.000000e+00
 58        0.0       0.0  0.000000e+00
 59        0.0       0.0  0.000000e+00
 60        0.0       0.0  5.813953e-03
 61        0.0       0.0  0.000000e+00
 62        0.0       0.0  0.000000e+00
 63        0.0       0.0  0.000000e+00
 64        0.0       0.0  0.000000e+00
 65        0.0       0.0  0.000000e+00
 66        0.0       0.0  0.000000e+00
 67        0.0       0.0  0.000000e+00
 68        0.0       0.0  0.000000e+00
 69        0.0       0.0  0.000000e+00
 70        0.0       0.0  0.000000e+00
 71        0.0       0.0  0.000000e+00
 72        0.0       0.0  0.000000e+00
 73        0.0       0.0  0.000000e+00
 74        0.0       0.0  3.571429e-02
 75        0.0       0.0  0.000000e+00
 76        0.0       0.0  0.000000e+00
 77        0.0       0.0  0.000000e+00
 78        0.0       0.0  0.000000e+00
 79        0.0       0.0  0.000000e+00
 80        NaN       NaN  3.571429e-03
 81        0.0       0.0  0.000000e+00
 82        0.0       0.0  0.000000e+00
 83        0.0       0.0  0.000000e+00
 84        0.0       0.0  0.000000e+00
 85        0.0       0.0  8.474576e-03
 86        0.0       0.0  0.000000e+00
 87        0.0       0.0  0.000000e+00
 88        0.0       0.0  0.000000e+00
 89        0.0       0.0  7.407407e-03
 90        0.0       0.0  4.347826e-02
 91        0.0       0.0  0.000000e+00
 92        0.0       0.0  0.000000e+00
 93        0.0       0.0  0.000000e+00
 94        0.0       0.0  0.000000e+00
 96        0.0       0.0  0.000000e+00
 97        0.0       0.0  0.000000e+00
 98        0.0       0.0  0.000000e+00
 99        0.0       0.0  0.000000e+00
 0         0.0       0.0  2.176248e-03
-1    413552.0  110713.0  1.288002e+06
Original output filename too long. Enter new filename: output-exp2-1
INFO - main - finished PACRR_simdim-800_epochs-10_nsamples-2048_maxqlen-16_binmat-False_numneg-6_batch-32_distill-firstk_winlen-3_nfilter-32_kmaxpool-3_combine-16_qproximity-0_context-False_shuffle-False_xfilters-_cascade- wt09_11_12_13 wt14 wt10
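The prompt in the log above suggests that the patched evals falls back to asking for a new name when the output filename is rejected. A hedged sketch of that pattern, assuming POSIX behavior where an over-long name raises `OSError` with `errno.ENAMETOOLONG` (Windows may report a different error); `write_with_fallback` and its `ask` parameter are hypothetical:

```python
import errno
import os


def write_with_fallback(directory, filename, text, ask=input):
    """Write text to directory/filename. If the OS rejects the name as too
    long, ask for a replacement name and retry (hypothetical helper)."""
    while True:
        path = os.path.join(directory, filename)
        try:
            with open(path, "w") as f:
                f.write(text)
            return path
        except OSError as e:
            if e.errno != errno.ENAMETOOLONG:
                raise  # only handle the "name too long" case
            filename = ask("Original output filename too long. Enter new filename: ")
```

Passing `ask` as a parameter keeps the prompt testable, e.g. `ask=lambda prompt: "output-exp2-1"` instead of reading from standard input.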

Is there anything else that I should do, or is the code ready to be merged back? Also, on which OS was the model originally run?

khui commented 6 years ago

@AbhinavMadahar

Thank you for your work (mostly) on the documentation, and sorry for the delay. The code was originally run on Debian. We are working on a new version of the code that uses Docker. Please stay tuned; you could open a pull request there.

Unfortunately, we cannot accept this pull request at the moment.

AbhinavMadahar commented 6 years ago

Dear Dr.-Ing. Hui,

OK, I will continue working on Dr. de Melo's other projects in the meantime.

(Reply sent by email to Kai Hui's comment of Tue, Feb 6, 2018, 2:24 AM: https://github.com/khui/repacrr/pull/7#issuecomment-363333628)

--
Abhinav Madahar
Undergraduate, Rutgers University
abhinav.madahar@rutgers.edu
abhinavmadahar.com