lukalabs / cakechat

CakeChat: Emotional Generative Dialog System
Apache License 2.0
1.7k stars 935 forks source link

Python 3 Compatibility #10

Closed oxy closed 6 years ago

oxy commented 6 years ago

Based on the work done by @cclauss to modernize python 2 code, this PR adds compatibility for python 3 with six, and makes changes to add python 3 compatibility without breaking python 2. Tested on 2.7.14 and 3.6.4 on macOS. This adds on lukalabs/cakechat#1.

Tested, working:

cclauss commented 6 years ago

Nice!! Could you please run (first one is Python 2 and the second Python 3):

if you have not already done so the make sure there and no more syntax or naming issues. I did not performed these tests so perhaps all is perfect already.

oxy commented 6 years ago

@cclauss Both commands output 0.

nikitos9000 commented 6 years ago

@Oxylibrium sorry for a delayed review — we just got enough time to test these changes comprehensively.

At first, there's a need to modify dockerfiles to support python3 and pip3. You can simply replace all 'python' and 'pip' invocations with python3 and pip3 respectively.

Then, there are still some consequences of changes in 'map' and 'filter' return types — return values doesn't support 'getitem' and 'len' methods anymode.

Also, check the division operators — the semantics of '/' has changed, in py3 it returns float values, you can simply replace it with '//'.

Here is the checklist:

Thanks for the great contribution!

oxy commented 6 years ago

The 'map' object is not subscriptable seems to be hiding several larger issues with the code that's responsible for loading and tokenizing the data. I'm playing with a different approach using generators across the board on Oxylibrium/cakechat#py3-refactor, but I'm not sure how long it'll take for it to be ready to merge.

nikitos9000 commented 6 years ago

@Oxylibrium I think you can simply use list(map(...)) or six.map primitives to make it works without rewriting another parts of the code

cclauss commented 6 years ago

int(float) and list(filter()) should fix the others.

oxy commented 6 years ago

I accidentally forgot to commit my code that used list(map) and islice wherever needed in an attempt to fix tools/generate_predictions.py, but python3 still fails.

Traceback (most recent call last):
  File "tools/generate_predictions.py", line 172, in <module>
    predict(args.pop('model'), args.pop('tokens_index'), args.pop('conditions_index'), args.pop('output'),**args)
  File "tools/generate_predictions.py", line 107, in predict
    _save_test_results(processed_test_set, cur_path, nn_model, prediction_mode, **cur_params)
  File "tools/generate_predictions.py", line 32, in _save_test_results
    load_context_free_val(nn_model.token_to_index))
  File "/Users/oxy/Documents/Code/cakechat/cakechat/utils/dataset_loader.py", line 52, in load_context_free_val
    x_validation, y_validation, _ = transform_lines_to_nn_input(tokenized_validation_lines, token_to_index)
  File "/Users/oxy/Documents/Code/cakechat/cakechat/dialog_model/model_utils.py", line 392, in transform_lines_to_nn_input
    x_data_iterator, token_to_index, INPUT_SEQUENCE_LENGTH, INPUT_CONTEXT_SIZE, max_contexts_num=n_dialogs)
  File "/Users/oxy/Documents/Code/cakechat/cakechat/dialog_model/model_utils.py", line 82, in transform_contexts_to_token_ids
    if token in token_to_index else token_to_index[SPECIAL_TOKENS.UNKNOWN_TOKEN]
TypeError: unhashable type: 'list'

Unfortunately, I've been stuck and unable to find the source of this issue, and that's why I'm trying to more aggressively refactor the code.

cclauss commented 6 years ago

My recommendation would be to make sure that

  1. the flake8 test pass on both Python 2 and Python 3
  2. the program runs properly on Python 2
  3. check in these changes with the proviso that Python 3 is not yet supported

The Docker changes, etc. can come after some subset of users have tested Python 3 enough that we are highly confident that the port is complete before making the Docker changes.

cclauss commented 6 years ago

The current code base has at least 5 SyntaxErrors and 34 undefined names in Python 3. This makes it difficult to collaboratively close the final porting issues.

nikitos9000 commented 6 years ago

@Oxylibrium the problem is in file_buffered_tee function — pickle in python3 with protocol 4 (HIGHEST_PROTOCOL) behaves in a glitchy way for some reason — I recommend to change the protocol to 2 (compatible both with py2 and py3) — on this line — https://github.com/lukalabs/cakechat/pull/10/files#diff-65e57c2d04c73a63fa7763ca487562ecR9.

oxy commented 6 years ago

With these commits, I've tested all the tools that failed with Exceptions, and they complete successfully on my test machine. However, I'd still recommend keeping Docker on Python 2 until we have a long enough time without significant issues on Python 3.

nikitos9000 commented 6 years ago

@Oxylibrium Great! I think you can just create a duplicate Dockerfiles adapter for py3, that would be very useful for everyone

oxy commented 6 years ago

@nsmetanin I've tested it, and in retrospect, python 3 does seem rather stable, so I just switched the Dockerfiles to using python 3. Anything else to do before merge?

nikitos9000 commented 6 years ago

@Oxylibrium maybe it's better to keep two separate versions of Dockerfiles — keep the original one for py2 and something like Dockerfile3 for py3

cclauss commented 6 years ago

What about DockerfileLegacy for legacy Python and Dockerfile for Python 3?

nikitos9000 commented 6 years ago

@cclauss I would prefer the naming to follow 'python' and 'python3' style, so it's better to leave it Dockerfile and Dockerfile3 together

nikitos9000 commented 6 years ago

tools/quality/ranking_quality.py still doesn't work (map usage is still broken somewhere) :(

oxy commented 6 years ago

Oops, found the issue in testing today but a power outage meant I couldn't push the code. Will do first thing in the morning before I head to school.

On Wed, Jun 6, 2018, 19:35 Nikita Smetanin notifications@github.com wrote:

tools/quality/ranking_quality.py still doesn't work (map usage is still broken somewhere) :(

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/lukalabs/cakechat/pull/10#issuecomment-395080863, or mute the thread https://github.com/notifications/unsubscribe-auth/ASmWu1NhnJToQu1O5tUwLOjkx5OYgFgpks5t5-G9gaJpZM4Sf98Y .

nikitos9000 commented 6 years ago

Another error on the same script (tools/quality/ranking_quality.py) occurs:

[07.06.2018 10:40:36.176][INFO][1929][cakechat.dialog_model.model][628] Model path is /root/cakechat/data/nn_models/processed_dialogs_gru_hd512_drop0.2_encd2_decd2_il30_cs3_ansl32_lr1.0_gc_5.0_learnemb_cdim128_window10_voc50000_vec128_sgTrue
Traceback (most recent call last):
  File "tools/quality/ranking_quality.py", line 84, in <module>
    _compute_metrics(nn_model, testset)
  File "tools/quality/ranking_quality.py", line 73, in _compute_metrics
    compute_recall_k, testset, context_to_weighted_responses, top_count=test_set_size / 4)
  File "/root/cakechat/cakechat/dialog_model/quality/metrics/ranking.py", line 33, in compute_retrieval_metric_mean
    for question, answers in questions_answers.items()
  File "/root/cakechat/cakechat/dialog_model/quality/metrics/ranking.py", line 33, in <listcomp>
    for question, answers in questions_answers.items()
  File "/root/cakechat/cakechat/dialog_model/quality/metrics/ranking.py", line 21, in compute_recall_k
    weighted_actual_answers.keys(), key=lambda response: weighted_actual_answers[response], reverse=True)[:k]
TypeError: slice indices must be integers or None or have an __index__ method
oxy commented 6 years ago

I finally managed to fix my GPU (and libgpuarray) so that I can run tools/quality/ranking_quality.py to completion (I stopped it after waiting about a whole day for it to complete on my old computer and assumed it'd work, my bad), and I seem to have finally got it working.

...(snip)
Test set size = 486
mean_ap = 0.06444541857883838
mean_recall@10 = 0.11787819253438114
mean_recall@25% = 0.3968565815324165
nikitos9000 commented 6 years ago

Another problem:

Can't build GPU-backed Dockerfile for python3 with sudo docker build -t cakechat3-gpu:latest -f dockerfiles/Dockerfile3.gpu dockerfiles/: fails with python: not found error somewhere in cmake

oxy commented 6 years ago

Fixed the issue over the weekend, and just installed docker on my dev machine. Should work now, I hope.

cclauss commented 6 years ago

Nice work @Oxylibrium !!