google / trax

Trax — Deep Learning with Clear Code and Speed
Apache License 2.0

making the Terraformer predict script compatible with scipap dimensions. #1674

Closed: copybara-service[bot] closed this pull request 3 years ago

copybara-service[bot] commented 3 years ago

making the Terraformer predict script compatible with scipap dimensions.

lukaszkaiser commented 3 years ago

With the changes in this PR, the script still runs LSHAttention for long sequences, and that will be affected by input padding, since it is not patched in the concat the way Sebastian patched standard attention. Setting MixedLSHAttention.std_length to something large (say 16384 or more) should help; it may be worth giving it a try.
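A minimal sketch of that override as a gin binding (the binding name and the 16384 value come from the comment above; where exactly it belongs in the predict script's config is an assumption):

```
# Route sequences up to this length through standard attention instead of
# LSHAttention, sidestepping the input-padding issue described above.
MixedLSHAttention.std_length = 16384
```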

henrykmichalewski commented 3 years ago

It does not help; I tried it this afternoon. Just in case, I will re-run it.


lukaszkaiser commented 3 years ago

It didn't help when set in the script, but then I changed it directly in the downloaded config.gin file, for both the encoder and the decoder, and to 32000 to be safe. That seems to help: I still get repetitions, but at least the output starts looking reasonable, and 6s/token is friendlier too. (This is with padding.)
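A quick sketch of applying that edit to the downloaded configs from Python. The file names below are placeholders (the actual paths the predict script downloads may differ); the binding name and the 32000 value come from the comment above.

```python
# Append a std_length override to both downloaded gin config files, so both
# the encoder and the decoder use standard attention for long sequences.
# Since gin applies the last binding for a name, appending overrides any
# earlier std_length setting in the file.
OVERRIDE = "\nMixedLSHAttention.std_length = 32000\n"

for path in ("encoder_config.gin", "decoder_config.gin"):
    # Append mode creates the file if it does not exist yet.
    with open(path, "a") as f:
        f.write(OVERRIDE)
```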