whaleloops / KEPT

Auto ICD coding with prompt
MIT License

Question on token max limit (8,192 vs. 4,096) #2

Closed: mgh1 closed this issue 1 year ago

mgh1 commented 1 year ago

Dear @whaleloops,

In your excellent paper, you mention that you use the Clinical Longformer from Li et al., 2022. In several places the paper states that the maximum token limit is 8,192 tokens; however, the Clinical Longformer from Li et al. has a maximum limit of 4,096 tokens.

Could you please explain this discrepancy (8,192 vs. 4,096)? Thank you.

whaleloops commented 1 year ago

Good question! You can use the original AllenAI script or our script to raise the maximum token limit: just change max_pos=4096 to max_pos=16384 (or even higher).

FYI, our pretrained model used max_pos=16384.
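For readers wondering what changing max_pos does: as we understand it, these conversion scripts enlarge the learned position-embedding table and initialize the new rows by repeatedly copying the existing ones. Below is a minimal, dependency-free sketch of that tiling step only. It assumes a RoBERTa-style table with two reserved leading rows (so a 4,096-token model has 4,098 rows). `extend_position_embeddings` is a hypothetical helper, not a function from either script, and a real conversion also has to update the model config, the tokenizer length, and the attention layers.

```python
from typing import List

Matrix = List[List[float]]

# Assumption: RoBERTa-style models reserve position ids 0 and 1,
# so the embedding table has (max positions + 2) rows.
RESERVED = 2


def extend_position_embeddings(old: Matrix, max_pos: int) -> Matrix:
    """Hypothetical helper: grow a position-embedding table to max_pos positions.

    `old` has shape (old_max_pos + RESERVED, dim). The reserved rows are kept
    unchanged, and the learned rows are tiled until the new table is full.
    """
    current_rows = len(old)
    new_rows = max_pos + RESERVED
    if new_rows <= current_rows:
        raise ValueError("max_pos must exceed the current position limit")

    learned = old[RESERVED:]       # the non-reserved, learned positions
    step = current_rows - RESERVED  # how many learned rows one copy contributes

    new: Matrix = [list(row) for row in old[:RESERVED]]  # copy reserved rows as-is
    k = RESERVED
    while k < new_rows:
        take = min(step, new_rows - k)  # the last copy may be partial
        new.extend(list(row) for row in learned[:take])
        k += take
    return new


if __name__ == "__main__":
    # Toy 1-dimensional table for a 4,096-position model: row i holds [i].
    old = [[float(i)] for i in range(4096 + RESERVED)]
    new = extend_position_embeddings(old, max_pos=16384)
    print(len(new))  # prints 16386
    print(new[4098])  # prints [2.0]: the first learned row, copied again
```

Copying rather than randomly initializing the new rows is intended to let positions beyond the original limit start from already-learned patterns; the extended model would typically still be pretrained further at the longer length.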

mgh1 commented 1 year ago

Thank you, @whaleloops!