Closed nv-jinhosuh closed 1 year ago
Hi, the main reason, as far as I recall, was to make the test behave like Server mode: the BERT inputs were preprocessed and partitioned at a certain size, so sending the full input and partitioning it at the server produced different inputs/outputs. If we have the same issue with GPT-J, then I think it makes sense. The issue at the time was https://github.com/mlcommons/inference/issues/1291. As far as I know from that issue, Gavin is https://github.com/G4V
Thanks Lior! Yes, I see the GPT-J tensors are preprocessed similarly. Would you help bring this up in the WG? I guess we may need to touch up the policy doc (https://github.com/mlcommons/inference_policies/blob/master/inference_rules.adoc#benchmarks-and-qsl-preprocessing)?
Yes, but isn't https://github.com/mlcommons/inference_policies/pull/275 already covering it?
Oh, I didn't know about that policy update. :) Looks like it already covers this! Thank you @liorkhe!
Closing this.
Hi,
In v3.0, for BERT, we allowed the LoadGen node to send only the InputTokenIDs tensors and let the SUT reconstruct the SegmentIDs/AttentionMask tensors.
Do we allow the same thing for GPT-J in v3.1? I can see that the same approach could be used.
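For context, here is a minimal sketch of the kind of reconstruction meant above: rebuilding the AttentionMask and SegmentIDs for one padded BERT-style sequence from the token IDs alone. The `pad_id=0` and `sep_id=102` defaults assume a bert-base-uncased vocabulary; this is an illustration, not the actual MLPerf SUT code.

```python
def reconstruct_aux_tensors(input_ids, pad_id=0, sep_id=102):
    """Rebuild attention_mask and segment_ids for one padded token-ID sequence.

    Assumes bert-base-uncased conventions ([PAD]=0, [SEP]=102); purely
    illustrative, not taken from the MLPerf inference repo.
    """
    # Attention mask: 1 for real tokens, 0 for padding.
    attention_mask = [0 if t == pad_id else 1 for t in input_ids]
    # Segment IDs: 0 up to and including the first [SEP], 1 for the
    # second segment (e.g. the question/context split in SQuAD inputs),
    # and 0 again on padding positions.
    sep_pos = input_ids.index(sep_id) if sep_id in input_ids else len(input_ids)
    segment_ids = [1 if (i > sep_pos and m == 1) else 0
                   for i, m in enumerate(attention_mask)]
    return attention_mask, segment_ids

# Example: [CLS] tok tok [SEP] tok [SEP] [PAD] [PAD]
mask, segs = reconstruct_aux_tensors([101, 7, 8, 102, 9, 102, 0, 0])
```

Since both auxiliary tensors are fully determined by the token IDs, sending only InputTokenIDs loses no information; the SUT can regenerate the rest on its side.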
@liorkhe FYI; also, Lior, do you happen to know the GitHub handle of Gavin from KRAI?