lucidrains / meshgpt-pytorch

Implementation of MeshGPT, SOTA Mesh generation using Attention, in Pytorch

Handling EOS Token in Max Sequence Length Scenarios #30

Closed · qixuema closed this issue 9 months ago

qixuema commented 9 months ago

Hi Phil,

Thank you for your valuable contributions to this project!

I'm encountering a problem with handling long sequences, where the maximum sequence length is set to 2048. The issue arises when processing a batch of code samples: if generation reaches the maximum sequence length before an eos_token_id has been appended to every sample, the code block below, which depends on all samples having reached EOS, is skipped.

# inside the sampling loop: this only runs once every sample in the batch has emitted EOS;
# everything from the first EOS onward is replaced with the pad id before stopping
mask = is_eos_codes.float().cumsum(dim = -1) >= 1
codes = codes.masked_fill(mask, self.pad_id)
break

As a result, the codes passed into later stages of the pipeline can still contain eos_token_id values, which could lead to errors in subsequent operations such as gather.
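For illustration, here is a minimal sketch of one way to handle this case; it is not the repository's actual fix. It assumes the same eos_token_id / pad_id conventions as the snippet above, and mask_after_eos is a hypothetical helper name. Applying the masking unconditionally after the loop, rather than only inside the all-samples-done branch, pads out any stray eos_token_id values even when generation stops by hitting the maximum sequence length.

import torch

def mask_after_eos(codes: torch.Tensor, eos_token_id: int, pad_id: int) -> torch.Tensor:
    # codes: (batch, seq_len) tensor of sampled token ids
    is_eos = codes == eos_token_id
    # mark every position at or after the first EOS in each row
    after_eos = is_eos.float().cumsum(dim = -1) >= 1
    # replace the EOS token and everything after it with the pad id
    return codes.masked_fill(after_eos, pad_id)

# usage after the generation loop, whether it ended via EOS or via max_seq_len
# (the eos_token_id and pad_id values here are placeholders):
# codes = mask_after_eos(codes, eos_token_id = 2, pad_id = -1)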

Best regards, Xueqi

lucidrains commented 9 months ago

@qixuema oh yes indeed, should be fixed in the latest commit

i take it you must be seeing some positive results on your end?

qixuema commented 9 months ago

@lucidrains I'm sorry, but I haven't focused much on the triangle mesh side yet, so I'm not able to speak to the mesh results at the moment.

If you think it would be helpful, I could run some tests on the mesh in the near future; in any case, directly generating meshes is exciting research work.

Lastly, thank you very much for your work!

lucidrains commented 9 months ago

oh! no problem

assumed you had already gotten to the generation stage

in any event, thanks for finding this issue!

qixuema commented 9 months ago

Thank you for your contributions to this project! Your work is greatly appreciated.