lalalune / arcprize


data leakage? #2

Closed: srikanthsrnvs closed this issue 2 weeks ago

srikanthsrnvs commented 2 weeks ago
src = src.to(device)
output = model(src)  # the model is fed the full sequence, including the answer tokens
target = src[:, model.num_context_tokens:].reshape(-1)  # targets are read from that same sequence
loss = criterion(output.view(-1, num_tokens + 1), target.view(-1))

Your model looks at the entire source?
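
For reference, a minimal sketch of a leak-free next-token setup (names like model, criterion, num_context_tokens, and num_tokens are reused from the snippet above; that the model applies a causal attention mask internally is an assumption):

import torch

# shift by one so position t is only ever asked to predict token t+1
inp = src[:, :-1].to(device)
tgt = src[:, 1:].to(device)

output = model(inp)  # assumed to apply a causal attention mask internally

# score only positions inside the answer region so the context tokens
# do not contribute to the loss (-1 accounts for the one-token shift)
answer_mask = torch.zeros_like(tgt, dtype=torch.bool)
answer_mask[:, model.num_context_tokens - 1:] = True

loss = criterion(output[answer_mask].view(-1, num_tokens + 1), tgt[answer_mask].view(-1))

Even then, scoring the answer region this way is still teacher forcing at train time; at eval the answer tokens would have to be generated autoregressively rather than read out of src.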

lalalune commented 2 weeks ago

good catch, reviewing

srikanthsrnvs commented 2 weeks ago

even so, I think you have the right idea here - I want to try this challenge now!

lalalune commented 2 weeks ago

Splitting the prize evenly with anyone who fixes it and gets us there, if you want to join :) I'm spatialweeb on twitter

srikanthsrnvs commented 2 weeks ago

yeah I DMed you - I'm gonna take a crack at it tonight

lalalune commented 2 weeks ago

I tried fixing this. I am training another model.

The attention mask is a very simple upper triangular (causal) mask, but we also had to get the padding mask right since we have so many padding tokens. I might have gotten these mixed up a bit.
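
For reference, a minimal sketch of how the two masks are usually combined with PyTorch's built-in transformer layers (pad_token_id and the argument names here are generic placeholders, not the repo's actual code):

import torch

def build_masks(src, pad_token_id, device):
    # src: (batch, seq_len) token ids
    seq_len = src.size(1)

    # causal mask: True strictly above the diagonal, so position i
    # cannot attend to any future position j > i
    causal_mask = torch.triu(
        torch.ones(seq_len, seq_len, dtype=torch.bool, device=device),
        diagonal=1,
    )

    # padding mask: True wherever the token is padding, so those keys
    # are ignored regardless of position
    padding_mask = src == pad_token_id

    return causal_mask, padding_mask

# the built-in layers take the two masks as separate arguments, e.g.
# out = encoder_stack(x, mask=causal_mask, src_key_padding_mask=padding_mask)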

I think most of the bugs are in the eval portion now, i.e. if there is still data leakage it is because the eval data is on the GPU and is not being forward masked. Someone also suggested it could be teacher forcing: every wrong prediction is corrected before the next step, so the generated strings can't go wildly off. That seems fine, but it's not true to the spirit of the challenge.

utvompl commented 2 weeks ago

interesting approach, probably a simple/dumb q but any plans to add cross-validation/overlap checking?

lalalune commented 2 weeks ago

Okay, I fixed this, now we're not converging

srikanthsrnvs commented 2 weeks ago

Haha, yep I’m already working on a variant where I use llama to steer in latent space and create a custom flattener that doesn’t use Hilbert curves

lalalune commented 2 weeks ago

Here's where I'm struggling conceptually.

We have an autoregressive transformer with a forward attention mask that pushes attention to all future positions to -infinity. I asked ChatGPT and it said I was doing the masking wrong and suggested a simpler fix that looked right: a plain upper triangular mask.
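
Concretely, a sketch of that mask under the additive convention (zeros on and below the diagonal, -inf strictly above it), assuming a standard PyTorch setup:

import torch

def causal_float_mask(seq_len, device):
    # -inf above the diagonal: attention scores toward future positions
    # become zero after the softmax
    mask = torch.full((seq_len, seq_len), float("-inf"), device=device)
    return torch.triu(mask, diagonal=1)

# equivalent to torch.nn.Transformer.generate_square_subsequent_mask(seq_len)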