Closed: xjtupanda closed this issue 1 year ago
Yes, you are right! Getting the token embeddings is usually not expensive, but since the two calls are the same, skipping the duplicate will indeed improve efficiency. Thanks for catching these! We will take note of this for our next update.
Good code, and it's easy to understand!
These two calls are actually the same; I'm not sure why it's called twice. I think a copy operation would work just as well and save some computation. https://github.com/rohitgandikota/erasing/blob/724514a05b55e047dc783d1e42d444989230620d/train-scripts/train-esd.py#L221-L222
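For illustration, here is a minimal standalone sketch (the embedding layer, helper name, and token ids are placeholders, not the repo's actual `model.get_learned_conditioning`): since the text encoder is deterministic for a fixed prompt, the two calls return identical tensors, and the second forward pass can be replaced by reusing or cloning the first result.

```python
import torch

# Toy stand-in for the text encoder behind model.get_learned_conditioning().
torch.manual_seed(0)
encoder = torch.nn.Embedding(1000, 768)

def get_conditioning(token_ids):
    # For the same input, this always returns the same tensor.
    with torch.no_grad():
        return encoder(token_ids)

tokens = torch.tensor([[101, 2054, 2003, 102]])  # placeholder token ids for the prompt

emb_a = get_conditioning(tokens)
emb_b = get_conditioning(tokens)          # redundant second forward pass
assert torch.equal(emb_a, emb_b)          # identical results

# Reusing (or cloning) the first result gives the same tensor without the extra pass.
emb_b = emb_a.clone()
```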
These two lines are unnecessary. I've checked, and their `requires_grad` property is already set to `False` because of the preceding `with torch.no_grad():` context. https://github.com/rohitgandikota/erasing/blob/724514a05b55e047dc783d1e42d444989230620d/train-scripts/train-esd.py#L244-L245
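As a quick illustration (a toy sketch with a placeholder module, not the training script itself), any tensor produced inside a `torch.no_grad()` block already has `requires_grad == False`, so setting it again afterwards is a no-op:

```python
import torch

text_encoder = torch.nn.Linear(8, 8)     # stand-in module with trainable parameters

with torch.no_grad():
    emb = text_encoder(torch.randn(2, 8)) # computed outside the autograd graph

print(emb.requires_grad)                  # False, without any explicit assignment
# emb.requires_grad = False               # redundant: no_grad() already guarantees this
```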