lucidrains / meshgpt-pytorch

Implementation of MeshGPT, SOTA Mesh generation using Attention, in Pytorch

derive_face_edges_from_faces high RAM usage #34

Closed MarcusLoppe closed 9 months ago

MarcusLoppe commented 10 months ago

So I'm trying to see what prevents training on high-poly-count meshes. I tried with 5k and 16k face-count meshes; the results are below.

Using batch size 1 at 5k faces, memory usage went up 1.5 GB; when I switched to batch size 4, it went up to 4 GB, and if I loaded it using the GPU, it increased by 6 GB.

The face_edges object that is returned has an actual usage of 1386.40 MB, so about 3.4 GB is junk (at batch size 4 with 5k faces). I tried calling gc.collect(), but it made no difference.

I've tried to optimize the derive_face_edges_from_faces function but haven't had much luck. Currently it converts a batch of 1 in 0.43 s, so there is headroom to make it slower in exchange for better memory efficiency (one possible chunked approach is sketched after the instrumented code below).

Making it slower might affect the transformer, though, since it needs to call the function at each step. For now this looks like a big memory issue, and I hope someone more capable can resolve it. I'll also check whether there are other bottlenecks.

| Metric | 5k Faces, Batch Size 1 | 5k Faces, Batch Size 4 |
| --- | --- | --- |
| Initial RAM Usage (MB) | 653.93 | 1803.65 |
| all_edges Usage (MB) | 346.60 | 346.60 |
| face_masks Usage (MB) | 0.00 | 0.02 |
| face_edges_masks Usage (MB) | 21.66 | 86.65 |
| shared_vertices Usage (MB) | 194.96 | 779.85 |
| Before loop (MB) | 1221.18 | 2929.93 |
| face_edges after loop (MB) | 346.60 | 1386.40 |
| face_edges Usage (MB) | 346.60 | 1386.40 |
| After loop (MB) | 2158.66 | 5825.12 |
| torch.Size | [1, 22714756, 2] | [4, 22714756, 2] |
15k faces, batch size 1:

| Metric | Value (MB) |
| --- | --- |
| Initial RAM Usage | 689.57 |
| all_edges Usage | 4234.15 |
| face_masks Usage | 0.02 |
| face_edges_masks Usage | 264.63 |
| shared_vertices Usage | 2381.71 |
| Before loop | 7546.62 |
| face_edges after loop | 0.94 |
| face_edges Usage | 0.94 |
| After loop | 10195.26 |
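
The numbers above come from the instrumented version of derive_face_edges_from_faces shown below. The get_ram_usage helper isn't shown in the thread; a minimal sketch, assuming psutil is available, could be:

    import os
    import psutil

    def get_ram_usage():
        # resident set size (RSS) of the current process, in MB
        return psutil.Process(os.getpid()).memory_info().rss / (1024 ** 2)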
    # all pairwise face index combinations for the padded batch
    all_edges = torch.stack(torch.meshgrid(
        torch.arange(max_num_faces, device = device),
        torch.arange(max_num_faces, device = device),
        indexing = 'ij'), dim = -1)

    # which faces are real (not padding)
    face_masks = reduce(faces != pad_id, 'b nf c -> b nf', 'all')

    # valid (i, j) face pairs, and which vertices each pair of faces shares
    face_edges_masks = rearrange(face_masks, 'b i -> b i 1') & rearrange(face_masks, 'b j -> b 1 j')
    shared_vertices = rearrange(faces, 'b i c -> b i 1 c 1') == rearrange(faces, 'b j c -> b 1 j 1 c')

    print(f"all_edges Usage: {all_edges.element_size() * all_edges.numel() / (1024 ** 2):.2f} MB")
    print(f"face_masks Usage: {face_masks.element_size() * face_masks.numel() / (1024 ** 2):.2f} MB")
    print(f"face_edges_masks Usage: {face_edges_masks.element_size() * face_edges_masks.numel() / (1024 ** 2):.2f} MB")
    print(f"shared_vertices Usage: {shared_vertices.element_size() * shared_vertices.numel() / (1024 ** 2):.2f} MB")

    print(f"Before loop: {get_ram_usage():.2f} MB")

    face_edges = []

    for face, face_edge_mask in zip(faces, face_edges_masks):
        ...  # per-sample derivation of face edges elided

    print(f"face_edges after loop: {sum(tensor.element_size() * tensor.numel() for tensor in face_edges) / (1024 ** 2):.2f} MB")

    face_edges = pad_sequence(face_edges, padding_value = pad_id, batch_first = True)
    print(f"face_edges Usage: {face_edges.element_size() * face_edges.numel() / (1024 ** 2):.2f} MB")

    print(f"After loop: {get_ram_usage():.2f} MB")

    if is_one_face:
        face_edges = rearrange(face_edges, '1 e ij -> e ij')

    return face_edges
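
As an illustration of the slower-but-leaner direction mentioned above (this is a sketch, not code from the repo): instead of materializing the full (nf, nf, 3, 3) shared_vertices tensor at once, the pairwise comparison can be done in chunks of rows, bounding peak memory at roughly chunk_size Ɨ nf Ɨ 9 booleans. The function below handles a single unpadded mesh:

    import torch

    def face_edges_chunked(faces, chunk_size = 1024):
        # faces: (nf, 3) long tensor of vertex indices for one mesh, no padding
        nf = faces.shape[0]
        edges = []

        for start in range(0, nf, chunk_size):
            chunk = faces[start:start + chunk_size]            # (c, 3)
            # (c, nf, 3, 3): which vertices of each chunk face match which vertices of each other face
            shared = chunk[:, None, :, None] == faces[None, :, None, :]
            num_shared = shared.any(dim = -1).sum(dim = -1)    # (c, nf)
            # adjacent faces share exactly 2 vertices (a face shares all 3 with itself)
            i, j = torch.nonzero(num_shared == 2, as_tuple = True)
            edges.append(torch.stack((i + start, j), dim = -1))

        # like the full pairwise comparison, this yields ordered pairs in both directions
        return torch.cat(edges, dim = 0)                       # (num_edges, 2)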
lucidrains commented 10 months ago

So I actually preempted this: the repository allows face edges to be precomputed and passed in. You just have to change the data_kwargs to include "face_edges" on both trainers, with an appropriate custom Dataset.
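
A minimal sketch of that approach (the dataset class here is illustrative, not from the repo; derive_face_edges_from_faces lives in meshgpt_pytorch/data.py):

    from torch.utils.data import Dataset
    from meshgpt_pytorch.data import derive_face_edges_from_faces

    class PrecomputedMeshDataset(Dataset):
        # hypothetical dataset that derives face_edges once, up front
        def __init__(self, meshes):
            # meshes: list of (vertices, faces) tensor pairs
            self.data = [
                (vertices, faces, derive_face_edges_from_faces(faces))
                for vertices, faces in meshes
            ]

        def __len__(self):
            return len(self.data)

        def __getitem__(self, idx):
            return self.data[idx]  # (vertices, faces, face_edges)

    # then tell the trainers to expect the extra field, e.g.
    # data_kwargs = ('vertices', 'faces', 'face_edges')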

lucidrains commented 10 months ago

signing off for the holidays. merry xmas and see you in 2024!

MarcusLoppe commented 10 months ago

> So I actually preempted this: the repository allows face edges to be precomputed and passed in. You just have to change the data_kwargs to include "face_edges"

Yes, I was thinking about that, but if I converted, say, 200 models at 500 MB each to face edges, that would be 100 GB.
And the transformer can't cache or use this anyway, since it generates the mesh on the fly.

Happy holidays šŸŽ‰

lucidrains commented 10 months ago

@MarcusLoppe OK, final commit: you can now precompute the mesh codes šŸ˜„ OK, signing off for real
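
Per the repository README, codes can be precomputed with the autoencoder's tokenize method; roughly (assuming a trained autoencoder and per-mesh vertices / faces tensors already exist):

    import torch

    # `autoencoder` (MeshAutoencoder), `vertices` (nv, 3) and `faces` (nf, 3) assumed
    with torch.no_grad():
        codes = autoencoder.tokenize(
            vertices = vertices,
            faces = faces
        )

    # `codes` can then be stored with the dataset so the transformer
    # trainer doesn't re-run the autoencoder every step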

lucidrains commented 9 months ago

@MarcusLoppe added a way to cache the derivation of the face edges through a simple decorator on the dataset class

should resolve this issue
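
The decorator itself isn't shown in the thread; purely as an illustration of the idea (names and structure hypothetical, not the repo's actual implementation), a memoizing wrapper over a dataset's __getitem__ could look like:

    from functools import wraps

    def cache_face_edges(getitem):
        # hypothetical: memoize each item so face_edges is derived only once per index
        cache = {}

        @wraps(getitem)
        def inner(self, idx):
            if idx not in cache:
                cache[idx] = getitem(self, idx)
            return cache[idx]

        return inner

    # usage: decorate a dataset __getitem__ that returns (vertices, faces, face_edges)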