liuliu / swift-diffusion


How to run it properly with Intel CPU #40

Closed: ghost closed this issue 1 year ago

ghost commented 1 year ago

What is the best way to run on macOS with an Intel CPU while still using Metal? There are multiple cases:
1) CPU only, no GPU
2) integrated GPU with very little memory
3) default AMD GPU (these also have less memory in general)
4) external GPU

How can we select which GPU to use?

In many cases there is no unified memory. How does it work in those cases?

liuliu commented 1 year ago

It can run successfully on an Intel GPU if every StorageMode at https://github.com/liuliu/ccv/blob/unstable/lib/nnc/mps/ccv_nnc_mps.m#L169 is changed from Shared to Private.

ghost commented 1 year ago

Thanks a lot.

What about selecting the GPU if I have both an Intel GPU and an AMD GPU?

liuliu commented 1 year ago

I don't own these machines, so I don't think we have solved that. It should come down to which MTLDevice you associate with, but right now we associate with whatever the default is (https://developer.apple.com/documentation/metal/mtldevice) and don't do any enumeration. We could enumerate, and then it would work just like it does with CUDA GPUs (selecting the GPU by index). BTW, if you have any other questions, feel free to reach out to me at i@liuliu.me

ghost commented 1 year ago

Okay, thanks a lot.

ghost commented 1 year ago

https://github.com/liuliu/ccv/commit/8c745f2a4b19a750a22cab093d9a6094456a4260

What do you mean by partially working? What exactly is not working?

ghost commented 1 year ago

I get this error if I compile it on Intel and run it: -[MTLHeapDescriptorInternal validateWithDevice:]:335: failed assertion `Heap Descriptor Validation Placement heap type is not supported.'

liuliu commented 1 year ago

Try updating to the latest s4nnc. The issue is that the CPU & GPU Shared buffer type is not supported for MTLHeap on Intel. The latest s4nnc / ccv combo changes that to Private for x86 chips (that's how we support Intel experimentally in DT).

liuliu commented 1 year ago

liuliu/ccv@8c745f2

What do you mean by partially working? What exactly is not working?

Float16 is not working. Also, it still doesn't work within the simulator, for unknown reasons.

ghost commented 1 year ago

The only change is in this commit, right? https://github.com/liuliu/ccv/commit/8c745f2a4b19a750a22cab093d9a6094456a4260

I had already applied this.

liuliu commented 1 year ago

Yeah, basically this line: https://github.com/liuliu/ccv/commit/8c745f2a4b19a750a22cab093d9a6094456a4260#diff-f95a613d81a5aa4e357a86d8b20fcf4e9b12ef67b6ce586195e3e58471f10812R129

liuliu commented 1 year ago

Maybe check which OS version you are on? I have only tested it with Ventura.

ghost commented 1 year ago

Yeah, I am getting the -[MTLHeapDescriptorInternal validateWithDevice:]:335: failed assertion. I am on Monterey, and I wanted to test it from macOS 11.0 to 13.2.

liuliu commented 1 year ago

Thanks. But Ventura worked? Yeah, it might be related to how we use the MTLHeap placement type. Unfortunately, I don't have an easy solution, as our internal memory allocation algorithm relies on this property (we can place an MTLBuffer at our own given offset on the MTLHeap).
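
For readers unfamiliar with placement heaps, here is a minimal sketch in plain C of the property described above: the allocator picks byte offsets itself inside one arena, and each offset has to respect the alignment the device reports. The names `arena_t`, `align_up`, and `arena_place` are hypothetical and not taken from ccv. In the real code the arena would be an MTLHeap and the alignment would come from `heapBufferSizeAndAlignWithLength:options:`.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a placement heap: the caller decides offsets. */
typedef struct {
    size_t size;   /* total bytes in the arena */
    size_t cursor; /* next free byte offset */
} arena_t;

/* Round offset up to a multiple of align (align must be a power of two). */
static size_t align_up(size_t offset, size_t align)
{
    assert(align != 0 && (align & (align - 1)) == 0);
    return (offset + align - 1) & ~(align - 1);
}

/* Reserve size bytes at an aligned offset.
 * Returns the chosen offset, or (size_t)-1 if the request does not fit. */
static size_t arena_place(arena_t* arena, size_t size, size_t align)
{
    const size_t offset = align_up(arena->cursor, align);
    if (offset > arena->size || size > arena->size - offset)
        return (size_t)-1;
    arena->cursor = offset + size;
    return offset;
}
```

Because the caller chooses the offset, two buffers whose lifetimes do not overlap can be given the same offset. A heap that places resources automatically makes that choice itself, which is the property this allocation scheme would lose.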

ghost commented 1 year ago

I have not tested it with Ventura. I also tried running DrawThings on Intel with Monterey, and it crashes with the MTLHeap error in stderr.

Maybe you could try creating a VM and testing it with macOS 12.5.

ghost commented 1 year ago

Do you think just this line: https://github.com/liuliu/ccv/commit/8c745f2a4b19a750a22cab093d9a6094456a4260#diff-f95a613d81a5aa4e357a86d8b20fcf4e9b12ef67b6ce586195e3e58471f10812R129

and no changes in page size etc. could work?

liuliu commented 1 year ago

These changes shouldn't be relevant, though. The page size change is mainly a fix for the simulator: the PAGE_SIZE macro comes from the iPhoneOS header, so it is 16K, while the actual page size comes from the OS, which is 4K (and we should get the correct one from the global variable).
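
To illustrate the difference between a compile-time page size and the one the running OS reports, here is a small sketch in plain C. It uses the POSIX call `sysconf(_SC_PAGESIZE)` as a stand-in; it is not the global variable mentioned above, and the helper names `runtime_page_size` and `round_up_to_page` are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <unistd.h>

/* Page size as reported by the running OS, not a compile-time macro. */
static size_t runtime_page_size(void)
{
    const long size = sysconf(_SC_PAGESIZE);
    assert(size > 0);
    return (size_t)size;
}

/* Round a byte count up to a whole number of pages. */
static size_t round_up_to_page(size_t bytes)
{
    const size_t page = runtime_page_size();
    return (bytes + page - 1) / page * page;
}
```

A macro baked in from one platform's headers can disagree with the value returned at run time, so sizes rounded with the macro may not line up with the pages the OS actually uses.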

Let me know if macOS 13.x worked for you, and we can look into what exactly is not supported on macOS 12.x if this is confirmed.

ghost commented 1 year ago

I don't have 13.x right now. I will maybe try it on AWS or something in the future.

I tried some dumb things like replacing "ModeShared" with "ModePrivate" in all the code. That gives this error: -[MTLIOAccelBuffer initWithDevice:pointer:length:options:sysMemSize:vidMemSize:args:argsSize:deallocator:]:105: failed assertion `storageModePrivate incompatible with ...WithBytes variant of newBuffer'

Why do some parts still use ModeShared? https://github.com/liuliu/ccv/blob/8c745f2a4b19a750a22cab093d9a6094456a4260/lib/nnc/mps/ccv_nnc_mps.m#L161

ghost commented 1 year ago

What are the potential things causing this to not work on 12.x?

ghost commented 1 year ago

I used MTLHeapTypeAutomatic and that error goes away, although it seems to get stuck in some kind of loop. Any idea why that's the case?

liuliu commented 1 year ago

My understanding is that MTLHeapTypePlacement is not supported on Intel Macs in 12.x. I actually remember testing that now.

MTLHeapTypeAutomatic probably won't be what you want, since it might cause errors (or silent issues) because we can no longer specify the offset for a heap allocation.

ghost commented 1 year ago

Okay, then what is the solution? How can I run it on Intel with 12.x?

If it is not possible, you should probably mention on the drawthings.ai site that it does not work with 12.x.

liuliu commented 1 year ago

I think it still works with Apple Silicon on 12.x, but I need to double-check. The problem is I don't have access to these machines any more (all my machines have been upgraded to 13.x).

ghost commented 1 year ago

Yes, it works with Apple Silicon, but not with Intel on 12.x. Is there any way it could work with 12.x on Intel?

liuliu commented 1 year ago

I think it is a driver-level thing. It may be possible to use Automatic, but I need to think through how exactly (we rely on allocating a heap, and then all reuse goes through Placement at the same offset).

ghost commented 1 year ago

I was testing it on Ventura with Intel. That also has some errors. To fix those, I replaced all MTLResourceStorageModeShared with MTLResourceStorageModePrivate. Now it works fine if I don't load weights into the model. But if I load weights into the model, I get this error:

-[MTLIOAccelBuffer initWithDevice:pointer:length:options:sysMemSize:vidMemSize:gpuAddress:args:argsSize:deallocator:]:119: failed assertion `storageModePrivate incompatible with ...WithBytes variant of newBuffer'

liuliu commented 1 year ago

Yeah, my understanding is that to load weights, we have to use Shared. Are you using an Intel Mac with a discrete card? The few places where we use Shared are basically for copying data over.

ghost commented 1 year ago

I am running it on an AWS dedicated host. It's expensive AF, so I am trying to resolve the issues (at least on Ventura) ASAP.

If I don't use Shared everywhere, I get this error:

-[MTLIOAccelHeap newSubResourceAtOffset:withLength:alignment:options:]:256: failed assertion `The requested storage mode (MTLStorageModeShared) is not compatible with the heap's mode (MTLStorageModePrivate)'

liuliu commented 1 year ago

That error is weird. It suggests we are allocating an MTLBuffer from the MTLHeap, and that the allocated MTLBuffer is Shared. But that shouldn't be the case, because we only set an MTLBuffer as Shared when it is allocated directly from the device (when supplying a pointer). Do you mind sharing a bit more of the code?

ghost commented 1 year ago

Yeah, it works now. I had to replace all ModeShared with ModePrivate, except for the memory copies.

liuliu commented 1 year ago

If you can share your diff, that would be great! I thought all the ModeShared uses (except for copying) were already gated to be ModePrivate on x86. Am I missing anything?
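
As a rough illustration of the gating being discussed, here is a sketch in plain C. The enums and the `storage_mode_for` helper are hypothetical stand-ins for `MTLResourceStorageModeShared` / `MTLResourceStorageModePrivate` and are not code from ccv. It assumes two buffer roles: buffers sub-allocated from the heap, and buffers created from host bytes to copy data over.

```c
#include <assert.h>

/* Hypothetical stand-ins for the Metal Shared / Private storage modes. */
typedef enum { STORAGE_SHARED, STORAGE_PRIVATE } storage_mode_t;

/* Hypothetical buffer roles. */
typedef enum {
    BUFFER_HEAP,   /* sub-allocated from the heap */
    BUFFER_STAGING /* created from host bytes, used to copy data over */
} buffer_role_t;

static storage_mode_t storage_mode_for(buffer_role_t role)
{
    assert(role == BUFFER_HEAP || role == BUFFER_STAGING);
    /* Buffers created from host bytes stay Shared on every architecture:
     * the "...WithBytes" variants reject Private storage. */
    if (role == BUFFER_STAGING)
        return STORAGE_SHARED;
#ifdef __x86_64__
    return STORAGE_PRIVATE; /* x86 (Intel Mac) build */
#else
    return STORAGE_SHARED;  /* other architectures, e.g. Apple Silicon */
#endif
}
```

The two assertion messages quoted earlier in the thread map onto the two branches: a heap buffer has to match the heap's storage mode, and a buffer created from host bytes cannot be Private.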

ghost commented 1 year ago

@@ -13,6 +13,14 @@
 #import <sys/utsname.h>
 #import <sys/mman.h>

+
+#ifdef __x86_64__
+   #define MTL_RESOURCE_STORAGE_MODE MTLResourceStorageModePrivate
+#else
+   #define MTL_RESOURCE_STORAGE_MODE MTLResourceStorageModeShared
+#endif
+
+
 id<MTLDevice> ccv_nnc_default_device(void)
 {
    static dispatch_once_t once;
@@ -149,11 +157,11 @@ void mpheapfree(int device, void* ptr)

 void* mpobjmalloc(int device, size_t size)
 {
-   id<MTLBuffer> buffer = [ccv_nnc_default_device() newBufferWithLength:size options:MTLResourceStorageModeShared];
+   id<MTLBuffer> buffer = [ccv_nnc_default_device() newBufferWithLength:size options:MTL_RESOURCE_STORAGE_MODE];
    if (buffer == nil)
    {
        mptrigmp();
-       buffer = [ccv_nnc_default_device() newBufferWithLength:size options:MTLResourceStorageModeShared];
+       buffer = [ccv_nnc_default_device() newBufferWithLength:size options:MTL_RESOURCE_STORAGE_MODE];
        assert(buffer != nil);
    }
    return (void*)buffer;
@@ -168,13 +176,13 @@ void mpobjfree(int device, void* ptr)
 void* mpobjcreate(void* ptr, off_t offset, size_t size)
 {
    id<MTLHeap> heap = (id<MTLHeap>)ptr;
-   MTLSizeAndAlign sizeAndAlign = [ccv_nnc_default_device() heapBufferSizeAndAlignWithLength:size options:MTLResourceCPUCacheModeDefaultCache | MTLResourceStorageModeShared];
+   MTLSizeAndAlign sizeAndAlign = [ccv_nnc_default_device() heapBufferSizeAndAlignWithLength:size options:MTLResourceCPUCacheModeDefaultCache | MTL_RESOURCE_STORAGE_MODE];
    assert(offset % sizeAndAlign.align == 0);
-   id<MTLBuffer> buffer = [heap newBufferWithLength:sizeAndAlign.size options:MTLResourceCPUCacheModeDefaultCache | MTLResourceStorageModeShared offset:offset];
+   id<MTLBuffer> buffer = [heap newBufferWithLength:sizeAndAlign.size options:MTLResourceCPUCacheModeDefaultCache | MTL_RESOURCE_STORAGE_MODE offset:offset];
    if (buffer == nil)
    {
        mptrigmp();
-       buffer = [heap newBufferWithLength:sizeAndAlign.size options:MTLResourceCPUCacheModeDefaultCache | MTLResourceStorageModeShared offset:offset];
+       buffer = [heap newBufferWithLength:sizeAndAlign.size options:MTLResourceCPUCacheModeDefaultCache | MTL_RESOURCE_STORAGE_MODE offset:offset];
        assert(buffer != nil);
    }
    [buffer makeAliasable];
@@ -203,10 +211,10 @@ @implementation MTLFileBackedBuffer
        madvise(bufptr, size, MADV_SEQUENTIAL | MADV_WILLNEED);
        if (ccv_nnc_flags() & CCV_NNC_DISABLE_MMAP_MTL_BUFFER)
        {
-           obj = [[ccv_nnc_default_device() newBufferWithBytes:bufptr length:size options:MTLResourceCPUCacheModeDefaultCache | MTLResourceStorageModeShared] autorelease];
+           obj = [[ccv_nnc_default_device() newBufferWithBytes:bufptr length:size options:MTLResourceCPUCacheModeDefaultCache | MTL_RESOURCE_STORAGE_MODE] autorelease];
            munmap(bufptr, size);
        } else
-           obj = [[ccv_nnc_default_device() newBufferWithBytesNoCopy:bufptr length:size options:MTLResourceCPUCacheModeDefaultCache | MTLResourceStorageModeShared deallocator:^(void *ptr, NSUInteger len) {
+           obj = [[ccv_nnc_default_device() newBufferWithBytesNoCopy:bufptr length:size options:MTLResourceCPUCacheModeDefaultCache | MTL_RESOURCE_STORAGE_MODE deallocator:^(void *ptr, NSUInteger len) {
                munmap(ptr, len);
            }] autorelease];
    }

liuliu commented 1 year ago

You probably missed some commits :) https://github.com/liuliu/ccv/blob/unstable/lib/nnc/mps/ccv_nnc_mps.m#L153

ghost commented 1 year ago

Yeah