Closed BenoitDalFerro closed 1 year ago
@wl2776 you already fixed it in 8336807; I can build with a local copy, so my hunch is that the branch has not been merged to main (commit #106), hence the pip release has not been updated downstream.
Although the code compiles now, I cannot confirm that everything is working.
Two of the unit tests are failing, and this failure means that something is wrong either in the C++/CUDA code or somewhere on the boundary between Python and native code.
@wl2776 thanks, any idea when this might be fixed: days or weeks?
Sorry, no ideas.
You are welcome to contribute.
Actually, only one unit test is failing: GPU clustering. And I don't know whether GPU clustering was working before.
I think this module can be disabled, since CPU clustering also exists.
@wl2776 whilst I work on my end, would you mind giving me a few more specifics about your idea: the file and lines involved?
The failing unit test is this one: https://github.com/wl2776/fast-transformers/blob/master/tests/clustering/hamming/test_cluster_gpu.py#L48
When `bits == 32`, the value of `centroids` is `[-1, 0]` instead of `[0, 2**32 - 1]`.
It calls this code: https://github.com/wl2776/fast-transformers/blob/master/fast_transformers/clustering/hamming/cluster_cuda.cu
I have added `print(centroids)` before the `self.assertEqual()` in that test. The CUDA code returns `[-1, 0]`.
The issue might be a wrong signed-unsigned type cast.
@wl2776 why is this commented out ?
// initialize the centroids
//centroids.view({-1, K}) = hashes.view({-1, L}).narrow(1, 0, K);
in the `cluster` function
This comment was already in the original repo.
Could you point to the original repo so I can diff it?
We are here. This is the original repo: https://github.com/idiap/fast-transformers/
I've added `printf` inside the CUDA code.
Here is the diff (sorry, I currently have some issues with pushing here):
diff --git a/fast_transformers/clustering/hamming/cluster_cuda.cu b/fast_transformers/clustering/hamming/cluster_cuda.cu
index 5549aec..950fde5 100644
--- a/fast_transformers/clustering/hamming/cluster_cuda.cu
+++ b/fast_transformers/clustering/hamming/cluster_cuda.cu
@@ -4,6 +4,8 @@
// Apoorv Vyas <avyas@idiap.ch>
//
+#include <stdio.h>
+#include <inttypes.h>
#include <curand.h>
#include <curand_kernel.h>
#include <torch/extension.h>
@@ -405,6 +407,7 @@ void cluster(
counts,
n_iterations
);
+ printf("Centroids %" PRIx64 " %" PRIx64 "\n", centroids[0][0][0].item<int64_t>(), centroids[0][0][1].item<int64_t>());
}
PYBIND11_MODULE(TORCH_EXTENSION_NAME, m) {
And here is the result:
Centroids 3fffffff 0
Centroids 0 7fffffff
Centroids ffffffffffffffff 0
FCentroids ff 0
.Centroids 20 80
.Centroids ff 0
.
======================================================================
FAIL: test_long_clusters (clustering.hamming.test_cluster_gpu.TestClusterGPU)
----------------------------------------------------------------------
Traceback (most recent call last):
File "C:\work\robot_simulation\fast-transformers\tests\clustering\hamming\test_cluster_gpu.py", line 48, in test_long_clusters
self.assertEqual(
AssertionError: Tuples differ: (-1, 0) != (0, 4294967295)
First differing element 0:
-1
0
- (-1, 0)
+ (0, 4294967295)
----------------------------------------------------------------------
Ran 4 tests in 1.292s
FAILED (failures=1)
So, the result is correct inside the CUDA kernel, but Python has received wrong values.
EDIT: Wrong. CUDA should have 0xFFFFFFFF, but it has 0xffffffffffffffff.
@wl2776 "Conversion of a signed type (or any type) to an unsigned type always takes place via reduction modulo one plus the max value of the destination type." So yeah, you're right, this is sign casting; let's hunt this down.
C99 standard, 6.3.1.3 Signed and unsigned integers:
1. When a value with integer type is converted to another integer type other than _Bool, if the value can be represented by the new type, it is unchanged.
2. Otherwise, if the new type is unsigned, the value is converted by repeatedly adding or subtracting one more than the maximum value that can be represented in the new type until the value is in the range of the new type. [49]
3. Otherwise, the new type is signed and the value cannot be represented in it; either the result is implementation-defined or an implementation-defined signal is raised.
[49] The rules describe arithmetic on the mathematical value, not the value of a given type of expression.
@wl2776 I saw nothing suspicious in the CUDA code itself, but `bits` (mapping to `B`) and `10` (mapping to `n_iterations`) are passed on to CUDA as regular Python `int`s:
https://github.com/wl2776/fast-transformers/blob/master/tests/clustering/hamming/test_cluster_gpu.py#L45
and cast as C++ `int` here:
https://github.com/wl2776/fast-transformers/blob/master/fast_transformers/clustering/hamming/cluster_cuda.cu#387
and here:
https://github.com/wl2776/fast-transformers/blob/master/fast_transformers/clustering/hamming/cluster_cuda.cu#388
Those are the formal unsigned occurrences as per the code.
@BenoitDalFerro Couldn't the problem be in line 228 of cuda_clustering.cu?
I'm currently traveling on the metro and typing from my phone; sorry for the bad message formatting. 😅
@wl2776 so
mean_k = mean_k | (1LL << i);
instead of
mean_k = mean_k | (1L << i);
What about the same pattern two lines above, at L225: should it be
(curand(state + k) & 1LL);
instead of
(curand(state + k) & 1L);
?
The point is that there are other instances of 1L (long, only 32 bits on MSVC) that could be converted to 1LL (long long, 64 bits) for safety.
@wl2776 I'm not authorized to push a fix to the repo.
@BenoitDalFerro, I've fixed this issue in my fork. Will submit PR later. For now, you can clone my fork and build it from sources.
@wl2776 very nice, it compiles cleanly and instantiates likewise.
> @BenoitDalFerro, I've fixed this issue in my fork. Will submit PR later. For now, you can clone my fork and build it from sources.

I cannot find your fork, what happened?

I want to register it officially with my management and therefore have to complete some formal paperwork.

Could you temporarily open your fork? I would greatly appreciate it. Thanks~
Hi,
Thanks, yes, I can do that later today; sorry, I'm busy.
Best regards,
Benoit
There is another failure, with a CUDA dot product if I remember correctly; I'll file an issue.
> Could you temporarily open your fork? I would greatly appreciate it. Thanks~
I've opened it and submitted a PR. BTW, GitHub doesn't allow hiding forks of public repos, so I actually deleted it and then created it again.
Windows 11 system, VS 2022
`pip install --user pytorch_fast_transformers`