Closed: jf003320018 closed this issue 6 years ago
We are looking for developers who are willing to build out Windows support and join our core team of open source developers. None of our core team uses Windows right now, so it is hard to add support for it ourselves.
@soumith I am willing to help with windows support
@souravsingh that's great. Once you have PyTorch compiling and running on Windows, and the equivalent of ./run_test.sh passes unit tests, we can take it forward.
From what I heard, it's best to use python 3.5 + anaconda for windows, because we use C++11 in our core parts.
I'm interested in collaborating on porting this to windows. I use conda with python 2.7 and 3.5 for different things, so having the variable version support would be nice :) I don't know much about what would need to be done to accomplish this, but I can poke around.
We've heard that python 2.7 libs have to be compiled with old msvc, that doesn't support C++11, and that might be a blocker for it 😕
Since people keep asking about it, I am working on compiling it with msvc. It's taking longer than I expected, since sizeof(long)==4 on Win64.
Ah, this is going to be awful. We need to make TH use fixed-size longs, or torch.LongTensor will easily overflow on Windows.
It is awful.
#ifdef _WIN64
typedef long long THLong;
#else
typedef long THLong;
#endif
And then replace every single reference to long with THLong. I'm having to modify a huge number of files.
Wait, are you changing libTH code?
@tylergenter normally TH, THNN, THC and THCUNN should compile on MSVC, thanks to the several fixes sent from @BTNC in the torch7 package.
They compile, I just had to create build_all.bat
The problem (as I stated above) is that sizeof(long)==4.
For example,
int main() {
    auto sdf = (THLongStorage*) NULL;
    printf("%zu\n", sizeof(sdf->data[0]));
}
Compiled with Visual Studio 15, targeting x64, this prints out 4.
If you're willing to go over all libs anyway, then it would be better to convert them to use stdint types (e.g. int64_t for long), so that you don't need to use any macros, and it would be truly cross-platform. We wanted to make that change a while ago, but it turned out to be a lot of work, and we decided to postpone it until after the release.
I'll convert all the references to long to int64_t instead. Should I base my patches against torch/torch7 and torch/nn?
Yes, that would be the easiest for us. Thank you!
My two cents: I prefer the macro way with int64_t (i.e. typedef int64_t THLong; and replace long with THLong) because (1) it is consistent with THHalf; (2) it is convenient to define all TH types in a centralized place, so that it is painless if one wants to change that type again in the future; (3) it makes the code clearer about which parts comply with torch type restrictions.
By the way, I think the sizeof(long)==4 issue on Windows is not a blocker for porting pytorch to Windows. One can still play with tensors other than LongTensor, such as HalfTensor, FloatTensor, etc. I suggest postponing the long-to-int64_t conversion until after pytorch has been ported to Windows.
I was originally going to postpone/not do it at all, until I got to the section of Pytorch that serialized data. There's one section that writes the tensor dimensions to a file using longs. I was worried about compatibility problems/more complicated code to deal with the different sizes.
I am willing to take up this task.
Just to give you an update, so you don't think I've given up. Everything compiles. All the test cases pass, except I haven't tried CUDA, since I need to get a power supply for my desktop. I also haven't tried to compile with WITH_DISTRIBUTED. I also need to clean up my build scripts.
that was honestly quite quick :)
Awesome! No need to focus on distributed yet, it's experimental anyway.
@tylergenter Just curious where this is at and whether the CUDA support has been tested? I'm starting a course this week that uses pytorch and I'm on a windows environment.
@EvenOldridge It is passing most of the CUDA tests. I think I know how to fix the ones it is failing. To be honest, unless you have the time and patience to track down weird compiler errors, I wouldn't rely on it for your class. It's very alpha quality.
@tylergenter The alternative for me is setting up a whole new environment since it's required for the class. Do you have your work checked in on a branch or is there somewhere else I can access it to try it out?
I'd be happy to guinea pig it for you and help you where I can. I'm not super experienced at this kind of port/debugging but I'm interested to learn because it seems as if a lot of libraries are released on linux first.
Is there any news regarding the windows port? I'd really like to get my hands on pytorch in windows...
@Neltherion as a side note, for CPU-only usage, you should know that pytorch works out of the box on the Windows Subsystem for Linux.
Thanks! but I'm really hoping for a CUDA GPU support on Windows... I understand I shouldn't be greedy as even Tensorflow started supporting Windows months after its initial release...
@tylergenter Just checking in again. Do you have your work checked in on a branch or is there somewhere else I can access it to try it out?
We're approaching the section of the course where I need pytorch and I'd rather not have to setup a whole new environment if I can avoid it.
@tylergenter @soumith Have a look at the work of http://www.lfd.uci.edu/~gohlke/pythonlibs/ who has been building Windows executables for years. Includes PyCuda et al.
Are there any people from Microsoft reading this thread? I think you should help porting PyTorch to Windows. Otherwise, many people are already two clicks away from switching to Linux.
@tylergenter Would you mind to share the work you have done with a PR or point us to a github repo? Some other people might want to build on what you have done.
I really wish I had uploaded it earlier. My less than a year old ssd stopped working, and I lost everything on it. Sorry
@tylergenter this is so sad to hear! A reminder for all of us to hit the backup button this evening... I can imagine what you must be feeling, and surely you are not ready to start all over again, not right now at least. On the other hand, there are dozens of people willing to help, so if by any chance you find time to put together some instructions for the community on what needs to be done, I am sure people will gather around and offer you as much help as needed in reproducing what you've achieved. Would you be able to help coordinate this process?
Hi, a minor issue but THPP fails to build with MSVC, because it does not support variable length arrays. THTensor::catArray and THCTensor::catArray use a variable length array to hold a temporary array of pointers.
At THPP/tensors/generic/THTensor.cpp:
template<>
auto THCTensor<real>::catArray(const std::vector<Tensor*>& inputs_vec,
                               int dimension) -> THCTensor& {
  int numInputs = inputs_vec.size();
  tensor_type *inputs[numInputs];  // variable length array: rejected by MSVC
  ...
I confirmed that TH, THS, THC, THNN, THCUNN, and THCS compile with MSVC 2015 (with lots of warnings). But I couldn't build libshm because it heavily relies on the POSIX interface. Do you know how to get around this problem?
@masahi can't you just replace the variable length array with a std::vector of pointers ?
@bordingj sure, that solved the build issue. I just wanted to let pytorch devs know this strange MSVC issue.
@masahi great! - how about the issue with long being 32bit on windows ?
@masahi libshm could be patched with some no-op stubs for Windows. It's not strictly necessary to run pytorch, but multiprocessing will be broken without it.
Maybe have a look at http://stackoverflow.com/a/4642169
@bordingj I can do long to int64_t conversion if I want, but the problem is I can't run the test suite without first installing pytorch itself. Installing pytorch requires libshm.
@apaszke ok, I will see what I can do. You said you postponed the long-to-int64_t conversion until after the release; have you gotten around to it?
@masahi no, I haven't. @colesbury tried it, but it appeared to be a larger change so we decided to put it off.
While my laptop wouldn't even recognize my ssd, Linux on my desktop was able to access it (albeit ridiculously slowly). I'm updating my branch to pytorch/master right now. I'll try to get something out in the next week.
In regards to libshm, I just replaced it with no-op stubs, since it's not needed on Windows. Windows supports anonymous memory mappings that will automatically be deleted when all references to it are closed. (https://msdn.microsoft.com/en-us/library/windows/desktop/aa366551(v=vs.85).aspx)
@tylergenter where is your ongoing work? Can I try it now? If possible, I can help porting as well.
does cygwin support cuda on windows?
@retsyo Probably, if you can figure out the right combination of compiler flags.
Don't just indiscriminately replace long with int64_t. Some of them should always be 64 bits, and some of them should be 32 or 64 bits depending on the platform.
A rough guideline for when to use which type:
- uint64_t / int64_t - Use these when you need 64 bits, regardless of architecture, e.g. when you want 64-bit tensor elements. (You could also consider (u)int_least64_t and (u)int_fast64_t for improved portability to exotic platforms, but I don't think it's worth the bother at this point.)
- uintptr_t / intptr_t - Use these when storing a pointer as an integer.
- size_t / ptrdiff_t - Use these for array sizes and indexes. If you're lazy, you could use (u)intptr_t instead, since these should be the same size as pointers on most flat-memory-space architectures.
I've built the code successfully on Windows 10 x64 with Visual Studio 2015. After some modifications, the example MNIST code can run with CUDA support without problems. The major problem now is that the cuDNN backend cannot be used. It raises the error below:
Traceback (most recent call last):
File "test_mnist.py", line 129, in <module>
train(epoch)
File "test_mnist.py", line 95, in train
output = model(data)
File "C:\Anaconda2\envs\py3\lib\site-packages\torch\nn\modules\module.py", line 206, in __call__
result = self.forward(*input, **kwargs)
File "test_mnist.py", line 49, in forward
x = F.relu(F.max_pool2d(self.conv1(x), 2))
File "C:\Anaconda2\envs\py3\lib\site-packages\torch\nn\modules\module.py", line 206, in __call__
result = self.forward(*input, **kwargs)
File "C:\Anaconda2\envs\py3\lib\site-packages\torch\nn\modules\conv.py", line 237, in forward
self.padding, self.dilation, self.groups)
File "C:\Anaconda2\envs\py3\lib\site-packages\torch\nn\functional.py", line 40, in conv2d
return f(input, weight, bias)
RuntimeError: CUDNN_STATUS_BAD_PARAM
The cudnn library loads without problems, and it reports the cudnn version correctly.
In [5]: torch.backends.cudnn.lib.cudnnGetErrorString()
Out[5]: b'CUDNN_UNKNOWN_STATUS'
In [6]: torch.backends.cudnn.lib.cudnnGetVersion()
Out[6]: 6021
How can I fix this problem?
Couple of questions. What version of Python are you using? Where are you getting test_mnist.py from?
@tylergenter I used Python 3.6, and the test_mnist.py is just a modified version of the one in the example repo. The original version is listed here. The multiprocessing part is broken, causing the re-definition of the data loaders, and then the process hangs. So I wrapped them in an if statement:
if __name__ == '__main__':
    train_loader = torch.utils.data.DataLoader(
        datasets.MNIST('../data', train=True, download=True,
                       transform=transforms.Compose([
                           transforms.ToTensor(),
                           transforms.Normalize((0.1307,), (0.3081,))
                       ])),
        batch_size=args.batch_size, shuffle=True, **kwargs)
    test_loader = torch.utils.data.DataLoader(
        datasets.MNIST('../data', train=False, transform=transforms.Compose([
            transforms.ToTensor(),
            transforms.Normalize((0.1307,), (0.3081,))
        ])),
        batch_size=args.batch_size, shuffle=True, **kwargs)
I think pytorch should add Windows support. Other deep learning frameworks, like tensorflow, theano and mxnet, all support Windows. I only use Windows in my work, so I want to know whether pytorch will support Windows in the future.