pytorch / pytorch

Tensors and Dynamic neural networks in Python with strong GPU acceleration
https://pytorch.org

Add windows support please #494

Closed: jf003320018 closed this issue 6 years ago

jf003320018 commented 7 years ago

I think pytorch should add Windows support. Other deep learning frameworks, like TensorFlow, Theano, and MXNet, all support Windows. I only use Windows in my work, so I want to know whether pytorch will support Windows in the future.

soumith commented 7 years ago

We are looking for developers who are willing to build out and maintain Windows support and join our core team of open-source developers. None of our core team uses Windows right now, so it is hard for us to add support ourselves.

souravsingh commented 7 years ago

@soumith I am willing to help with windows support

soumith commented 7 years ago

@souravsingh That's great. Once you have pytorch compiling and running on Windows, and the equivalent of ./run_test.sh passes the unit tests, we can take it forward.

From what I've heard, it's best to use Python 3.5 + Anaconda for Windows, because we use C++11 in our core parts.

theamazingfedex commented 7 years ago

I'm interested in collaborating on porting this to Windows. I use conda with Python 2.7 and 3.5 for different things, so having support for both versions would be nice :) I don't know much about what would need to be done to accomplish this, but I can poke around.

apaszke commented 7 years ago

We've heard that Python 2.7 libs have to be compiled with an old MSVC that doesn't support C++11, which might be a blocker for it 😕

tylergenter commented 7 years ago

Since people keep asking about it, I am working on compiling it with msvc. It's taking longer than I expected, since sizeof(long)==4 on Win64.

apaszke commented 7 years ago

Ah, this is going to be awful. We need to make TH use fixed-size longs, or torch.LongTensor will easily overflow on Windows.

tylergenter commented 7 years ago

It is awful.

#ifdef _MSC_VER
typedef long long THLong;
#else
typedef long THLong;
#endif

And then replace every single reference to long with THLong. I'm having to modify a huge number of files.

apaszke commented 7 years ago

Wait, are you changing libTH code?

fmassa commented 7 years ago

@tylergenter Normally TH, THNN, THC and THCUNN should compile with MSVC, thanks to the several fixes contributed by @BTNC in the torch7 package.

tylergenter commented 7 years ago

They compile, I just had to create build_all.bat

The problem (as I stated above) is that sizeof(long)==4.

For example:

#include <stdio.h>
#include "TH/THStorage.h"

int main() {
    auto sdf = (THLongStorage*) NULL;
    printf("%d\n", (int) sizeof(sdf->data[0]));
}

Compiled with Visual Studio 15, targeting x64, this prints out 4.

apaszke commented 7 years ago

If you're willing to go over all libs anyway, then it would be better to convert them to use stdint types (e.g. int64_t for long), so that you don't need to use any macros, and it would be truly cross-platform. We wanted to make that change a while ago, but it turned out to be a lot of work, and we decided to postpone it after the release.
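
(As an aside, a minimal self-contained sketch of why fixed-width stdint types matter here; this is hypothetical demo code, not anything from TH itself:)

#include <stdint.h>
#include <stdio.h>

int main() {
    // 100000 * 100000 = 10,000,000,000, which needs more than 32 bits.
    int64_t big = INT64_C(100000) * INT64_C(100000);  // same value on every platform
    long maybe_truncated = (long) big;                // long is 4 bytes on Win64 (LLP64), 8 on Linux/macOS x64

    printf("int64_t: %lld\n", (long long) big);
    printf("long:    %ld\n", maybe_truncated);        // wrong value wherever long is 32-bit
    return 0;
}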

tylergenter commented 7 years ago

I'll convert all the references to long to int64_t instead. Should I base my patches against torch/torch7 and torch/nn?

apaszke commented 7 years ago

Yes, that would be the easiest for us. Thank you!

BTNC commented 7 years ago

My two cents: I prefer the macro way with int64_t (i.e., typedef int64_t as THLong and replace long with THLong) because (1) it is consistent with THHalf; (2) it is convenient to define all TH types in a centralized place, so that it is painless if one wants to change a type again in the future; (3) it makes the code clearer about which parts comply with torch type restrictions.
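
(A minimal sketch of what such a centralized definition could look like; the header name and contents here are hypothetical, not the actual TH sources:)

// Hypothetical THTypes.h: every torch scalar type defined in one place,
// so changing a representation later means touching a single header.
#ifndef TH_TYPES_H
#define TH_TYPES_H

#include <stdint.h>

typedef int64_t THLong;   // fixed 64-bit everywhere, including MSVC
typedef int32_t THInt;
typedef float   THFloat;
typedef double  THDouble;
// THHalf would sit alongside these, which is the consistency point above.

#endif // TH_TYPES_H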

By the way, I think the sizeof(long)==4 issue on Windows is not a blocker for porting pytorch to Windows. One can still work with tensors other than LongTensor, especially HalfTensor, FloatTensor, etc. I suggest postponing the long-to-int64_t conversion until after pytorch has been ported to Windows.

tylergenter commented 7 years ago

I was originally going to postpone it, or not do it at all, until I got to the part of pytorch that serializes data. There is one section that writes the tensor dimensions to a file using longs, and I was worried about compatibility problems and more complicated code to handle the different sizes.
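
(To make the serialization concern concrete, a minimal sketch using a made-up writer rather than pytorch's actual serialization code:)

#include <stdint.h>
#include <stdio.h>

/* Hypothetical example only, not pytorch's real serializer. Writing `long`
 * bakes this platform's long size into the file: 8 bytes per dimension on
 * Linux/macOS x64 but 4 on Win64, so the files are not interchangeable. */
void write_sizes_nonportable(FILE *f, const long *sizes, int ndim) {
    fwrite(sizes, sizeof(long), ndim, f);
}

/* Converting to a fixed-width type keeps the on-disk format identical
 * on every platform. */
void write_sizes_portable(FILE *f, const int64_t *sizes, int ndim) {
    fwrite(sizes, sizeof(int64_t), ndim, f);
}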

VariableVasasMT commented 7 years ago

I am willing to take up this task.

tylergenter commented 7 years ago

Just to give you an update, so you don't think I've given up: everything compiles, and all the test cases pass, except that I haven't tried CUDA yet, since I need to get a power supply for my desktop. I also haven't tried to compile with WITH_DISTRIBUTED, and I still need to clean up my build scripts.

soumith commented 7 years ago

that was honestly quite quick :)

apaszke commented 7 years ago

Awesome! No need to focus on distributed yet, it's experimental anyway.

EvenOldridge commented 7 years ago

@tylergenter Just curious where this is at and whether the CUDA support has been tested? I'm starting a course this week that uses pytorch and I'm on a windows environment.

tylergenter commented 7 years ago

@EvenOldridge It is passing most of the CUDA tests, and I think I know how to fix the ones it is failing. To be honest, unless you have the time and patience to track down weird compiler errors, I wouldn't rely on it for your class. It's very alpha quality.

EvenOldridge commented 7 years ago

@tylergenter The alternative for me is setting up a whole new environment since it's required for the class. Do you have your work checked in on a branch or is there somewhere else I can access it to try it out?

I'd be happy to guinea-pig it for you and help where I can. I'm not super experienced at this kind of porting/debugging, but I'm interested in learning, since it seems as if a lot of libraries are released on Linux first.

Neltherion commented 7 years ago

Is there any news regarding the windows port? I'd really like to get my hands on pytorch in windows...

albanD commented 7 years ago

@Neltherion As a side note, for CPU-only usage you should know that pytorch works out of the box on the Windows Subsystem for Linux.

Neltherion commented 7 years ago

Thanks! But I'm really hoping for CUDA GPU support on Windows... I understand I shouldn't be greedy, as even TensorFlow only started supporting Windows months after its initial release...

EvenOldridge commented 7 years ago

@tylergenter Just checking in again. Do you have your work checked in on a branch or is there somewhere else I can access it to try it out?

We're approaching the section of the course where I need pytorch, and I'd rather not have to set up a whole new environment if I can avoid it.

dineshbvadhia commented 7 years ago

@tylergenter @soumith Have a look at the work at http://www.lfd.uci.edu/~gohlke/pythonlibs/, whose author has been building Windows executables, including PyCuda et al., for years.

andy-gh commented 7 years ago

Are there any people from Microsoft reading this thread? I think you should help port PyTorch to Windows. Otherwise, many people are already two clicks away from switching to Linux.

willyd commented 7 years ago

@tylergenter Would you mind sharing the work you have done in a PR, or pointing us to a GitHub repo? Some other people might want to build on what you have done.

tylergenter commented 7 years ago

I really wish I had uploaded it earlier. My less-than-a-year-old SSD stopped working, and I lost everything on it. Sorry.

andy-gh commented 7 years ago

@tylergenter This is so sad to hear! A reminder for all of us to hit the backup button this evening. I can imagine what you must be feeling, and surely you are not ready to start all over again, at least not right now. On the other hand, there are dozens of people willing to help, so if by any chance you find time to put together some instructions for the community about what needs to be done, I am sure people will gather around and offer as much help as needed to reproduce what you've achieved. Would you be able to help coordinate this process?

masahi commented 7 years ago

Hi, a minor issue: THPP fails to build with MSVC because MSVC does not support variable-length arrays. THTensor::catArray and THCTensor::catArray use a variable-length array to hold a temporary array of pointers.

At THPP/tensors/generic/THTensor.cpp:

template<>
auto THCTensor<real>::catArray(const std::vector<Tensor*>& inputs_vec,
                              int dimension) -> THCTensor& {
  int numInputs = inputs_vec.size();
  tensor_type *inputs[numInputs];
  ...

masahi commented 7 years ago

I confirmed that TH, THS, THC, THNN, THCUNN, and THCS compile with MSVC 2015 (with lots of warnings). But I couldn't build libshm, because it heavily relies on the POSIX interface. Do you know how to get around this problem?

bordingj commented 7 years ago

@masahi Can't you just replace the variable-length array with a std::vector of pointers?

masahi commented 7 years ago

@bordingj Sure, that solved the build issue. I just wanted to let the pytorch devs know about this strange MSVC limitation.
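
(For reference, a minimal sketch of the kind of change involved, using a placeholder Tensor type instead of the real THPP classes:)

#include <vector>

struct Tensor;  // opaque placeholder for the real THPP tensor type

void catArray_sketch(const std::vector<Tensor*>& inputs_vec) {
    // Rejected by MSVC (variable-length arrays are a GCC/Clang extension in C++):
    //     Tensor *inputs[inputs_vec.size()];

    // Portable replacement with the same element layout; inputs.data()
    // yields the raw Tensor** that the underlying C API expects.
    std::vector<Tensor*> inputs(inputs_vec.begin(), inputs_vec.end());
    (void) inputs;  // the real code would pass inputs.data() to the C backend
}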

bordingj commented 7 years ago

@masahi Great! How about the issue with long being 32-bit on Windows?

apaszke commented 7 years ago

@masahi libshm could be patched with some no-op stubs for Windows. It's not strictly necessary to run pytorch, but multiprocessing will be broken without it.

bordingj commented 7 years ago

Maybe have a look at http://stackoverflow.com/a/4642169

masahi commented 7 years ago

@bordingj I can do the long-to-int64_t conversion if needed, but the problem is that I can't run the test suite without first installing pytorch itself, and installing pytorch requires libshm.

@apaszke OK, I will see what I can do. You said you had postponed the long-to-int64_t conversion until after the release; have you gotten around to it?

apaszke commented 7 years ago

@masahi no, I haven't. @colesbury tried it, but it appeared to be a larger change so we decided to put it off.

tylergenter commented 7 years ago

While my laptop wouldn't even recognize my ssd, Linux on my desktop was able to access it (albeit ridiculously slowly). I'm updating my branch to pytorch/master right now. I'll try to get something out in the next week.

Regarding libshm, I just replaced it with no-op stubs, since it's not needed on Windows: Windows supports anonymous memory mappings that are automatically deleted when all references to them are closed. (https://msdn.microsoft.com/en-us/library/windows/desktop/aa366551(v=vs.85).aspx)
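
(A minimal sketch of the Windows mechanism being referred to, assuming the usual CreateFileMapping/MapViewOfFile pattern; the mapping name is made up and this is an illustration, not the actual stub code in the port:)

#include <windows.h>
#include <stdio.h>

int main() {
    const DWORD size = 4096;

    // Pagefile-backed mapping: the kernel reference-counts the object and
    // destroys it once the last handle and the last mapped view are closed,
    // so there is nothing for a libshm-style cleanup process to do.
    HANDLE h = CreateFileMappingA(INVALID_HANDLE_VALUE,          // no file, use the page file
                                  NULL, PAGE_READWRITE,
                                  0, size,                       // high / low 32 bits of the size
                                  "Local\\pytorch_demo_region"); // hypothetical name
    if (h == NULL) return 1;

    void *view = MapViewOfFile(h, FILE_MAP_ALL_ACCESS, 0, 0, size);
    if (view == NULL) { CloseHandle(h); return 1; }

    printf("mapped %lu bytes at %p\n", (unsigned long) size, view);

    UnmapViewOfFile(view);  // last view unmapped...
    CloseHandle(h);         // ...and last handle closed: the object goes away
    return 0;
}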

howard0su commented 7 years ago

@tylergenter Where is your ongoing work? Can I try it now? If possible, I can help with the porting as well.

tylergenter commented 7 years ago

See https://github.com/tylergenter/pytorch

In particular, https://github.com/tylergenter/pytorch/blob/master/README_Windows.md

retsyo commented 7 years ago

Does Cygwin support CUDA on Windows?

tylergenter commented 7 years ago

@retsyo Probably, if you can figure out the right combination of compiler flags.

CodesInChaos commented 7 years ago

Don't just indiscriminately replace long with int64_t. Some of those longs should always be 64 bits; others should be 32 or 64 bits depending on the platform.

A rough guideline for when to use which type:
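
(As a hedged illustration of that distinction, with hypothetical declarations rather than anything taken from the guideline or the codebase:)

#include <stddef.h>
#include <stdint.h>

/* Hypothetical declarations illustrating the distinction, not actual TH code. */

/* Values that are part of the user-visible API or an on-disk format
 * (tensor sizes, strides, LongTensor elements) want a fixed width everywhere: */
int64_t tensor_numel;

/* Values that only describe objects in this process's memory
 * (byte counts handed to malloc/memcpy, pointer differences)
 * can stay platform-sized: */
size_t    buffer_bytes;
ptrdiff_t element_offset;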

peterjc123 commented 7 years ago

I've built the code successfully on Windows 10 x64 with Visual Studio 2015. After some modifications, the MNIST example code runs with CUDA support without problems. The major problem now is that the cuDNN backend cannot be used; it raises the error below:

Traceback (most recent call last):
  File "test_mnist.py", line 129, in <module>
    train(epoch)
  File "test_mnist.py", line 95, in train
    output = model(data)
  File "C:\Anaconda2\envs\py3\lib\site-packages\torch\nn\modules\module.py", line 206, in __call__
    result = self.forward(*input, **kwargs)
  File "test_mnist.py", line 49, in forward
    x = F.relu(F.max_pool2d(self.conv1(x), 2))
  File "C:\Anaconda2\envs\py3\lib\site-packages\torch\nn\modules\module.py", line 206, in __call__
    result = self.forward(*input, **kwargs)
  File "C:\Anaconda2\envs\py3\lib\site-packages\torch\nn\modules\conv.py", line 237, in forward
    self.padding, self.dilation, self.groups)
  File "C:\Anaconda2\envs\py3\lib\site-packages\torch\nn\functional.py", line 40, in conv2d
    return f(input, weight, bias)
RuntimeError: CUDNN_STATUS_BAD_PARAM

The cuDNN library loads without problems, and its version is reported correctly:

In [5]: torch.backends.cudnn.lib.cudnnGetErrorString()
Out[5]: b'CUDNN_UNKNOWN_STATUS'

In [6]: torch.backends.cudnn.lib.cudnnGetVersion()
Out[6]: 6021

How can I fix this problem?

tylergenter commented 7 years ago

A couple of questions: what version of Python are you using, and where are you getting test_mnist.py from?

peterjc123 commented 7 years ago

@tylergenter I used Python 3.6, and test_mnist.py is just a modified version of the one in the examples repo; the original version is listed here. The multiprocessing part is broken, which caused the data loaders to be redefined and the process to hang, so I wrapped them in an if statement:

if __name__ == '__main__':
    train_loader = torch.utils.data.DataLoader(
        datasets.MNIST('../data', train=True, download=True,
                       transform=transforms.Compose([
                           transforms.ToTensor(),
                           transforms.Normalize((0.1307,), (0.3081,))
                       ])),
        batch_size=args.batch_size, shuffle=True, **kwargs)
    test_loader = torch.utils.data.DataLoader(
        datasets.MNIST('../data', train=False, transform=transforms.Compose([
                           transforms.ToTensor(),
                           transforms.Normalize((0.1307,), (0.3081,))
                       ])),
        batch_size=args.batch_size, shuffle=True, **kwargs)