openvenues / libpostal

A C library for parsing/normalizing street addresses around the world. Powered by statistical NLP and open geo data.
MIT License

Windows support #219

Closed brianmacy closed 7 months ago

brianmacy commented 7 years ago

I know Windows isn't supported, but I need it, so I'm doing a partial port myself (address parsing only). Much of it is probably not suitable to be contributed back, but I'm curious how much of it you'd be interested in, and if so, how you'd want to receive the diffs that make sense.

The most significant issues:

albarrentine commented 7 years ago

There's at least one case of a user getting libpostal to run on Windows here: https://github.com/BenK10/libpostal_windows, although haven't had a chance to look through the diffs yet. In https://github.com/openvenues/libpostal/issues/69#issuecomment-298377302, building with MinGW anyway, the main issues seemed to be creating a DLLEXPORT macro for some of the public API functions and some of the tests, as well as making sure to open all the data files in binary mode so the Windows EOF character 0x1a didn't confuse fread. Don't recall alloca being needed there, but maybe that's just MinGW.
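
The binary-mode point can be seen in miniature from Python. This is only an analogue, not libpostal code: in C the fix is passing `"rb"` rather than `"r"` to `fopen`, and Python's own text mode does not treat 0x1a as end-of-file. It does rewrite line endings, though, which is enough to show why a data file has to be read byte-for-byte. The payload and file name below are made up for illustration.

```python
import os
import tempfile

# Hypothetical payload: a CRLF pair plus the 0x1a byte mentioned above.
payload = b"ab\r\ncd\x1aef"

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "model.dat")  # made-up file name
    with open(path, "wb") as f:
        f.write(payload)

    # Binary mode returns the bytes exactly as stored.
    with open(path, "rb") as f:
        binary_read = f.read()

    # Text mode with universal newlines rewrites "\r\n" to "\n",
    # so the content no longer matches what is on disk.
    with open(path, "r", encoding="latin-1") as f:
        text_read = f.read()

print(binary_read == payload)
print(len(text_read), "characters in text mode vs", len(payload), "bytes on disk")
```

Any reader that relies on exact offsets or lengths in a model file would be thrown off by the text-mode result, which is the same class of problem as the 0x1a issue described above.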

To support Windows in master, I'd want a reasonably buttoned-up MSVC build that could be tested with a CI like Appveyor (so I can tell that the Windows build is working without actually needing a Windows machine to test on), and hopefully in a way that does not create too much #ifdef-ery. A good example of a C project with a minimally invasive Windows build is https://github.com/google/gumbo-parser, although that project is a bit simpler to build than libpostal. For our case we would also need a ported version of the shell script to download the models.

brianmacy commented 7 years ago

MinGW won’t work for me, has to be MSVC. I get the issue with Windows support. I’ve been doing cross-platform C++ longer than I care to admit.

I’m more interested in whether or not there is an appetite for accepting patches that make the libpostal source a bit more compatible with MSVC. Along the lines of the changes I mentioned to the code (alloca, casting, enum size, etc) and not anything really MSVC or CMake specific. Such changes might make libpostal more robust to differences on other unix platforms too.

If not, that is fine, I understand. If so, it would make it easier for me (or others) to sync up changes in the future.

Brian Macy, Senzing, Chief of Product Development and Operations, brian@senzing.com

albarrentine commented 7 years ago

Sure, no objections here as long as it stays compatible with Mac/Linux and doesn't require new dependencies. Most of the barriers to Windows support should have been removed as of 1.0, so if there are a few things I need to do differently going forward to support a more cross-platform build, I'm happy to accommodate that.

brianmacy commented 7 years ago

Linux/Mac are my primary platforms so no issue there. The biggest problem is configure and libtool as barriers to Windows support.

How would you like me to deal with submissions?

albarrentine commented 7 years ago

Only casually familiar but what I've seen in the past is similar to what the Gumbo repo does i.e. including the .vcxproj XML files alongside the autotools build. In libpostal's case it would also be necessary to define a config.h or the equivalent compiler options for Windows as there are a few #ifdef tests for some of those values.

As far as patches, feel free to fork the repo on Github, make changes, and send a pull request.

xiamx commented 7 years ago

> i.e. including the .vcxproj XML files alongside the autotools build. In libpostal's case it would also be necessary to define a config.h or the equivalent compiler options for Windows as there are a few #ifdef tests for some of those values.

I was working on something similar, will try to get a pull request if I can get it working

BenK10 commented 7 years ago

Some people have shown interest in my [libpostal for Windows project](https://github.com/BenK10/libpostal_windows) so I've been thinking about how to make it more mature and "user friendly". I would ideally like some way to build a Windows version with any commit of libpostal, but it's not so simple due to all the DLLEXPORTs needed. The build process might need some sort of script to inject those into the code. I am going to resume work on this project.

I'm interested in learning what issues are involved in getting libpostal to build on MSVC. You can actually use a DLL built with MinGW in MSVC, which is what I did, but it might only work when the MSVC project is in release mode. And this only works when the DLL source is pure C, which thankfully libpostal is.

Another issue with a Windows build, which is common to both MinGW and MSVC, is that the shell programs that are built in the Linux version need to be ported. It's trivial to use the API to build your own bare-bones console programs that are functionally equivalent from a query/result standpoint in a few lines of code, but for the "standard experience", we will have to port the shells.
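
A bare-bones console client of the kind described here might look like the sketch below. It uses the Python binding rather than the C API, and assumes `postal.parser.parse_address` (the function used in the REST snippet later in this thread) returns a list of `(value, label)` pairs. `format_components`, `run_console`, and `main` are hypothetical names, and the output format is not meant to match the stock command-line tools.

```python
import sys


def format_components(components):
    """Render (value, label) pairs one per line, label first."""
    return "\n".join(f"{label}: {value}" for value, label in components)


def run_console(parse, lines):
    """Parse each non-empty input line and collect the formatted results."""
    results = []
    for line in lines:
        address = line.strip()
        if address:
            results.append(format_components(parse(address)))
    return results


def main():
    # Imported here so the helpers above work without libpostal installed.
    from postal.parser import parse_address

    for block in run_console(parse_address, sys.stdin):
        print(block, end="\n\n")
```

Calling `main()` reads addresses from standard input, one per line. Passing the parser in as an argument keeps the loop usable with any callable that has the same shape.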

brianmacy commented 7 years ago

I’m sure that would be great. I just ported the read side of the parser code and used CMake. That approach took under two days of work and had a small diff.

And to your point, I made a simpler test client program so I didn’t have to deal with all the unix specific stuff.

Brian Macy, Senzing, Chief of Product Development and Operations, brian@senzing.com

albarrentine commented 7 years ago

It'd be fine to have a Windows build that coexists with the *nix one in this repo, even if it's read-only and there are different build instructions.

I'd suggest forking this repo directly, and then making a pull request upstream with build instructions (so we can set up Appveyor for CI) and some guidelines for what needs to be done/avoided in the future for Windows compatibility. Will be sure to give a shoutout to anyone who works on it open-source in our README.

A C project I've often referenced for inspiration is Google's gumbo-parser, which seems to have a working MSVC build without explicitly adding DLLEXPORT. This is an example of the structure I'd ideally like to see (an Appveyor build would check whether future changes break anything on Windows). Libpostal's build is slightly more involved than gumbo's and yes, would entail at minimum porting the libpostal_data script responsible for downloading the model files.
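
One way to port the download step without a shell is sketched below. It assumes the model files are distributed as gzip-compressed tar archives, which this thread does not state, and it takes the URL as a parameter rather than guessing the real download locations. `fetch_and_extract` is a hypothetical helper, not part of libpostal.

```python
import shutil
import tarfile
import urllib.request
from pathlib import Path


def fetch_and_extract(url, data_dir):
    """Download a .tar.gz archive from url and unpack it into data_dir."""
    data_dir = Path(data_dir)
    data_dir.mkdir(parents=True, exist_ok=True)
    archive = data_dir / "download.tar.gz"

    # Stream to disk in binary mode so large model files are not held in memory.
    with urllib.request.urlopen(url) as resp, open(archive, "wb") as out:
        shutil.copyfileobj(resp, out)

    # Unpack next to the archive, then remove the archive itself.
    with tarfile.open(archive, "r:gz") as tar:
        tar.extractall(data_dir)
    archive.unlink()

    return sorted(p.name for p in data_dir.iterdir())
```

Because it relies only on the standard library, the same script would run unchanged on Windows, Mac, and Linux, which fits the no-new-dependencies constraint mentioned earlier in the thread.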

I'm fine with keeping everything in sync if someone with a Windows machine gets an MSVC build working and sends a pull request.

philbritton commented 7 years ago

Just FYI, I've gotten this to build and run on Windows using Bash for Windows on machines with WSL: https://msdn.microsoft.com/en-us/commandline/wsl/faq

BenK10 commented 7 years ago

@philbritton, I was going to mention WSL but it's only available on Windows 10. I have no experience with it since I'm on Windows 7.

Can you share with us whether building on WSL requires any special steps? Or are you able to just build the shared library and then compile a program using it into an ELF binary that you can run, all in WSL?

Might you know if it is possible for those using MinGW/MSYS2 to export symbols for the DLL being built without using __declspec(dllexport)?

srj55 commented 7 years ago

I can confirm that I have this working in Win10 with WSL. No special steps needed.

Since I have a python/django environment in Windows, I had to connect to this WSL instance of libpostal using a simple REST JSON API in python:

```python
from json import dumps

from bottle import response, route, run
from postal.parser import parse_address


@route('/parse/<name>')
def parse(name):
    # Parse the address taken from the URL path and return it as JSON
    p = parse_address(name)
    response.content_type = 'application/json'
    return dumps(p)


# Serve on localhost:8080
run(host='localhost', port=8080)
```

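
On the Windows side, a client for the service above could be as small as the following sketch. It assumes the server is reachable at `localhost:8080` with the `/parse/<name>` route shown; `build_parse_url` and `parse_remote` are hypothetical helper names.

```python
import json
from urllib.parse import quote
from urllib.request import urlopen


def build_parse_url(address, host="localhost", port=8080):
    """Build the /parse/<name> URL, percent-encoding the address."""
    return f"http://{host}:{port}/parse/{quote(address, safe='')}"


def parse_remote(address, host="localhost", port=8080):
    """Call the service and decode its JSON reply."""
    with urlopen(build_parse_url(address, host, port)) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

The reply comes back as a list of two-element lists, since JSON has no tuple type. Addresses containing a `/` may not match the `<name>` wildcard even when percent-encoded, so a query parameter could be a safer design for real use.
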
BenK10 commented 7 years ago

Just wanted to let everyone know that I have a new branch of my libpostal_windows project. In the new version, a Python script with an input file takes care of all the DLLEXPORT stuff.
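
The general idea of injecting export markers from a list of function names can be sketched as follows. This is not the script from libpostal_windows, which is not shown in this thread: `add_dllexport` is a hypothetical helper, the macro name `DLLEXPORT` is taken from the discussion above, and the approach only handles declarations that fit on one line.

```python
import re


def add_dllexport(header_text, function_names, macro="DLLEXPORT"):
    """Prefix the declaration line of each named function with macro."""
    out = []
    for line in header_text.splitlines():
        for name in function_names:
            # Match a declaration such as "bool libpostal_setup(void);"
            # and skip lines that already carry the macro.
            if re.search(rf"\b{re.escape(name)}\s*\(", line) and macro not in line:
                line = f"{macro} {line.lstrip()}"
                break
        out.append(line)
    return "\n".join(out)


header = "bool libpostal_setup(void);\nvoid libpostal_teardown(void);\nint helper(void);"
print(add_dllexport(header, ["libpostal_setup", "libpostal_teardown"]))
```

A line-based regex like this would also match calls inside function bodies, so it is only reasonable for headers. Confining the exports to a single public header keeps that limitation manageable.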

Of course, using a Docker container or Windows 10's WSL are better options since libpostal doesn't need any modification that way.

Now that WSL is available, is there much of a reason anymore to do a native port?

SandeepNaidu commented 7 years ago

@BenK10 ,

IMO, native compilation on windows will be much faster than WSL or docker based implementation. It is useful in Windows based installations where data at large scale has to be processed.

albarrentine commented 7 years ago

Ok folks, we've got an active pull request from @AeroXuk in #272 based on @BenK10's fork which adds Windows support to mainline libpostal for MSYS2/MinGW64. Appveyor builds a zipfile with the DLL, header, .lib, etc. which, if I'm reading above correctly, should also work for MSVC users linking to libpostal (might need to build binaries for a few different architectures in that case, would like to start doing that for Mac/Linux as well at some point).

By reorganizing the tests a bit it was possible to confine the dllexports to the public header, which works for me in terms of current/future development.

Plan is to merge later today. Does that work for people/any comments before doing so?

philhutch50 commented 6 years ago

The Windows build really doesn't work, whichever way I try to build it. I was hoping to get somewhere with this, but I have tried everything and now have to give up.

Don't like being beaten but just cannot get this to work.

Is there a problem with the UK data set as well, as the download links are dead? Thanks Phil

philhutch50 commented 6 years ago

@brianmacy @albarrentine @BenK10 @philbritton

Sorry for tagging you all but I thought hopefully I might be able to get somewhere then ??

I have followed Brian's idea above and downloaded WSL on windows 10 and managed to get it installed and running after some fiddling about. I can actually do some interactive parsing.

@srj55 - Steve - Now that it's installed (and this is maybe out of scope here), how do I then access the WSL machine and make libpostal execute?

I cannot get the windows build to work at all. I have tried using the DLL but it looks like there are some dependencies missing and I just grind to a halt.

I am very impressed with what I can see happening in the WSL demo - so would like to move forward if possible.

With huge thanks in advance Phil

AeroXuk commented 6 years ago

From my experience, building on WSL works fine and is relatively pain-free; however, the resulting binary is an Ubuntu binary, not a Windows binary, so it can only be used within the WSL environment. This may be acceptable for some people's requirements.

If you don't mind using pre-built binaries, you can grab the last successful build from the AppVeyor CI.

Below are links to the last successful Windows build via AppVeyor for this project:

Downloads (From openvenues/libpostal AppVeyor CI):

Full length URLs are here: https://github.com/openvenues/libpostal/pull/272#issuecomment-350252085

... 32-bit: https://ci.appveyor.com/api/projects/albarrentine/libpostal/artifacts/libpostal.zip?branch=master&job=Environment%3A%20COMPILER%3Dmsys2%2C%20PLATFORM%3Dx86%2C%20MSYS2_ARCH%3Dx86%2C%20MSYS2_DIR%3Dmsys64%2C%20MSYSTEM%3DMINGW32%2C%20BIT%3D32

64-bit: https://ci.appveyor.com/api/projects/albarrentine/libpostal/artifacts/libpostal.zip?branch=master&job=Environment%3A%20COMPILER%3Dmsys2%2C%20PLATFORM%3Dx64%2C%20MSYS2_ARCH%3Dx86_64%2C%20MSYS2_DIR%3Dmsys64%2C%20MSYSTEM%3DMINGW64%2C%20BIT%3D64
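
The long `job=` parameter in these links is just the percent-encoded AppVeyor job name. A small sketch of how such a URL is assembled follows; the base URL and job name are copied from the 64-bit link above, `artifact_url` is a hypothetical helper, and nothing here checks whether the artifact is still available.

```python
from urllib.parse import quote

ARTIFACT_BASE = (
    "https://ci.appveyor.com/api/projects/albarrentine/libpostal"
    "/artifacts/libpostal.zip"
)

# Job name decoded from the 64-bit link above.
JOB_64 = (
    "Environment: COMPILER=msys2, PLATFORM=x64, MSYS2_ARCH=x86_64, "
    "MSYS2_DIR=msys64, MSYSTEM=MINGW64, BIT=64"
)


def artifact_url(job_name, branch="master"):
    """Build the artifact download URL for one build job."""
    return f"{ARTIFACT_BASE}?branch={branch}&job={quote(job_name, safe='')}"


print(artifact_url(JOB_64))
```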

philhutch50 commented 6 years ago

@AeroXuk - Thanks - these are the ones I think I downloaded before but could not get to work. I found a .NET assembly on here and tried these DLLs with it, but they failed. I also tested the DLL with Dependency Walker, which gave a lot of errors on the 32-bit DLL.

Have you used these yourself with any success?

Thanks Phil

philhutch50 commented 6 years ago

@AeroXuk Thanks for the help. I have abandoned getting the native Windows version to work, as I get the impression that everyone has struggled. I have downloaded and run WSL and got libpostal running there, and I believe coding in Python to access my data via this route will work.

Many thanks Phil

philhutch50 commented 6 years ago

@AeroXuk @albarrentine @BenK10 @philbritton @brianmacy

I managed to get the Windows builds to work. I don't know if anyone else is struggling, but the correct way to build a working 64-bit version is in these instructions:

https://github.com/openvenues/pypostal/pull/39/commits/a234e69a47b5edf7e1f0579637ae5c2db33feaa0

It's not my work; it was written by someone compiling the DLL for Python on Windows. I did not run the final export step (lib.exe), but even without it, the libpostal-1.dll I compiled was bigger than the 64-bit libpostal DLL I had downloaded before. When you rename it to libpostal.dll and place it in the x64 directory of the assembly (if using .NET), it actually works!

Hope this helps someone else who is stuck! Cheers Phil
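
The DLL has to match the bitness of the process that loads it, which is why the x64 directory matters here. A minimal sketch of picking the right copy from Python follows. The `x64` directory name comes from the comment above, while `x86`, `dll_path`, and `interpreter_bits` are assumptions made for illustration.

```python
import struct
from pathlib import Path


def interpreter_bits():
    """Return 32 or 64 depending on the running Python's pointer size."""
    return struct.calcsize("P") * 8


def dll_path(base_dir, bits=None):
    """Pick the per-architecture copy of libpostal.dll under base_dir."""
    bits = interpreter_bits() if bits is None else bits
    subdir = {32: "x86", 64: "x64"}[bits]
    return Path(base_dir) / subdir / "libpostal.dll"
```

On Windows the result could then be passed to `ctypes.CDLL(str(dll_path("bin")))`, with `bin` standing in for wherever the DLLs were unpacked. Loading a 32-bit DLL into a 64-bit process (or the reverse) fails at load time, so a mismatch is one thing to rule out when a downloaded build refuses to load.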

brianmacy commented 6 years ago

I believe that is a MINGW build and not MSVC.

Brian Macy, Senzing, Director of Product Development and Operations, brian@senzing.com

philhutch50 commented 6 years ago

@brianmacy The compiled DLL (MinGW) works properly with the .NET routines created on here by others. That should allow other kinds of access as well, since the existing Python library will now work with this build too. The AppVeyor builds, by contrast, do not work at all with the .NET routines.

brianmacy commented 6 years ago

Yep. But I believe it requires the MINGW libraries too.

Brian Macy, Senzing, Director of Product Development and Operations, brian@senzing.com

AeroXuk commented 6 years ago

> Yep. But I believe it requires the MINGW libraries too.

The x64 .dll, whether compiled by AppVeyor or self-compiled using the MSYS2 MinGW 64-bit command shell, only links to Kernel32.dll and Msvcrt.dll.

The x86 .dll, whether compiled by AppVeyor or self-compiled using the MSYS2 MinGW 32-bit command shell, seems to also reference libGCC_S_DW2-1.dll in addition to the two referenced by the x64 build. This is probably a bug.

> @brianmacy The compiled DLL (Mingw) works properly with .NET routines created on here by others. Which should allow other access as well as there is already the python library that will now work with this one as well. As the appveyor builds do not work at all with .NET routines.

I can confirm I use the two .dlls produced by AppVeyor, along with the .NET wrapper, in a production environment that doesn't have MinGW or MSYS2 installed on it. However, it is an x64 system, so it isn't using the x86 version (which appears to be compiled with a reference to gcc).

Edit: Looks like -static-libgcc compile flag is needed so that only a single .dll is needed for libpostal on x86.

philhutch50 commented 6 years ago

@AeroXuk That's really interesting. I could not get .NET to work at all with the AppVeyor build, but after copying in the x64 build I compiled, it worked. I was trying to get .NET to build as a component so I could reference it in Delphi, but I have given up on that for now.

selva221724 commented 3 years ago

Hi All,

If you are trying to build libpostal on Windows and running into issues, pypostalwin is a package for you. Feel free to open an issue if you find any problems.

https://github.com/selva221724/pypostalwin

Please support the package and help it grow by giving the repo a star.

If you don't want to use the Python binding, you can download the prebuilt bundle from the pypostalwin repo: https://github.com/selva221724/pypostalwin#installation