jasonrig / address-net

A package to structure Australian addresses
MIT License

updating files and model for tf2 #16

Open danfeltham opened 3 years ago

danfeltham commented 3 years ago

Summary

I updated the address-net files and model to work with Tensorflow v2.

The benefit of this is that you can now use Python v3.6-3.9 with address-net. Previously, as it was written with Tensorflow v1 in mind, support was only available for Python 3.6-3.7.

Corresponding issue: https://github.com/jasonrig/address-net/issues/11

Changes

The steps taken were to run the files through Tensorflow's automatic conversion script and then make a few manual changes. Finally, I retrained the address-net model using the same GNAF dataset (albeit the most recent release). One minor change was to add a "Place" entry to the street abbreviations dictionary in lookups.py, because I found it was missing when I used the original version of address-net.

A further change to the setup.py file will be needed to show support for the new Tensorflow version range if this pull request is accepted.

Testing

I tested it on Python v3.8 with Tensorflow v2.5 using the provided predict.py script.

MohsinTariq10 commented 3 years ago

Tested it; it's working okay with tensorflow 2.5 and python v3.8 on AWS Lambda!

MohsinTariq10 commented 3 years ago

Hi @Aretle have you witnessed this issue ? https://github.com/jasonrig/address-net/issues/17

danfeltham commented 3 years ago

Hi @Aretle have you witnessed this issue ? #17

Hi, I have seen that type of output, even before updating the model for tf2. I always thought it was a prediction error. I've put a more detailed response in the issue thread.

blue2609 commented 3 years ago

Hi @Aretle,

First of all, this looks like a very cool package that can parse Australian addresses accurately (save for a few cases)

I'm facing the same issue on my MacBook running macOS Big Sur 11.5.2. I have this conda environment setup:

            shell level : 2
       user config file : /Users/admin/.condarc
 populated config files : /Users/admin/.condarc
          conda version : 4.10.3
    conda-build version : 3.21.4
         python version : 3.8.8.final.0
       virtual packages : __osx=10.16=0
                          __unix=0=0
                          __archspec=1=x86_64
       base environment : /Users/admin/opt/anaconda3  (writable)
      conda av data dir : /Users/admin/opt/anaconda3/etc/conda
  conda av metadata url : None
           channel URLs : https://conda.anaconda.org/conda-forge/osx-64
                          https://conda.anaconda.org/conda-forge/noarch
                          https://repo.anaconda.com/pkgs/main/osx-64
                          https://repo.anaconda.com/pkgs/main/noarch
                          https://repo.anaconda.com/pkgs/r/osx-64
                          https://repo.anaconda.com/pkgs/r/noarch
          package cache : /Users/admin/opt/anaconda3/pkgs
                          /Users/admin/.conda/pkgs
       envs directories : /Users/admin/opt/anaconda3/envs
                          /Users/admin/.conda/envs
               platform : osx-64
             user-agent : conda/4.10.3 requests/2.25.1 CPython/3.8.8 Darwin/20.6.0 OSX/10.16
                UID:GID : 501:20
             netrc file : None
           offline mode : False

Now, I ran pip install address-net and then ran mamba install -c conda-forge tensorflow (I'm using mamba package manager to install conda packages). The problem is, it looks like address-net doesn't work with the version of tensorflow installed by mamba (tensorflow version 2.4.3).

This is very weird because, from reading this thread, I assumed this issue had already been fixed, and I can see that you made changes to some Python files and adapted them to Tensorflow 2.x and later.

I think I'm doing something wrong here on my end, any help would be appreciated 😅

EDIT:

I've also tried running

pip install address-net 
pip install tensorflow

to install address-net version 1.0 and tensorflow version 2.6.0 from PyPI. However, I'm still getting the same error:

Traceback (most recent call last):
  File "/Users/admin/Documents/Projects/properties_and_suburbs/python_test/parsing_au_addresses/addres_net.py", line 1, in <module>
    from addressnet.predict import predict_one
  File "/Users/admin/opt/anaconda3/envs/propbyte_properties_and_suburbs/lib/python3.9/site-packages/addressnet/predict.py", line 6, in <module>
    from addressnet.dataset import predict_input_fn, labels_list
  File "/Users/admin/opt/anaconda3/envs/propbyte_properties_and_suburbs/lib/python3.9/site-packages/addressnet/dataset.py", line 14, in <module>
    ('building_name', tf.FixedLenFeature([], tf.string)),
AttributeError: module 'tensorflow' has no attribute 'FixedLenFeature'

when I ran the sample code provided in this repository

from addressnet.predict import predict_one

if __name__ == '__main__':
    # This is a fake address!
    print(predict_one(
            "casa del gelato, 10A 24-26 high street road mount waverley vic 3183"
        )
    )
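A note on the traceback above: `tf.FixedLenFeature` is a TensorFlow 1.x name. In TensorFlow 2.x the same class is exposed as `tf.io.FixedLenFeature`, which is why code written for 1.x fails at import time. The sketch below shows one way a module could look the class up under either layout. The helper `first_available` is hypothetical and is not part of address-net or of this pull request; the stand-in namespaces only mimic the two layouts so the example runs without TensorFlow installed.

```python
import types


def first_available(module, dotted_names):
    """Return the first attribute found on `module` among the dotted names."""
    for dotted in dotted_names:
        obj = module
        try:
            for part in dotted.split("."):
                obj = getattr(obj, part)
        except AttributeError:
            continue  # this layout is not present, try the next name
        return obj
    raise AttributeError(f"none of {dotted_names!r} found on {module!r}")


# Stand-ins for the two module layouts (not real TensorFlow objects).
fake_tf1 = types.SimpleNamespace(FixedLenFeature="tf1 class")
fake_tf2 = types.SimpleNamespace(io=types.SimpleNamespace(FixedLenFeature="tf2 class"))

names = ["io.FixedLenFeature", "FixedLenFeature"]
print(first_available(fake_tf1, names))
print(first_available(fake_tf2, names))

# With TensorFlow installed, the same call would look like:
# import tensorflow as tf
# FixedLenFeature = first_available(tf, names)
```

A lookup like this only papers over one renamed symbol; a full port, as done in this pull request, touches many more calls.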

danfeltham commented 3 years ago

Hi @blue2609, I think I understand your problem. As this commit lives in a fork of the address-net package, installing address-net with pip from PyPI downloads the original version that @jasonrig made, which is the one stored on the PyPI index. That one isn't updated for Tensorflow v2. This is the pull request to update that original repository.

There are a couple of options for you. If you're able to use Tensorflow version 1, I believe you can install address-net with pip and Tensorflow <= 1.15 with:

pip install git+https://github.com/jasonrig/address-net.git
pip install tensorflow==1.15

Be advised, though, that I don't believe this will work with Python >= 3.8.

If you would like to use this version, which will allow you to use later versions of Tensorflow and Python, you could try something along the lines of:

pip install git+https://github.com/Aretle/address-net.git
pip install tensorflow

I have not tried this and I don't know if it will work, because the setup.py file still disallows TFv2. Worth giving it a shot though.

I hope this sheds some light on your problem and helps you out a bit. Let me know how it goes for you and I'll try to help you out if you have any more trouble.
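Before importing addressnet, it can help to confirm which package versions are actually installed in the active environment. The following is a minimal sketch, assuming Python 3.8 or newer (where `importlib.metadata` is in the standard library). The helper names are made up for illustration, and the rule that the PyPI release needs TensorFlow 1.x is taken from this thread rather than from the package metadata.

```python
from importlib import metadata  # standard library from Python 3.8


def installed_version(package):
    """Return the installed version string of `package`, or None if absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def major_version(version):
    """Return the leading integer of a version string such as '2.6.0'."""
    return int(version.split(".")[0])


def pypi_addressnet_should_import(tf_version):
    """Assumption from this thread: address-net 1.0 on PyPI needs TF 1.x."""
    return tf_version is not None and major_version(tf_version) == 1


tf_version = installed_version("tensorflow")
print("tensorflow:", tf_version)
print("address-net:", installed_version("address-net"))
if not pypi_addressnet_should_import(tf_version):
    print("The PyPI release of address-net is expected to fail to import here.")
```

On Python 3.7 the same functions are available from the `importlib-metadata` backport instead.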

jasonrig commented 3 years ago

@Aretle first of all, please accept my apologies for not noticing this (badly needed) PR. I have been swamped with notifications from my day job and this never caught my attention. Let me check this over the coming week in view of getting it merged. Thank you for your hard work

blue2609 commented 3 years ago

@Aretle

Hi mate, thanks so much for the help!

Ah right I see, so the change hasn't been integrated to the main branch huh? That's alright, there's always a workaround like you said Aretle :)

Following your suggestion, I managed to get address-net to work by running these:

# create a new conda virtual environment called 'address_net'
# and then activate this virtual environment
mamba create --name address_net 
conda activate address_net

# install the latest version of Python 3.7 to this virtual environment
# the command below will install Python 3.7.10
mamba install python=3.7 --channel conda-forge

# exit the virtual environment and then activate the virtual environment
# again so the package manager can locate the proper Python version I just installed
conda deactivate; conda activate address_net

# Download address-net and build the package
pip install git+https://github.com/jasonrig/address-net.git

# Install tensorflow version 1.15 to this virtual environment
pip install tensorflow==1.15

Now, after running the commands above, this is the list of packages installed in my conda virtual environment:

# packages in environment at /Users/admin/opt/anaconda3/envs/address_net:
#
# Name                    Version                   Build  Channel
absl-py                   0.15.0                   pypi_0    pypi
address-net               1.0                      pypi_0    pypi
astor                     0.8.1                    pypi_0    pypi
ca-certificates           2021.10.8            h033912b_0    conda-forge
cached-property           1.5.2                    pypi_0    pypi
gast                      0.2.2                    pypi_0    pypi
google-pasta              0.2.0                    pypi_0    pypi
grpcio                    1.41.0                   pypi_0    pypi
h5py                      3.5.0                    pypi_0    pypi
importlib-metadata        4.8.1                    pypi_0    pypi
keras-applications        1.0.8                    pypi_0    pypi
keras-preprocessing       1.1.2                    pypi_0    pypi
libcxx                    12.0.1               habf9029_0    conda-forge
libffi                    3.4.2                he49afe7_4    conda-forge
libzlib                   1.2.11            h9173be1_1013    conda-forge
markdown                  3.3.4                    pypi_0    pypi
ncurses                   6.2                  h2e338ed_4    conda-forge
numpy                     1.21.3                   pypi_0    pypi
openssl                   3.0.0                h0d85af4_1    conda-forge
opt-einsum                3.3.0                    pypi_0    pypi
pip                       21.3.1             pyhd8ed1ab_0    conda-forge
protobuf                  3.19.0                   pypi_0    pypi
python                    3.7.10          hf3644f1_104_cpython    conda-forge
python_abi                3.7                     2_cp37m    conda-forge
readline                  8.1                  h05e3726_0    conda-forge
setuptools                58.2.0           py37hf985489_0    conda-forge
six                       1.16.0                   pypi_0    pypi
sqlite                    3.36.0               h23a322b_2    conda-forge
tensorboard               1.15.0                   pypi_0    pypi
tensorflow                1.15.0                   pypi_0    pypi
tensorflow-estimator      1.15.1                   pypi_0    pypi
termcolor                 1.1.0                    pypi_0    pypi
textdistance              4.2.1                    pypi_0    pypi
tk                        8.6.11               h5dbffcc_1    conda-forge
typing-extensions         3.10.0.2                 pypi_0    pypi
werkzeug                  2.0.2                    pypi_0    pypi
wheel                     0.37.0             pyhd8ed1ab_1    conda-forge
wrapt                     1.13.2                   pypi_0    pypi
xz                        5.2.5                haf1e3a3_1    conda-forge
zipp                      3.6.0                    pypi_0    pypi
zlib                      1.2.11            h9173be1_1013    conda-forge

And below is my conda environment configuration

add_anaconda_token: True
add_pip_as_python_dependency: True
aggressive_update_packages:
  - ca-certificates
  - certifi
  - openssl
allow_conda_downgrades: False
allow_cycles: True
allow_non_channel_urls: False
allow_softlinks: False
always_copy: False
always_softlink: False
always_yes: None
anaconda_upload: None
auto_activate_base: True
auto_stack: 0
auto_update_conda: True
bld_path: 
changeps1: True
channel_alias: https://conda.anaconda.org
channel_priority: flexible
channels:
  - conda-forge
  - defaults
client_ssl_cert: None
client_ssl_cert_key: None
clobber: False
conda_build: {}
create_default_packages: []
croot: /Users/admin/opt/anaconda3/conda-bld
custom_channels:
  pkgs/main: https://repo.anaconda.com
  pkgs/r: https://repo.anaconda.com
  pkgs/pro: https://repo.anaconda.com
custom_multichannels:
  defaults: 
    - https://repo.anaconda.com/pkgs/main
    - https://repo.anaconda.com/pkgs/r
  local: 
debug: False
default_channels:
  - https://repo.anaconda.com/pkgs/main
  - https://repo.anaconda.com/pkgs/r
default_python: 3.8
default_threads: None
deps_modifier: not_set
dev: False
disallowed_packages: []
download_only: False
dry_run: False
enable_private_envs: False
env_prompt: ({default_env}) 
envs_dirs:
  - /Users/admin/opt/anaconda3/envs
  - /Users/admin/.conda/envs
error_upload_url: https://conda.io/conda-post/unexpected-error
execute_threads: 1
extra_safety_checks: False
force: False
force_32bit: False
force_reinstall: False
force_remove: False
ignore_pinned: False
json: False
local_repodata_ttl: 1
migrated_channel_aliases: []
migrated_custom_channels: {}
non_admin_enabled: True
notify_outdated_conda: True
offline: False
override_channels_enabled: True
path_conflict: clobber
pinned_packages: []
pip_interop_enabled: True
pkgs_dirs:
  - /Users/admin/opt/anaconda3/pkgs
  - /Users/admin/.conda/pkgs
proxy_servers: {}
quiet: False
remote_backoff_factor: 1
remote_connect_timeout_secs: 9.15
remote_max_retries: 3
remote_read_timeout_secs: 60.0
repodata_fns:
  - current_repodata.json
  - repodata.json
repodata_threads: None
report_errors: None
restore_free_channel: False
rollback_enabled: True
root_prefix: /Users/admin/opt/anaconda3
safety_checks: warn
sat_solver: pycosat
separate_format_cache: False
shortcuts: True
show_channel_urls: None
signing_metadata_url_base: None
solver_ignore_timestamps: False
ssl_verify: True
subdir: osx-64
subdirs:
  - osx-64
  - noarch
target_prefix_override: 
track_features: []
unsatisfiable_hints: True
unsatisfiable_hints_check_depth: 2
update_modifier: update_specs
use_index_cache: False
use_local: False
use_only_tar_bz2: False
verbosity: 0
verify_threads: 1
whitelist_channels: []

With this, I was able to run

from addressnet.predict import predict_one

if __name__ == '__main__':
    # This is a fake address!
    print(predict_one(
            "casa del gelato, 10A 24-26 high street road mount waverley vic 3183"
        )
    )

with no problem :)

blue2609 commented 3 years ago

@jasonrig

Mate, I just want to say that your address-net is probably the best (YES, THE BEST) Australian address parser package I've seen so far. Training an RNN model on the GNAF dataset is ingenious. There are so many other Australian address parsers out there, but the problem is that if we try to pass something like

27 lancaster rd new sooth woles

to the address parser, it's going to output an error.

This isn't the case with your RNN based address-net address parser as it can auto-correct typos and recognise the street address and street name automatically just from that string. Absolutely fantastic 👍

danfeltham commented 3 years ago

Glad you were able to get it working! Thank you for sharing your process. I haven't tested this pull request with Tensorflow v1, so it definitely raises a good question as to how backwards compatible this update is.

danfeltham commented 3 years ago

@Aretle first of all, please accept my apologies for not noticing this (badly needed) PR. I have been swamped with notifications from my day job and this never caught my attention. Let me check this over the coming week in view of getting it merged. Thank you for your hard work

No stress at all! I just thought I'd leave a pull request so people could see it and use it if they needed to. Echoing what blue2609 said, you've done an awesome job on this package so I'm glad to possibly be able to contribute. Let me know if you have any questions or want me to walk you through the changes.

I'm not sure about the backwards compatibility of this update. I think it will raise the minimum supported Tensorflow version, but I haven't checked to which version.

jasonrig commented 3 years ago

My personal feeling is that we can increment the major version number and not try to maintain backward compatibility. TF 1.x is old and ensuring backwards compatibility, especially when there are no new features (that is, the API is the same), seems like time not well spent.

@Aretle since you retrained the model, it may cause slight changes to the output, and that may be a bigger concern for others than TF backwards compatibility, since the output from the earlier version may not be 100% reproducible. This is not a problem in and of itself, but just another reason to include this as a major version increment alongside a release note to this effect.

(Other things on my wishlist: CI/CD for testing combinations of python and TF versions, a QC dataset to verify that any retrained model isn't getting noticeably worse 😇 ... not going to happen in this PR, but I'll note it as an issue for anyone who may want to take it up)
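As a rough illustration of the QC idea, the sketch below compares the outputs of two model versions over the same list of addresses and reports how often they agree. It assumes each prediction is a dict of address fields; the function name, the field names, and the sample values are invented for illustration and are not real model results.

```python
def agreement_report(old_predictions, new_predictions):
    """Compare two equal-length lists of parsed-address dicts.

    Returns (exact_match_rate, per_field_mismatch_counts).
    """
    if len(old_predictions) != len(new_predictions):
        raise ValueError("prediction lists must have the same length")
    if not old_predictions:
        raise ValueError("need at least one prediction to compare")

    exact_matches = 0
    field_mismatches = {}
    for old, new in zip(old_predictions, new_predictions):
        if old == new:
            exact_matches += 1
            continue
        # Count every field whose value differs or is missing on one side.
        for field in sorted(set(old) | set(new)):
            if old.get(field) != new.get(field):
                field_mismatches[field] = field_mismatches.get(field, 0) + 1
    return exact_matches / len(old_predictions), field_mismatches


# Made-up outputs standing in for the old and the retrained model.
old_outputs = [{"street_name": "HIGH STREET", "postcode": "3183"},
               {"street_name": "LANCASTER", "postcode": "2000"}]
new_outputs = [{"street_name": "HIGH STREET", "postcode": "3183"},
               {"street_name": "LANCASTER", "postcode": "2001"}]

rate, mismatches = agreement_report(old_outputs, new_outputs)
print(rate, mismatches)
```

A check like this could run in CI against a fixed address list, failing the build if the exact-match rate drops below an agreed threshold.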

paulgdowling commented 7 months ago

@jasonrig As others have stated, what you have built is really cool. I have noticed that when I run it, I get loads of warning messages about Tensorflow deprecating a number of its functions and suggesting alternative Keras functions. Any chance these could be resolved? It is totally beyond my skill level as this technology is all new to me.
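Until the deprecated calls are replaced in the package itself, one stopgap is to reduce logging noise before importing it. The sketch below is an assumption about what may help, not a fix: it hides messages rather than resolving them, and how much it silences can vary by TensorFlow version. The function name `quiet_tensorflow` is made up for illustration.

```python
import logging
import os
import warnings


def quiet_tensorflow():
    """Reduce TensorFlow log noise; call this before importing tensorflow."""
    # Filter INFO and WARNING messages from TensorFlow's C++ backend.
    os.environ["TF_CPP_MIN_LOG_LEVEL"] = "2"
    # TensorFlow's Python-side messages go through the "tensorflow" logger.
    logging.getLogger("tensorflow").setLevel(logging.ERROR)
    # Hide Python-level deprecation warnings raised via the warnings module.
    warnings.filterwarnings("ignore", category=DeprecationWarning)
    warnings.filterwarnings("ignore", category=FutureWarning)


quiet_tensorflow()

# Import the package only after the settings above are in place:
# from addressnet.predict import predict_one
```

Because this also hides warnings that may matter later, it is better suited to scripts and notebooks than to library code.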