somnathrakshit / geograpy3

Extract place names from a URL or text, and add context to those names -- for example distinguishing between a country, region or city.
https://geograpy3.readthedocs.io
Apache License 2.0

Not able to fetch places when passing url, but text works #22

Closed monkins closed 3 years ago

monkins commented 3 years ago

Hey there,

Thanks so much for the awesome module! I installed it on Python 3, and overall it has worked great so far. However, I am not able to fetch the place info when I pass a URL. When I pass text, it does an excellent job.

WITH URL:

import geograpy
url = 'https://www.bbc.com/news/uk-13426353'
places = geograpy.get_geoPlace_context(url=url)
print(places)

countries=[]
regions=[]
cities=[]
other=[]

WITH TEXT:

import geograpy
text = 'During the 70-day torch relay, it will pass through towns and cities including Bristol, Cardiff, Liverpool, Belfast, Glasgow, Aberdeen, Newcastle, Manchester, Sheffield, Nottingham, Oxford, Southampton and Dover.'
places = geograpy.get_geoPlace_context(text=text)
print(places)

countries=['South Africa', 'Australia', 'New Zealand', 'United Kingdom', 'Ireland', 'United States', 'Canada']
regions=[]
cities=['Newcastle', 'Belfast', 'Sheffield', 'Cardiff', 'Oxford', 'Southampton', 'Nottingham', 'Bristol']
other=[]

Not sure if I missed something really silly.

Thanks for your help in advance.

Monkins

WolfgangFahl commented 3 years ago

Indeed, if we give an example, it should work. Unfortunately, https://www.bbc.com/news/uk-13426353 only displays the relay route in a browser, not when the page is fetched programmatically. Try:

curl https://www.bbc.com/news/uk-13426353

That gives you a lot of JavaScript gibberish but not the torch relay route. I've modified the example to use https://en.wikipedia.org/wiki/2012_Summer_Olympics_torch_relay instead.

And the result there is:

countries=['Ireland', 'Guernsey', 'Jersey', 'United Kingdom', 'Turkey', 'Greece', 'Belarus', 'South Africa', 'Australia', 'New Zealand', 'Germany', 'France', 'Jamaica', 'Antigua and Barbuda', 'Montserrat', 'United States', 'Canada', 'Japan']
regions=['Taunton Lancashire', 'Torch', 'Cumbrian', 'Wiltshire', 'Ireland', 'United Kingdom', 'Host', 'Swansea', 'Cornwall', 'Heathrow', 'Hackney', 'Cambridge', 'Bristol Harbour', 'Derry', 'Wales', 'British/Irish', 'Maidstone', 'Engineering', 'Munich', 'Abraham', 'Guernsey', 'Jersey', 'Northern Ireland', 'Hyde Park', 'Ioannina', 'Stamford', 'Burscough', 'Bangor', 'Dublin', 'Sheffield', 'Essex', 'Athens', 'Portland', 'Aberaeron', 'British', 'Caerphilly', 'Thirsk', 'Greece', 'Locog', 'Davy', 'England', 'Plymouth', 'Lincolnshire', 'Scotland', 'Newton Aycliffe', 'Land', 'Hera', 'East', 'Falmouth', 'Turkey', 'London', 'Gravesend', 'Cardiff', 'Wanted']
cities=['Ioannina', 'Athens', 'Dublin', 'Hyde Park', 'Swansea', 'Sheffield', 'Portland', 'Cardiff', 'Cambridge', 'Bangor', 'Thirsk', 'Stamford', 'Plymouth', 'Newton Aycliffe', 'Maidstone', 'London', 'Hackney', 'Gravesend', 'Falmouth', 'Caerphilly', 'Burscough', 'Aberaeron', 'Munich', 'Derry', 'England', 'Scotland', 'Essex', 'Turkey', 'Davy', 'Ireland', 'Lincolnshire', 'Wales', 'Guernsey', 'Cornwall', 'Heathrow', 'Hera']
other=['British', 'British', 'East']
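
The curl check above can also be done from Python before handing a URL to geograpy. The following is only a minimal sketch using the standard library: it strips script and style content from the raw HTML and reports which expected place names are missing from the remaining text. The names `VisibleTextParser`, `visible_text`, `missing_places` and `fetch_html` are hypothetical helpers, not part of geograpy, and the sketch assumes the page builds its content client-side, which may not be the only reason a URL returns nothing.

```python
from html.parser import HTMLParser
from urllib.request import urlopen


class VisibleTextParser(HTMLParser):
    """Collect text nodes while ignoring <script> and <style> content."""

    SKIPPED_TAGS = {"script", "style"}

    def __init__(self):
        super().__init__()
        self._skip_depth = 0  # > 0 while inside a skipped tag
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED_TAGS:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIPPED_TAGS and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._skip_depth == 0 and data.strip():
            self.chunks.append(data.strip())


def visible_text(html):
    """Return the text a non-JavaScript client would find in the HTML."""
    parser = VisibleTextParser()
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks)


def missing_places(html, expected_places):
    """List the expected place names that do not occur in the visible text."""
    text = visible_text(html)
    return [place for place in expected_places if place not in text]


def fetch_html(url):
    """Download the raw HTML, roughly what `curl URL` returns (needs network)."""
    with urlopen(url) as response:
        return response.read().decode("utf-8", errors="replace")
```

If `missing_places(fetch_html(url), ['Bristol', 'Cardiff'])` returns both names, the static HTML does not contain the route, and an empty result from `geograpy.get_geoPlace_context(url=url)` would be expected. In that case, passing the article text directly with `text=...` (as in the original report) avoids the problem.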

WolfgangFahl commented 3 years ago

We'd appreciate it if you starred this project.

WolfgangFahl commented 3 years ago

Same problem for https://www.bbc.com/news/world-europe-26919928

monkins commented 3 years ago

Doh!!! Thanks so much!!! Really appreciate it.

My bad, I should have tried a few more URLs before bothering you.

Best,

monkins commented 3 years ago

More than happy to do it! Done!

Best,

WolfgangFahl commented 3 years ago

@monkins Thanks for the positive feedback!