nasa-jpl-memex / GeoParser

Extract and Visualize location from any file
Apache License 2.0

GeoParser

The Geoparser is a software tool that can process information from any type of file, extract geographic coordinates, and visualize locations on a map. Users who want a geographical representation of their data can search for locations through a search index or by uploading files from their computer. The Geoparser parses the files and visualizes cities or latitude-longitude points on the map. Once the points are plotted, users can filter the results by density, or by searching for a keyword and applying a "facet" to the parsed information. Clicking a location point on the map reveals more information about the location and how it relates to the search.

Installation (Docker)

  1. docker build -t nasajplmemex/geo-parser --no-cache -f Dockerfile .
  2. docker-compose up -d
  3. Visit http://localhost:8000 in your browser
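
After `docker-compose up -d` the containers may need some time before the web application answers. The following is a minimal sketch, not part of GeoParser, that polls the URL from step 3 until it responds. It assumes Python 3 is available on the host; the function name `wait_for_server` and the timeout values are illustrative choices.

```python
import time
import urllib.error
import urllib.request


def wait_for_server(url, timeout=60.0, interval=2.0, opener=urllib.request.urlopen):
    """Poll `url` until it returns any HTTP response or `timeout` seconds pass.

    Returns True once the server responds, False if the deadline is reached.
    `opener` is injectable so the function can be exercised without a network.
    """
    deadline = time.monotonic() + timeout
    while True:
        try:
            opener(url, timeout=5)
            return True
        except urllib.error.HTTPError:
            # The server answered, even if with an error status.
            return True
        except (urllib.error.URLError, OSError):
            # Connection refused or timed out: the container may still be starting.
            pass
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
```

Calling `wait_for_server("http://localhost:8000")` returns `True` as soon as the application answers, so it can gate a script that opens the browser or uploads files.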

Try it out to help fight COVID!

GeoParser has been updated with a new, easy-to-use Docker install, along with an example that downloads the COVID-19 literature data and displays its locations. Use that example to explore and test GeoParser on a real dataset.

Installation (Manual)

Requirements

  1. Python 2.7
  2. pip
  3. Django
  4. Tika Python

Install Requirements

  1. Install Python requirements
    pip install -r requirements.txt

How to Run the Application

  1. Run Solr. Change directory to where you cloned the project, then start it:

    cd Solr/solr-5.3.1/
    ./bin/solr start
  2. Clone lucene-geo-gazetteer repo

      git clone https://github.com/chrismattmann/lucene-geo-gazetteer.git
      cd lucene-geo-gazetteer
      mvn install assembly:assembly
      # then add lucene-geo-gazetteer/src/main/bin to your PATH environment variable

    Make sure it is working:

      lucene-geo-gazetteer --help
      usage: lucene-geo-gazetteer
       -b,--build <gazetteer file>           The Path to the Geonames
                                             allCountries.txt
       -h,--help                             Print this message.
       -i,--index <directoryPath>            The path to the Lucene index
                                             directory to either create or read
       -s,--search <set of location names>   Location names to search the
                                             Gazetteer for
  3. Build a gazetteer from the Geonames.org dataset (1.2 GB)

      cd lucene-geo-gazetteer
      curl -O http://download.geonames.org/export/dump/allCountries.zip
      unzip allCountries.zip
      lucene-geo-gazetteer -i geoIndex -b allCountries.txt

    Make sure it is working:

      lucene-geo-gazetteer -s Pasadena Texas
      [
      {"Texas" : [
      "Texas",
      "-91.92139",
      "18.05333"
      ]},
      {"Pasadena" : [
      "Pasadena",
      "-74.06446",
      "4.6964"
      ]}
      ]

    Now start the lucene-geo-gazetteer server:

      lucene-geo-gazetteer -server
  4. Run the Tika server on port 8001, as described in https://cwiki.apache.org/confluence/display/TIKA/GeoTopicParser. The port can be configured via config.txt.

  5. Make sure you can extract locations from the Tika server

    curl -T /path/to/polar.geot -H "Content-Disposition: attachment; filename=polar.geot" http://localhost:8001/rmeta

    You can obtain the file here: https://raw.githubusercontent.com/chrismattmann/geotopicparser-utils/master/geotopics/polar.geot

    The output should look like this:

    [
    {
      "Content-Type":"application/geotopic",
      "Geographic_LATITUDE":"39.76",
      "Geographic_LONGITUDE":"-98.5",
      "Geographic_NAME":"United States",
      "Optional_LATITUDE1":"27.33931",
      "Optional_LONGITUDE1":"-108.60288",
      "Optional_NAME1":"China",
      "X-Parsed-By":[
         "org.apache.tika.parser.DefaultParser",
         "org.apache.tika.parser.geo.topic.GeoParser"
      ],
      "X-TIKA:parse_time_millis":"1634",
      "resourceName":"polar.geot"
    }
    ]
  6. Run the Django server

    python manage.py runserver
  7. Open http://localhost:8000/ in your browser

    Note: Please refer to the wiki page of this GitHub repository, which serves as a guide to using GeoParser.
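
The sanity checks in steps 3 and 5 both print JSON, and it can be handy to reduce that output to plain coordinate tuples before comparing runs. The sketch below is not part of GeoParser. The function names `gazetteer_locations` and `tika_locations` are hypothetical. It assumes the gazetteer lists each match as name, longitude, latitude (inferred from the sample above, where -91.92139 cannot be a latitude), and that Tika numbers its additional locations with the `Optional_NAME<n>` key pattern shown in the sample response.

```python
import json
import re


def gazetteer_locations(raw_json):
    """Flatten `lucene-geo-gazetteer -s` output into (query, name, lat, lon) tuples.

    Assumes each entry maps the query term to [name, longitude, latitude].
    """
    results = []
    for entry in json.loads(raw_json):
        for query, (name, lon, lat) in entry.items():
            results.append((query, name, float(lat), float(lon)))
    return results


def tika_locations(raw_json):
    """Collect (name, lat, lon) tuples from Tika /rmeta GeoTopicParser metadata.

    Reads the Geographic_* keys first, then any Optional_*<n> keys in
    numeric order.
    """
    results = []
    for doc in json.loads(raw_json):
        if "Geographic_NAME" in doc:
            results.append((
                doc["Geographic_NAME"],
                float(doc["Geographic_LATITUDE"]),
                float(doc["Geographic_LONGITUDE"]),
            ))
        # Find every numbered optional location present in this document.
        indexes = sorted(
            int(m.group(1))
            for m in (re.match(r"Optional_NAME(\d+)$", key) for key in doc)
            if m
        )
        for i in indexes:
            results.append((
                doc["Optional_NAME%d" % i],
                float(doc["Optional_LATITUDE%d" % i]),
                float(doc["Optional_LONGITUDE%d" % i]),
            ))
    return results
```

To use it, redirect the output of the `lucene-geo-gazetteer -s` or `curl` command to a file and pass the file's contents to the matching function.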

Technologies we Use