nasa-jpl-memex / GeoParser

Extract and Visualize location from any file
Apache License 2.0

Need to modify Solr schema #48

Closed smadha closed 8 years ago

smadha commented 8 years ago

Now that we are indexing millions of records, I am seeing a lot of issues with Solr.

We initially did a lot of hand-holding, editing data types and using encoding, but it now constantly fails with OutOfMemoryError. I have tried increasing memory up to 1.5 GB, but that only buys us some extra time.

At the moment I am trying to create a new schema that addresses the limitations below.

I am planning the following -

Right now I am doing this only for indexed data, on a new branch, since the uploaded-files path works with no issues.

@MBoustani

smadha commented 8 years ago

New Branch - "new_solr_schema"

smadha commented 8 years ago

I have created a new admin core which maps each input index to a local core. Below is an example.

ADMIN schema. This holds the mapping information between input indexes and local Solr cores.
- id - domain name
- indexes - list of input indexes for this domain
- core_names - list of local cores, one for each index
- point_len_list - list of the number of points found for each index
- idx_size_list - list of the size of the target index for each index

{
        "core_names": [
          "test_1",
          "test_2"
        ],
        "point_len_list": [
          6,
          12883
        ],
        "id": "test",
        "idx_size_list": [
          388,
          7208416
        ],
        "indexes": [
          "http://host/solr/core1",
          "http://host/solr/core2"
        ]
}
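
To show how a record like this could be read, here is a minimal Python sketch. It assumes the four lists are parallel, i.e. position i in each list describes the same input index. The helper names `admin_rows` and `core_for_index` are hypothetical and are not taken from the GeoParser code; the record is the example above, parsed as plain JSON rather than fetched from Solr.

```python
import json

# The example admin record from above, parsed as plain JSON.
ADMIN_DOC = json.loads("""
{
  "core_names": ["test_1", "test_2"],
  "point_len_list": [6, 12883],
  "id": "test",
  "idx_size_list": [388, 7208416],
  "indexes": ["http://host/solr/core1", "http://host/solr/core2"]
}
""")

# Assumption: these four lists are parallel (same length, same order).
LIST_FIELDS = ("indexes", "core_names", "point_len_list", "idx_size_list")


def admin_rows(admin_doc):
    """Zip the parallel lists into one dict per input index (hypothetical helper)."""
    lengths = {len(admin_doc[field]) for field in LIST_FIELDS}
    if len(lengths) != 1:
        raise ValueError("admin record lists have different lengths")
    return [
        {"index": index, "core": core, "points": points, "idx_size": size}
        for index, core, points, size in zip(
            *(admin_doc[field] for field in LIST_FIELDS)
        )
    ]


def core_for_index(admin_doc, index_url):
    """Return the local core mapped to index_url, or None if it is not registered."""
    for row in admin_rows(admin_doc):
        if row["index"] == index_url:
            return row["core"]
    return None


if __name__ == "__main__":
    print(core_for_index(ADMIN_DOC, "http://host/solr/core2"))
```

The length check is there because parallel lists silently drift out of step if one field is updated without the others; failing early seems safer than returning a wrong core.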

@MBoustani @chrismattmann Do you guys agree?

smadha commented 8 years ago

Fixed in https://github.com/MBoustani/GeoParser/pull/49 (branch new_solr_schema merged to master).