@nithinkrishna the current version of GeoParser does not support ES. Do you have your data in Solr, by any chance?
No. I would like to add the Elasticsearch integration. How would I go about doing this?
Hi @nithinkrishna,
It would be great if you could add this to GeoParser so that it supports both Solr and Elasticsearch.
Thanks
Hi @nithinkrishna - Did it work? Let me know if you need more details.
@chrismattmann @MBoustani
Sure. I'm going to look into it tomorrow. I'll let you know if I have questions. Thank you.
Thanks @nithinkrishna and @smadha for working on this.
@smadha @MBoustani
Correct me if I'm wrong: the function query_crawled_index in views.py extracts data from a Solr index, runs the geo-topic parser on the values to identify locations, and dumps the results into a local Solr index. The visualization then reads from that local Solr index and plots the map.
This seems a little roundabout, right? I understand that the formats of various inverted indices might differ, but we should be able to connect directly to Solr/Elasticsearch indices that already contain location data.
Method 1: This is an example from my Elasticsearch index: http://104.236.190.155:9200/polar/application-pdf/_search?from=0&size=1
As you can see, my object has a key called geo which contains the results of running Tika's GeoTopic parser. Will I be able to configure the visualizations to hit Elasticsearch directly and fetch the locations?
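Concretely, I imagine Method 1 as something like the sketch below. It just pages through my index above with requests; nothing here is existing GeoParser code:

```python
import requests

# Sketch of Method 1: read the pre-parsed geo objects straight out of
# my Elasticsearch index (same endpoint as the example URL above).
resp = requests.get(
    "http://104.236.190.155:9200/polar/application-pdf/_search",
    params={"from": 0, "size": 10},
)

for hit in resp.json()["hits"]["hits"]:
    geo = hit["_source"].get("geo")  # list of parsed locations, if present
    if geo:
        print(geo)
```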
Method 2:
I write a separate script that queries Elasticsearch and loads the locations into the local Solr index, in the format the visualizations support. This approach seems cleaner. What is the format in which data needs to be pushed into the local Solr index? I'm assuming IndexCrawledPoints is where this happens.
Once we get this workflow working, we can wrap it with an API and hook it up with the UI.
What do you recommend?
Method 2 worked. It seems like the less intrusive approach. I will clean up my code and try to open a PR later this week.
What are your thoughts, though, on a longer-term strategy for the project?
Hi @nithinkrishna
Yep, you got it correct about query_crawled_index in views.py.
Both your methods are new functionalities altogether. You are trying to bypass the geoparsing step and instead assume that the index already has geoparsed data. What would be best is if we can add ES queries in query_crawled_index in views.py, so that it behaves the same way as with Solr. If I take a geo example from your ES index, the document below
```json
{
  "admin2Code": "",
  "location": {
    "lat": 51.72703,
    "lon": 28.38867
  },
  "name": "Eastern Europe",
  "countryCode": "",
  "admin1Code": ""
}
```
should be flattened to
51.72703 28.38867 Eastern Europe
and then we should run GeoTopicParser on top of it. Now, since you have already run GeoTopic parsing, your methods make more sense, but you don't need to pass the geo fields. "related-publications" in your index should be a good candidate for GeoParsing.
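For illustration, the flattening could be as simple as this sketch (field names taken from the geo document above):

```python
# Sketch: flatten one geo entry (as above) into the "lat lon name" form.
def flatten_geo(doc):
    return "{} {} {}".format(doc["location"]["lat"],
                             doc["location"]["lon"],
                             doc["name"])

# On the example above this yields: "51.72703 28.38867 Eastern Europe"
```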
Since you said you have already connected your ES to the local Solr, I assume you must have done the two steps below:
Sample API call - http://localhost:8000/query_crawled_index/http://localhost:8983/solr/dhs/test/user/pass
Sample doc with actual geo data in the test_1 core:
```json
{
  "points": [
    "[{'loc_name': 'RepublicofYemen', 'position': {'y': '47.5', 'x': '15.5'}}]"
  ],
  "id": "original_id",
  "_version_": 1530733127035519000
}
```
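For reference, pushing a doc in that shape into the local core can be done over plain HTTP, roughly like this sketch (GeoParser itself goes through its own Solr helper code):

```python
import json
import requests

# Sketch: index one geoparsed document into the local test_1 core,
# matching the sample doc above. commit=true makes it visible right away.
SOLR_UPDATE_URL = "http://localhost:8983/solr/test_1/update?commit=true"

doc = {
    "id": "original_id",
    "points": [
        "[{'loc_name': 'RepublicofYemen', 'position': {'y': '47.5', 'x': '15.5'}}]"
    ],
}

resp = requests.post(SOLR_UPDATE_URL,
                     data=json.dumps([doc]),
                     headers={"Content-Type": "application/json"})
resp.raise_for_status()
```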
Sample doc in the admin core for the domain test:
"docs": [
{
"point_len_list": [
13
],
"idx_field_list": [
"id,title"
],
"core_names": [
"test_1"
],
"idx_size_list": [
388
],
"indexes": [
"http://localhost:8983/solr/dhs"
],
"id": "test",
"_version_": 1530733127305003000
}
]
If you want, you can also write your own function that returns the details of Khooshe tiles in "return_points_khooshe", but that might not be a proper integration.
Thanks
Ah, got it.
Now I understand what you expect from the Elasticsearch integration. Ideally, with ES integration you want GeoParser to:
1. point at the ES index,
2. run query_crawled_index on it -> which geoparses the documents and loads the results into the local Solr, and
3. run update_idx_details on that index -> which generates the tiles.
Is this the workflow you expect?
Perfecto!
This is the exact workflow. All of this is already done; all you need to do is step 2.
We need to put in a check to identify a URL as Solr/ES and then write code for iterating through each of the documents in ES.
You will need to modify the query_crawled_index method. I think we might need to make query_crawled_index more granular, as I expect data iteration in ES to be different from Solr. I will be happy to work on this with you. If needed we can chat more on Hangouts - msharan@usc.edu
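The ES iteration could look roughly like this (a sketch only; the helper name is made up, and for big indexes the ES scroll API would be the better fit):

```python
import requests

# Hypothetical helper: the ES counterpart of Solr-style start/rows paging.
def iterate_es_docs(es_index_url, batch_size=100):
    """Yield the _source of every document in an Elasticsearch index."""
    offset = 0
    while True:
        resp = requests.get(es_index_url.rstrip("/") + "/_search",
                            params={"from": offset, "size": batch_size})
        hits = resp.json()["hits"]["hits"]
        if not hits:
            break
        for hit in hits:
            yield hit["_source"]
        offset += batch_size
```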
This is a potential issue.
Using '/solr' in the URL to distinguish between Solr and Elasticsearch seems like a bad idea, because ES doesn't have any such prefix. It would be better if we made this decision based on user input, say a radio button?
What do you think?
That's a good point. I think a radio button would be the cleanest way to do it, but for now, if you want, you can put the ES code in the else branch of the "/solr" check.
What say you, @chrismattmann @MBoustani?
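In code, that stopgap could look like the sketch below (both iterator helpers are hypothetical names, not existing GeoParser functions):

```python
# Sketch of the stopgap: branch on "/solr" in the URL until a proper
# user-facing toggle (e.g. a radio button) exists.
if "/solr" in index_url:
    docs = iterate_solr_docs(index_url)  # existing Solr path
else:
    docs = iterate_es_docs(index_url)    # new Elasticsearch path
```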
@nithinkrishna and @smadha, is there any smart way of detecting whether the user is using Solr or ES? Like a query call that is specific to Solr or to ES?
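One possible probe (just a sketch, not tested across versions): Elasticsearch answers GET / on its server root with a JSON banner that includes a "tagline" field, while Solr does not serve JSON there:

```python
import requests
from urllib.parse import urlsplit

def looks_like_es(index_url):
    """Heuristic sketch: probe the server root; an Elasticsearch banner
    contains a "tagline" field, while Solr serves its admin UI instead."""
    parts = urlsplit(index_url)
    root = "{}://{}".format(parts.scheme, parts.netloc)
    try:
        return "tagline" in requests.get(root, timeout=5).json()
    except (ValueError, requests.RequestException):
        return False
```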
@MBoustani @smadha Let's move the automated discovery of ES vs Solr to another thread
Hi @nithinkrishna - How is it going? Can we help in any way?
@smadha Ah, I've been busy with finals. I'll have a PR ready by early next week. We can discuss more then.
I have a set of Elasticsearch documents with lat/long information.
What is the best strategy to connect to Elasticsearch and get the UI working?