scan-bugs-org / scan

The Symbiota2 project is an open source software project whose central goal is to develop online tools that aid in the generation, exploration, and management of biodiversity data (collection specimens, observations, images, checklists, keys, etc.). See also: http://bdj.pensoft.net/articles.php?id=1114 and http://symbiota.org/
GNU General Public License v2.0

Elasticsearch/Kibana #61

Open neilcobb opened 2 years ago

neilcobb commented 2 years ago

Hi Curtis,

Thank you for the follow-up, and it was great to meet you the other day. I was able to reach out to a search specialist, and these are the thoughts he provided on your questions:

1) Keep database dataset & Elasticsearch dataset in sync: If you cannot leverage any of our native data retrievers (i.e. Elastic Agent, Beats) you will need to own a somewhat custom process. It looks like you have leveraged Logstash to lean on pipelines to do some of this data transformation during ingestion. But unless the native data source sends an alert of some sort that a document has been deleted, it will be difficult to know how to properly sync that information within Elasticsearch.

My gut tells me that if you can get a unique identifier that is shared between the native data source & Elasticsearch, then bulk delete requests can be sent to Elasticsearch via a Logstash pipeline. For example, if 2 MariaDB documents were deleted (with mdbID 01 & 02), 3 MySQL DB documents were deleted (with mysqlID 01 - 03), and 200 SQL DB documents were deleted (with sqlID 01 - 200), I believe you can use a pipeline that effectively sends a bulk delete command to Elasticsearch for any instance of mdbID 01 & 02, mysqlID 01 - 03, and sqlID 01 - 200. There might be some period of time between the delete request and the documents actually being reaped out of the Elasticsearch index.
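For reference, a minimal sketch of the bulk-delete idea described above, written against Elasticsearch's `_bulk` API with Python's `requests` rather than a Logstash pipeline. The host, index name, and IDs are placeholders; in practice the IDs would come from whatever the native database reports as deleted.

```python
# Minimal sketch: bulk-delete documents from an Elasticsearch index by ID.
# Assumes Elasticsearch is reachable at localhost:9200; the index name
# "occurrences" and the IDs below are hypothetical placeholders.
import json
import requests

ES_URL = "http://localhost:9200"
INDEX = "occurrences"                       # hypothetical index name
deleted_ids = ["mdbID-01", "mdbID-02"]      # IDs reported as deleted upstream

# The _bulk API takes newline-delimited JSON; each delete action is one line.
actions = "".join(
    json.dumps({"delete": {"_index": INDEX, "_id": doc_id}}) + "\n"
    for doc_id in deleted_ids
)

resp = requests.post(
    f"{ES_URL}/_bulk",
    data=actions,
    headers={"Content-Type": "application/x-ndjson"},
)
resp.raise_for_status()
print(resp.json()["errors"])   # False if every delete was accepted
```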

2) Leverage Kibana filesystem directories/files but not datastores: I'm not 100% sure if this is doable. This is probably a good question for our tech support, but I suspect you can point at the directories/files and not the datastores. BUT, the directories/files will assuredly send communications to the datastores each time the visualization/dashboard page in Kibana is refreshed. If those datastore communications are broken or halted by your custom code, your Kibana iframes may show error tiles when it's not really an error on the Elasticsearch end.

I know there's an external/internet-facing way of embedding Kibana in an iframe. You may want to explore that rather than reinventing the wheel.

https://www.elastic.co/guide/en/kibana/current/reporting-getting-started.html#embed-code

I hope this information helps. Let me know if you have any other questions.


Shane Scholes
Sr. Solutions Architect, Public Sector - State, Local, and Education
E: shane.scholes@elastic.co | M: 801-201-7409

On Tue, Feb 1, 2022 at 10:57 AM Curtis Dyreson [curtis.dyreson@usu.edu](mailto:curtis.dyreson@usu.edu) wrote:

Hi Shane,

Nice to meet you this morning. I was just following up on the questions I asked.

For the iframes problem, I fixed it; once you all assured me it should work, I figured it was something at my end. I found the problem described here: https://discuss.elastic.co/t/kibana-filter-issues-on-embedded-dashboards/268898/5 Sure enough, I was using Kibana 7.12.0, so I upgraded to 7.17.0.

The second question was more of a "best practices" question. Our system has a MariaDB database and a REST interface. I know I can set up a Logstash pipeline to build an incrementally refreshed index from the MariaDB database. But really we would like all of the data interactions to go through the REST interface, since we have a system whereby one can use any kind of backend database (e.g., MongoDB if you want; the choice is site-specific). I can set up Logstash pipelines to read data from the REST interface into an index, using timestamps to get records that were added since the last read. But what about records that were deleted? What is best practice here: periodically recreate the index (expensive if there are lots of records), or set up a web service that pushes deletes/updates/inserts to the Elasticsearch index (which can be complicated if a site stops its Elasticsearch server for a period of time)?
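A third option, somewhere between recreating the whole index and pushing every change, is a periodic reconciliation sweep: pull the set of record IDs from the REST interface, compare with the IDs already in the index, and bulk-delete whatever no longer exists upstream. A minimal sketch is below; the REST endpoint and index name are hypothetical, and a real index would be paged with `search_after` or the scroll API rather than a single 10,000-hit search.

```python
# Minimal sketch of periodic reconciliation: delete from Elasticsearch any
# document whose ID is no longer returned by the (hypothetical) REST interface.
import json
import requests

API_URL = "http://localhost:8080/api/v1/occurrences/ids"   # hypothetical endpoint
ES_URL = "http://localhost:9200"
INDEX = "occurrences"                                       # hypothetical index name

# IDs that still exist in the source database, via the REST interface.
source_ids = set(requests.get(API_URL).json())

# IDs currently in the Elasticsearch index (first 10,000 only in this sketch).
hits = requests.post(
    f"{ES_URL}/{INDEX}/_search",
    json={"size": 10000, "_source": False},
).json()["hits"]["hits"]
indexed_ids = {hit["_id"] for hit in hits}

# Anything indexed but no longer in the source has been deleted upstream.
stale = indexed_ids - source_ids
if stale:
    actions = "".join(
        json.dumps({"delete": {"_index": INDEX, "_id": doc_id}}) + "\n"
        for doc_id in stale
    )
    requests.post(
        f"{ES_URL}/_bulk",
        data=actions,
        headers={"Content-Type": "application/x-ndjson"},
    ).raise_for_status()
```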

The third question is also a best practices question. We are distributing Symbiota2 through Docker and have added the ELK stack to the docker-compose file. We'd like to create some Kibana dashboards that we include in our UI through iframes. The question is: what directories should I bind in the local file system to capture and store the dashboards and index patterns, but not the actual indexes (the data at each site will be different)?
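One complication worth noting: Kibana stores dashboards and index patterns as saved objects in Elasticsearch (the `.kibana` system indices), not on the Kibana container's filesystem, so bind-mounting Kibana directories alone won't separate them from the site-specific data. One possible alternative, sketched below with Kibana's Saved Objects API, is to export the dashboards and index patterns to an NDJSON file that ships with the Docker setup and is re-imported at each site. The Kibana URL is a placeholder and no authentication is assumed.

```python
# Minimal sketch: export Kibana dashboards and index patterns as NDJSON so they
# can be distributed with the docker-compose setup and re-imported per site.
# Assumes Kibana at localhost:5601 with no authentication; adjust as needed.
import requests

KIBANA_URL = "http://localhost:5601"

resp = requests.post(
    f"{KIBANA_URL}/api/saved_objects/_export",
    json={"type": ["dashboard", "index-pattern"], "includeReferencesDeep": True},
    headers={"kbn-xsrf": "true"},
)
resp.raise_for_status()

# The export is newline-delimited JSON; save it alongside the compose file.
with open("symbiota2-dashboards.ndjson", "wb") as fh:
    fh.write(resp.content)

# At a new site, the file can be re-imported with:
#   POST /api/saved_objects/_import  (multipart form field "file")
```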

If these questions aren't clear, just let me know and I'm happy to provide more details. Thanks for any help you can provide.

cheers, curtis
