untoldone / bloomapi

Create APIs out of public datasources
https://www.bloomapi.com/documentation/public-data
MIT License

Current versions of bloomapi and bloomnpi not compatible #77

Closed by jdavisp3 7 years ago

jdavisp3 commented 7 years ago

I've been trying to get a local instance of bloomapi running by following the instructions here, but the Elasticsearch index created by bloomnpi seems to differ from the one bloomapi expects. I think I've gotten it to work by applying this patch:

diff --git a/handler/search_helper.go b/handler/search_helper.go
index 17a847d..fbb91d2 100644
--- a/handler/search_helper.go
+++ b/handler/search_helper.go
@@ -150,7 +150,7 @@ func Search(sourceType string, params *SearchParams, r *http.Request) (map[strin
        api.AddMessage(r, experimentalSort)
    }

-   result, err := conn.Search(sourceType, "main", nil, query)
+   result, err := conn.Search("source", sourceType, nil, query)
    if err != nil {
        switch terr := err.(type) {
        case elastigo.ESError:

The change to bloomapi seems to have happened after the current bloomnpi. Is there a missing bloomnpi version?
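
For readers following along, here is a rough sketch of why swapping the two arguments matters. It assumes the first two arguments to conn.Search are the index and the document type (as the patch suggests), and that they map onto the older Elasticsearch /{index}/{type}/_search URL layout. The searchPath helper and the "usgov.hhs.npi" source type are illustrative only, not part of bloomapi.

```go
package main

import (
	"fmt"
	"net/url"
)

// searchPath builds an Elasticsearch search path for an index and a document
// type, following the older /{index}/{type}/_search URL layout.
// Hypothetical helper for illustration; bloomapi does not define it.
func searchPath(index, docType string) string {
	return "/" + url.PathEscape(index) + "/" + url.PathEscape(docType) + "/_search"
}

func main() {
	// Assumed example source type, taken from the endpoint names in this thread.
	sourceType := "usgov.hhs.npi"

	// Layout queried before the patch: one index per source, with type "main".
	fmt.Println(searchPath(sourceType, "main"))

	// Layout queried after the patch: a shared "source" index, one type per source.
	fmt.Println(searchPath("source", sourceType))
}
```

If the importer writes documents under one layout and the API queries the other, the search would simply miss the data, which matches the mismatch described above.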

untoldone commented 7 years ago

I'd actually suggest making the reverse change in bloomnpi as that's the way it works in our hosted / production version. Let me take a closer look at the whole project today to make sure everything's still working ok.

Just curious btw, what are you going to be using Bloom with?

jdavisp3 commented 7 years ago

Thanks, I'll give that a try! We (Counsyl) have been using an older version internally for some time, just looking to bring things up to date.

untoldone commented 7 years ago

Looks like importing data works OK with this change. I'll check, but the datasource itself doesn't update the /api/sources list. This means that, for now, you'll have to search with /api/search and look up specific listings with /api/npis/:npi instead of using /api/search/usgov.hhs.npi and /api/sources/usgov.hhs.npi/:npi. I pushed this update just now with https://github.com/gocodo/bloomnpi/commit/e743a5d78b3f964822df872b718e03a1eb9ddf4b.
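
A minimal sketch of the two endpoints that work for now, for anyone scripting against a local instance. The base URL, the sample NPI number, and the helper names are placeholders chosen for illustration. Only the /api/search and /api/npis/:npi paths come from the comment above.

```go
package main

import (
	"fmt"
	"net/url"
)

// searchURL returns the URL of the general search endpoint (/api/search).
// Hypothetical helper; query parameters are deliberately left out.
func searchURL(base string) string {
	return base + "/api/search"
}

// npiURL returns the URL for looking up one listing via /api/npis/:npi.
// Hypothetical helper; the NPI is escaped so it is safe as a path segment.
func npiURL(base, npi string) string {
	return base + "/api/npis/" + url.PathEscape(npi)
}

func main() {
	// Placeholder host and port for a local bloomapi instance (assumption).
	base := "http://localhost:8080"

	fmt.Println(searchURL(base))
	// "1234567890" is a made-up NPI used only to show the URL shape.
	fmt.Println(npiURL(base, "1234567890"))
}
```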

As an aside, I decided over the weekend to open source our production versions of this code, which include many other datasources in addition to the NPI. This code does maintain the sources list correctly. I'm going to put a little more time into it this week to write some better documentation. You can find it at https://github.com/bloomapi/datasources.

jdavisp3 commented 7 years ago

Thanks very much!

untoldone commented 7 years ago

Just pushed and tested some versions that make it significantly easier to get a running copy, with an up-to-date NPI and other datasources, via Docker. E.g.:

  1. Download https://raw.githubusercontent.com/untoldone/bloomapi/master/docker-compose.yml
  2. Run docker-compose up -d on a machine with docker (where Docker gets at least 4GB of memory)

I'm going to go ahead and close this issue, but let me know if this is enough for you to get going with or if you have any other questions.

jdavisp3 commented 7 years ago

Thank you very much!!

jdavisp3 commented 7 years ago

Regarding that 4GB for Docker -- how should that be divided up amongst the different containers? I assume the elasticsearch container needs the most?

untoldone commented 7 years ago

In a prod environment, the more memory you feed ES the better (at least 2GB). PG needs the second most (at least 512MB-1GB, but you would likely want more). The other containers should be relatively lean. Just as a warning: while it works with 4GB, I haven't tested it rigorously at this point, and it's possible 4GB isn't enough for a prod environment. For context, we used to run ES on a cluster of 3 machines, each with 4GB of memory reserved for ES (12GB total). We recently moved to a single host with 16GB. If you are in a cloud environment, more memory for the DB will also dramatically reduce the load times for the datasets, as PG can keep everything in memory/cache rather than loading everything from disk, which can otherwise be super slow (we're talking 15 minutes vs 6+ hours).
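
As a rough sanity check on the numbers above, here is a small sketch that tests a proposed split against the stated minimums (ES at least 2GB, PG at least 512MB). The checkAllocation function and the example split are made up for illustration and are not a tested recommendation.

```go
package main

import "fmt"

const (
	minESMB = 2048 // "at least 2GB" for Elasticsearch, per the comment above
	minPGMB = 512  // lower end of "at least 512MB-1GB" for PostgreSQL
)

// checkAllocation returns a list of problems with a proposed memory split,
// all values in MB. An empty result means the stated minimums are met.
// Hypothetical helper for illustration only.
func checkAllocation(totalMB, esMB, pgMB int) []string {
	var problems []string
	if esMB < minESMB {
		problems = append(problems, fmt.Sprintf("ES has %dMB, below the %dMB minimum", esMB, minESMB))
	}
	if pgMB < minPGMB {
		problems = append(problems, fmt.Sprintf("PG has %dMB, below the %dMB minimum", pgMB, minPGMB))
	}
	if esMB+pgMB > totalMB {
		problems = append(problems, "ES and PG together exceed the total budget")
	}
	return problems
}

func main() {
	// Assumed split of a 4GB Docker budget: 2.5GB for ES and 1GB for PG,
	// leaving 512MB for the remaining containers.
	for _, p := range checkAllocation(4096, 2560, 1024) {
		fmt.Println(p)
	}
}
```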

jdavisp3 commented 7 years ago

That makes sense, thanks very much!