Closed untoldone closed 10 years ago
@untoldone +1 on this.
I've started reading the code and I've found it quite complicated.
I didn't know that it was a 300+ column table. I bet that makes it complicated to write an API on top of that dataset.
I think it's a good idea to separate the huge table into separate tables.
For example: We could turn the "Provider Primary Taxonomy Switch" columns into a provider_primary_taxonomy_switches table, or "Other Provider Identifier_15", "Other Provider Identifier Type Code_15", "Other Provider Identifier State_15", "Other Provider Identifier Issuer_15" into an other_provider_identifiers table with columns id, type_code, state, issuer.
I believe this normalization will make it easier to add new features and correct bugs.
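To make the idea concrete, here is a rough Python sketch of how one wide row could be split into rows for an other_provider_identifiers table. It is only an illustration, not how BloomAPI does it: the function name, the `npi` and `slot` columns, and the assumption that the wide row is a dict keyed by the CSV header (with an "NPI" column) are all mine.

```python
import re
from collections import defaultdict

# Wide-column prefix -> column in the proposed other_provider_identifiers table.
# The target names (id, type_code, state, issuer) follow the suggestion above.
FIELDS = {
    "Other Provider Identifier": "id",
    "Other Provider Identifier Type Code": "type_code",
    "Other Provider Identifier State": "state",
    "Other Provider Identifier Issuer": "issuer",
}

# Matches repeated columns such as "Other Provider Identifier State_15".
COLUMN_RE = re.compile(r"^(?P<name>.+)_(?P<slot>\d+)$")


def split_other_identifiers(row):
    """Split the repeated "Other Provider Identifier*_N" columns of one
    wide row (a dict keyed by CSV header) into a list of narrow rows.

    The "npi" and "slot" keys are hypothetical additions so each narrow
    row can be traced back to its provider and original column group.
    """
    slots = defaultdict(dict)
    for column, value in row.items():
        match = COLUMN_RE.match(column)
        if not match or match["name"] not in FIELDS:
            continue  # not one of the repeated identifier columns
        if value:  # skip empty cells so unused slots produce no rows
            slots[int(match["slot"])][FIELDS[match["name"]]] = value
    return [
        {"npi": row["NPI"], "slot": slot, **fields}
        for slot, fields in sorted(slots.items())
    ]
```

Each provider then contributes only as many rows as it has identifiers, instead of a fixed block of mostly empty columns.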
Just a fair warning, I'm quite a n00b with NPI/NPPES data.
Agreed. I'm going to take a look at potentially getting started on this this weekend and getting a feel for how much work is involved.
@untoldone Have you made any progress on this? Would love to see a fork and collaborate on normalization. I'm not using Bloomapi specifically, but have been observing the project with interest as I'm working on a Python application that uses the NPI dataset.
@kevinseelbach Yes -- I just got started on this rewrite last week or so. The changes are likely to go beyond just normalization of the NPI and turn into a relatively simple-to-use ETL tool + API + search tool for any datasets that are still relatively small. The project is being optimized for the time it takes someone to describe and add a dataset.
Check out the vnext branch (a work in progress), as I'm currently working there. Would love to chat more in depth if you're interested.
This rewrite has now been completed and is present in v0.2.0. Please let me know if you have any feedback or questions @kevinseelbach @etagwerker.
BloomAPI's code is kind of a mess today. It works, but despite the small code base, it will be very difficult to improve upon in the future.
Some people want to use BloomAPI only for its db syncing without the API, but the db looks exactly like the NPI files (which are 300+ columns wide and not super dev friendly).
In addition, some people are starting to depend on it. Next-step features such as geocoding, taxonomy inclusion, and including the hospital/physician compare datasets will all be painful to do with the current code.
I would like to rewrite the import + query code in the following ways:
Please provide feedback if you have any!